Yelp — Journey to Zero Trust Access
Summary
Yelp Engineering post (2025-04-15) by Corporate Systems + Client Platform Engineering: the first-party narrative of why Yelp, starting in late 2023, set out to replace Ivanti Pulse Secure as its employee VPN and chose Netbird, an open-source, WireGuard-based Zero Trust Access platform. The post is a requirements-and-selection retrospective: it announces the platform choice and commits to a follow-up post on implementation + initial architecture.

Load-bearing framing: Yelp is "a fully remote company" with an "increasingly distributed" employee base, so "secure access to resources from anywhere" is a critical business function. The prior Pulse substrate had "peak download speed in the low tens of megabits per second for most users" and a "cumbersome browser-to-VPN client handoff for session authentication" after a SAML-via-Ivanti transition. The strategic bet is that Zero Trust Architecture is the future and is "aligned with our long term goals of reducing VPN utilization and creating more fine grained access control structures in the future, as opposed to broad, binary policies on huge subnets and network segments." Yelp's MTLS-based Edge Gateway is the parallel migration path for less-sensitive web applications, but "this was not an immediate solution for all employees"; Netbird fills the gap for the engineering-oriented use cases (SSH to devboxes, downloading files from internal servers) that can't easily move off full-network access.

Five evaluation pillars drove the selection:

(1) Okta/OIDC support. Yelp's prior LDAP and then SAML-on-Ivanti authentication flows had UX friction; Netbird's OIDC integration with Okta "empowers us to enforce policies that ensure only users on managed devices with a secure security posture are granted access", the canonical OIDC+device-posture access gate shape.

(2) Simple and intuitive user interface. Yelp "tailored the client experience for our specific needs, removing some of the more advanced options from the UI", and added self-repair + helpdesk shortcuts, personalized icons, and per-stage connection feedback.

(3) Open source and extensible. Two distinct levers apply. First, if critical security issues ever arose, Yelp "would not be beholden to the maintainers alone", since "we ourselves could provide fixes if need be" (canonical patterns/open-source-for-security-response-agency framing). Second, "Yelp has the opportunity to contribute back to the community", which is already realised: "multiple changes have been pushed upstream to Netbird's main branch from Yelpers working to solve issues we encountered, debugged, and ultimately solved", another instance of patterns/upstream-contribution-parallel-to-in-house-integration.

(4) High throughput and low latency. "Netbird being plugged into a 10 gigabit backbone and supported by the blazing fast cryptography of Wireguard" let users "achieve speeds upwards of 1 gigabit per second", "mostly restricted by their home internet speed limits", versus Pulse's "low tens of megabits per second for most users". Latency was "close to the pure cost of the wire users traversed with single digit milliseconds of overhead added by wrapping packets in the Wire Guard protocol."

(5) Fault tolerance. This is delivered by WireGuard's mesh topology plus Netbird's router peer abstraction: "All members of the mesh are peers but router peers serve the special role of being able to accept and egress traffic from other peers. Clients intrinsically have a one to many relationship with router peers they are permitted to use." Measured failover: "a router peer that was actively handling traffic for a given peer could suddenly halt operation, and the client would experience a sub 2 second connectivity interruption while their traffic was rerouted to another host", a live failover target that Pulse's single-tunnel-per-session model couldn't meet.

The post closes by promising a follow-up on "the implementation, challenges, and initial architecture of Yelp's deployment of Netbird."
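
The OIDC+device-posture gate named in pillar (1) can be sketched in a few lines. This is a minimal illustration, not Yelp's or Okta's implementation: the post does not say how the posture signal is sourced, so the `Claims` type, the claim names `device_managed` and `device_posture`, and the `allow_access` function are all hypothetical, and the claims are assumed to come from an ID token whose signature has already been verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claims:
    """Hypothetical, already-verified OIDC claims (field names are assumptions)."""
    subject: str
    device_managed: bool  # assumption: the IdP asserts managed-device status
    device_posture: str   # assumption: "secure" or "insecure"


def allow_access(claims: Claims) -> bool:
    """Grant access only when user identity AND device posture both check out."""
    if not claims.subject:
        return False  # no authenticated user
    return claims.device_managed and claims.device_posture == "secure"


# A known user on an unmanaged laptop is rejected despite valid identity.
print(allow_access(Claims("alice", device_managed=False, device_posture="secure")))
```

The point of the shape is that identity alone never returns `True`: the device checks are part of the same decision, not a separate later step.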
Key takeaways
- ZTA as the successor substrate to corporate VPN. Yelp explicitly positions the Netbird migration as not just a faster VPN replacement but a stepping-stone toward "reducing VPN utilization and creating more fine grained access control structures in the future, as opposed to broad, binary policies on huge subnets and network segments", the canonical VPN→ZTA transition motion. The end state is a cohabitation of Netbird (for use cases that still need broad network access) with Yelp's MTLS-based Edge Gateway (for per-application access to less-sensitive web apps). ZTA is "not only becoming an industry trend, but also aligned with our long term goals."
- Authentication protocol ladder: LDAP → SAML → OIDC. Yelp's auth migration is explicit: LDAP "lacked advanced user and device trust verification"; SAML's "implementation within Ivanti's product led to a suboptimal user experience due to a cumbersome browser-to-VPN client handoff for session authentication"; OIDC with Okta is the target because it "empowers us to enforce policies that ensure only users on managed devices with a secure security posture are granted access". Device posture, not just user identity, is the unlock: "In today's environment, it's not enough to simply verify the identity of the user." Canonical OIDC+device-posture access gate instance. (Source: sources/2025-04-15-yelp-journey-to-zero-trust-access)
- WireGuard mesh topology + router peers as the HA primitive. WireGuard's symmetric peer model is extended by Netbird with a router peer role: "All members of the mesh are peers but router peers serve the special role of being able to accept and egress traffic from other peers. Clients intrinsically have a one to many relationship with router peers they are permitted to use. This allows for maintenance or service interruption on one router peer without causing a user to reconnect to the network or experience noticeable degradation." Measured failover budget: "sub 2 second connectivity interruption while their traffic was rerouted to another host." This is a materially different HA posture from Pulse's single-tunnel model: failover becomes a transparent reroute, not a client-initiated reconnect. Canonical WireGuard mesh topology instance.
- Open source is an operational-security posture, not just a cost posture. Yelp articulates two distinct open-source levers. (a) Response agency: if critical security issues ever arose, Yelp "would not be beholden to the maintainers alone", since "we ourselves could provide fixes if need be"; this is canonicalised as the open-source-for-security-response-agency pattern. (b) Upstream contribution: "multiple changes have been pushed upstream to Netbird's main branch from Yelpers working to solve issues we encountered, debugged, and ultimately solved", an instance of patterns/upstream-contribution-parallel-to-in-house-integration already established on the wiki by Slack's 2026-03-31 HTTP/3-probing post and Redpanda's FIPS post. The two levers stack: response agency justifies an internal fork if ever needed; upstream contribution keeps the fork surface small by returning fixes to the main branch.
- Throughput as a quality-of-life lever for engineering workflows. Yelp concedes "most users do not demand high throughput and low latency to complete their day to day business functions" but names three concrete engineering use cases where the Pulse ceiling was actively damaging: "downloading large logs at a snail's pace, cloning big Git repos and watching the commits trickle in, or connecting to a terminal or remote desktop and feeling like you are moving in slow motion." Quantified transition: Pulse "peak download speed in the low tens of megabits per second" → Netbird "upwards of 1 gigabit per second" (10-gigabit backbone on the server side, home internet on the client side as the new bottleneck). WireGuard's cryptographic overhead at wire-latency resolution: "single digit milliseconds of overhead added by wrapping packets in the Wire Guard protocol."
- Tailor-the-client UX for mixed technical ability. Yelp named this as a first-class selection criterion: "simplicity is key when supporting less-technical users." Netbird's client was "approachable, well thought out and open source", which let Yelp hide advanced UI options, add self-repair + helpdesk ticket shortcuts, customize icons, and emit per-stage connection feedback. Canonical pattern for enterprise security-tool rollouts: the client experience is not the vendor's default but a fork adjusted to the specific employee population; open source is what makes it possible.
- Cohabitation with MTLS Edge Gateway, not replacement. The architecture isn't Netbird-instead-of-everything-else; it's Netbird-for-the-engineering-long-tail while "work was already underway to shift less sensitive applications to alternate access methods like our MTLS based Edge Gateway." The Edge Gateway is the future-state per-application-auth path; Netbird is the present-state broad-network-access path; ZTA is the direction of travel, not an instantaneous substitution. Canonical concepts/vpn-to-zta-migration realisation: multiple parallel access primitives during the transition, with VPN utilization shrinking over time rather than flipping in one cutover.
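
The router peer failover behaviour described above (a client with a one-to-many relationship to router peers, rerouted when the active one halts) can be sketched as follows. This is an illustrative model only, not Netbird's implementation: the `RouterPeer` and `Client` types, the `healthy` flag, and the first-healthy selection rule are assumptions, and the post does not describe how Netbird detects a halted peer or chooses the next one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RouterPeer:
    """Hypothetical router peer; `healthy` stands in for real liveness detection."""
    name: str
    healthy: bool = True


@dataclass
class Client:
    """Hypothetical client with a one-to-many relationship to router peers."""
    permitted: List[RouterPeer]
    active: Optional[RouterPeer] = None

    def route(self) -> RouterPeer:
        """Return the router peer to use, rerouting if the active one has halted."""
        if self.active is None or not self.active.healthy:
            # Assumption: pick the first healthy permitted peer.
            self.active = next((p for p in self.permitted if p.healthy), None)
            if self.active is None:
                raise RuntimeError("no healthy router peer available")
        return self.active


rp_a, rp_b = RouterPeer("rp-a"), RouterPeer("rp-b")
client = Client(permitted=[rp_a, rp_b])
print(client.route().name)  # rp-a
rp_a.healthy = False        # the active router peer halts
print(client.route().name)  # rp-b, with no client-initiated reconnect
```

The client never tears down and re-establishes a session here; it only changes which permitted peer carries its traffic, which is the property the post contrasts with a single-tunnel model.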
Operational numbers
- Pulse ceiling: "low tens of megabits per second" peak download for most users.
- Netbird ceiling: "upwards of 1 gigabit per second" — limited by home internet, not Netbird.
- Netbird backbone: 10-gigabit server-side.
- WireGuard crypto overhead: "single digit milliseconds" added to wire latency.
- Router peer failover interruption: "sub 2 second connectivity interruption" before traffic reroutes to another router peer host — measured in Yelp's testing, with the halted peer being one that was "actively handling traffic for a given peer".
- Evaluation period: late 2023 (Corporate Systems + Client Platform Engineering started looking for Pulse alternatives).
- Upstream contributions: "multiple changes ... pushed upstream to Netbird's main branch" — count not disclosed.
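
As a rough worked example of what the throughput ceilings mean in practice, the sketch below computes transfer time for a file at each rate. The numbers are assumptions chosen for illustration: 30 Mbit/s stands in for "low tens of megabits per second", 1000 Mbit/s for "upwards of 1 gigabit per second", and the 5 GB file size is hypothetical. Protocol overhead is ignored.

```python
def transfer_seconds(size_gigabytes: float, rate_megabits_per_s: float) -> float:
    """Idealised transfer time: size in decimal gigabytes, rate in Mbit/s."""
    bits = size_gigabytes * 8e9  # 1 GB = 8e9 bits
    return bits / (rate_megabits_per_s * 1e6)


pulse = transfer_seconds(5, 30)      # assumed Pulse-era rate
netbird = transfer_seconds(5, 1000)  # assumed Netbird-era rate
print(f"Pulse: {pulse / 60:.1f} min, Netbird: {netbird:.0f} s")
```

Under these assumptions the same download drops from roughly 22 minutes to 40 seconds, which is the scale of difference behind the "snail's pace" complaint.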
Caveats
- This is a decision-and-requirements post, not an implementation-architecture post. Yelp commits to a follow-up on "the implementation, challenges, and initial architecture of Yelp's deployment of Netbird", which is where deployment shape (how many router peers, how they're distributed geographically, how clients discover them, how Okta policies are composed, how helpdesk-initiated recovery works) would live. That detail is absent from the current post.
- Netbird is introduced without architectural internals — router peer is the only named concept from the Netbird data plane. Control-plane shape (how peers learn about each other, how public keys are distributed, whether the Netbird management service is self-hosted or SaaS) is undisclosed.
- Okta integration detail: OIDC named, device-posture claim made, but device-posture signal source (MDM? Endpoint agent? Okta Verify? A custom check?) is not disclosed.
- MTLS Edge Gateway: named as the future-state per-application-auth path but not described; no architecture disclosed.
- No scale numbers: count of Yelp employees, count of router peers, count of upstream PRs merged — all undisclosed.
- No incident disclosure: the article describes Pulse as requiring "a more reliable solution" but does not describe specific production incidents that motivated the change.
- Tier-3 source; Yelp's second distributed-systems wiki ingest after the 2025-02-04 search-query-understanding post — a security-and-infrastructure axis distinct from that post's search-ML axis. Opens Yelp's infra axis on the wiki.
Scope decision
Tier-3 on-scope, decisive include. The post is explicitly about corporate-network architecture at Yelp's fully-remote scale, cites three concrete systems (WireGuard, Netbird, Okta) plus a retired predecessor (Ivanti Pulse Secure), discloses five named selection pillars, provides quantitative throughput and failover numbers, and names a novel architectural primitive (router peer) worth canonicalising. The body is not a product-launch announcement: it's a first-party migration retrospective with measured wins and commitments to follow-up architectural detail. Fits the Tier-3 inclusion criteria — "distributed systems internals, scaling trade-offs, infrastructure architecture, production incidents, storage/networking/streaming design" — on infrastructure architecture + networking-design grounds. Architecture density ~75% of body; marketing voice ("Yelpers who work tirelessly to connect people with great local businesses", careers CTA at close) confined to intro and outro paragraphs.
Source
- Original: https://engineeringblog.yelp.com/2025/04/journey-to-zero-trust-access.html
- Raw markdown: raw/yelp/2025-04-15-journey-to-zero-trust-access-2155de13.md
Related
- companies/yelp — Yelp company page; this is the second ingest after the 2025-02-04 search-ML post.
- systems/netbird — the chosen ZTA platform.
- systems/wireguard — the data-plane protocol under Netbird.
- systems/okta — the OIDC identity provider.
- systems/pulse-secure — the retired predecessor.
- concepts/zero-trust-authorization — the strategic framing.
- concepts/sso-authentication — the authentication doctrine.
- concepts/wireguard-mesh-topology — the HA primitive.
- concepts/router-peer — the Netbird abstraction for egress peers in the mesh.
- concepts/vpn-to-zta-migration — the transition motion.
- patterns/oidc-plus-device-posture-access-gate — the auth pattern.
- patterns/open-source-for-security-response-agency — the OSS lever Yelp invoked as a selection criterion.
- patterns/upstream-contribution-parallel-to-in-house-integration — Yelp's multi-PR contribution loop against Netbird's main branch.