
Lyft

Lyft Engineering blog is a Tier-2 source on the sysdesign-wiki. Lyft runs a large Envoy-fronted ridesharing platform (Envoy itself originated at Lyft) with a heavily polyglot backend (Python + Go + Java), iOS + Android mobile clients, and the usual infra surface: service meshes, feature flags, rider/driver matching, and trip-lifecycle state. The blog's historical strength has been systems-at-scale content (Envoy, Flyte, service-mesh operations) plus mobile networking and protocol design for mobile-to-server and server-to-server communication. Michael Rebello's 2020 post on Lyft's mobile networking journey documents the company-wide adoption of protobuf for mobile traffic; the 2024-09-16 Lyft Media post, the first Lyft source ingested here, is a direct descendant of that adoption.

The 2025-11-18 and 2026-01-06 posts opened a second Lyft theme on the wiki: ML platform + data infra. LyftLearn 2.0 is the ML platform (split compute from serving — SageMaker for training, Kubernetes/EKS for inference). The Lyft Feature Store is the data substrate underneath both — a "platform of platforms" with three ingestion lanes (batch / streaming / direct-CRUD) converging on dsfeatures, a wrapper over DynamoDB + ValKey + OpenSearch.

The 2026-04-23 post opened a third Lyft theme: Mapping + pickup UX. Lyft's Mapping team covers map data, pickup-spot recommendations, routing, and the rider/driver app surfaces — four pieces that the "Smarter Pickup Experience for Gated Communities" project weaves together into a single named playbook for encoding real-world physical constraints into the map. Gated communities make up 25–30% of rides in selected markets; the fix generalises explicitly to road closures, unsafe curbs, parades, and marathons — same four-step pattern applied to a different spatial constraint.

Key systems

Metric governance (MSL, 2026-06-10 post)

  • systems/lyft-metric-semantic-layer — Metric Semantic Layer; a centralized, versioned Python package serving as the single authoritative repository for every "Golden Metric" definition. YAML configs + Jinja-templated SQL + Python API + MCP server. Integrated with Amundsen for discoverability and a self-service Metric UI for no-code SQL generation.
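As a rough illustration of what a Golden Metric entry and its governance checks might look like, here is a stdlib-only Python sketch. All field names, team names, and the table name are hypothetical; `string.Template` stands in for Jinja (a third-party package), and a plain constructor call stands in for a loaded YAML config. The checks encode the rules summarised in the 2026-06-10 entry under Recent articles: two owning teams, at least two use cases, and approval from both owners before a change lands. It is not the MSL's actual API.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical shape of one Golden Metric entry after its YAML is loaded."""
    name: str
    version: int
    business_owner_team: str
    operational_owner_team: str
    use_cases: tuple[str, ...]
    sql_template: str  # stand-in for a Jinja-templated SQL body

    def validate(self) -> None:
        # Both owners must be named teams.
        if not self.business_owner_team or not self.operational_owner_team:
            raise ValueError(f"{self.name}: both owner teams are required")
        # Only metrics with at least two distinct use cases qualify.
        if len(set(self.use_cases)) < 2:
            raise ValueError(f"{self.name}: needs at least 2 use cases")

    def render_sql(self, **params: str) -> str:
        # Template uses $placeholders where Jinja would use {{ placeholders }}.
        return Template(self.sql_template).substitute(**params)


def approve_change(approvals: set[str], metric: MetricDefinition) -> bool:
    """A change lands only once both owner teams have approved it."""
    required = {metric.business_owner_team, metric.operational_owner_team}
    return required <= approvals


# Hypothetical metric; team and table names are illustrative only.
rides = MetricDefinition(
    name="completed_rides",
    version=3,
    business_owner_team="rider-growth",
    operational_owner_team="marketplace-data",
    use_cases=("exec_dashboard", "experiment_readout"),
    sql_template=(
        "SELECT COUNT(*) FROM $rides_table "
        "WHERE ds = '$ds' AND status = 'completed'"
    ),
)
rides.validate()
print(rides.render_sql(rides_table="core.rides", ds="2026-06-01"))
print(approve_change({"rider-growth"}, rides))  # False: operational owner missing
```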

Mapping / pickup UX (gated-community pickup, 2026-04-23 post)

  • systems/lyft-gate-area-generator — Map Data team algorithm that generates gate-area polygons for gated communities from OpenStreetMap + driver feedback. Handles single-entrance apartment complexes through multi-gate developments with internal road networks. Feeds the rider app's "gates mode" auto-detect on app open.
  • systems/lyft-pickup-routing — Routing team's pickup-routing subsystem. Inserts the gate as an invisible intermediate waypoint for gated-community pickups, giving the driver app a precise UX timing anchor for surfacing gate instructions.
  • systems/lyft-rider-app — rider-facing mobile app; host of the "gates mode" dual inside/outside-gate pickup-spot selection UI and the intercom-style numpad + plain-language list for gate-instruction sharing.
  • systems/lyft-driver-app — driver-facing mobile app; host of the scannable gate-instruction banner, timed off the routing waypoint, with screenshot prevention so gate codes cannot be captured and shared.

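To make the detect-then-route flow concrete, here is a minimal Python sketch under stated assumptions: coordinates are planar (reasonable over a single community, not across a city), gate areas are simple polygons, and the function names, waypoint dict shape, and banner distance threshold are all hypothetical. Even-odd ray casting is a standard point-in-polygon test swapped in for illustration; the source does not say which test the rider app uses, nor how the driver app decides when to show the banner.

```python
import math
from typing import Sequence

Point = tuple[float, float]  # planar (x, y); assumes a small, locally flat area


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray toward +x."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def should_enable_gates_mode(rider: Point, gate_areas: Sequence[Sequence[Point]]) -> bool:
    """Auto-detect on app open: is the rider inside any gate-area polygon?"""
    return any(point_in_polygon(rider, area) for area in gate_areas)


def route_with_gate(origin: Point, gate: Point, pickup: Point) -> list[dict]:
    """Insert the gate as an intermediate waypoint that the map does not draw."""
    return [
        {"point": origin, "kind": "origin", "visible": True},
        {"point": gate, "kind": "gate", "visible": False},
        {"point": pickup, "kind": "pickup", "visible": True},
    ]


def show_gate_banner(driver: Point, route: list[dict], threshold: float) -> bool:
    """Use the hidden gate waypoint as the timing anchor for the driver banner."""
    gate = next((w for w in route if w["kind"] == "gate"), None)
    return gate is not None and math.dist(driver, gate["point"]) <= threshold


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # toy gate area
print(should_enable_gates_mode((5.0, 5.0), [square]))   # True
print(should_enable_gates_mode((15.0, 5.0), [square]))  # False
route = route_with_gate(origin=(-50.0, 5.0), gate=(0.0, 5.0), pickup=(5.0, 5.0))
print(show_gate_banner((-3.0, 5.0), route, threshold=5.0))  # True
```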
Protocol + schema design (mobile + backend)

  • systems/protobuf — Lyft Media canonicalises design practices for the shared-schema case (mobile + backend). Used extensively across both mobile-to-server and server-to-server traffic per Rebello's 2020 post.
  • systems/protoc-gen-validate (PGV) / protovalidate — Lyft Media's declarative validation layer over protobuf schemas. The plugin is not authored by Lyft, but Lyft Media uses it as its standard validator.
  • systems/envoy — Lyft-originated L7 proxy; pre-existing wiki reference.
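The proto3 behaviour behind two of the practices in the 2024-09-16 entry (reserve 0 as UNKNOWN; invoke validators explicitly) can be simulated with the standard library. This is not the protobuf runtime or the protoc-gen-validate API: `VehicleType`, `PickupEstimate`, `parse`, and `validate` are hypothetical names, and a dict stands in for decoded wire fields. What it reproduces is that a field absent on the wire decodes to its zero value, and that decoding succeeds whether or not the message satisfies its constraints.

```python
from dataclasses import dataclass
from enum import IntEnum


class VehicleType(IntEnum):
    # Zero is reserved for UNKNOWN: proto3 decodes an absent enum field as 0,
    # so a meaningful value at 0 would be indistinguishable from "not sent".
    VEHICLE_TYPE_UNKNOWN = 0
    VEHICLE_TYPE_SEDAN = 1
    VEHICLE_TYPE_SUV = 2


@dataclass
class PickupEstimate:
    vehicle_type: VehicleType
    eta_seconds: int  # unit carried in the field name


def parse(wire: dict[str, int]) -> PickupEstimate:
    """Mimic proto3 decoding: missing fields get zero values; nothing is rejected.

    (Unlike a real proto3 runtime, this sketch raises on enum numbers it does
    not know instead of preserving them.)
    """
    return PickupEstimate(
        vehicle_type=VehicleType(wire.get("vehicle_type", 0)),
        eta_seconds=wire.get("eta_seconds", 0),
    )


def validate(msg: PickupEstimate) -> list[str]:
    """Separate, explicit constraint check, in the spirit of PGV/protovalidate."""
    errors = []
    if msg.vehicle_type == VehicleType.VEHICLE_TYPE_UNKNOWN:
        errors.append("vehicle_type: must be set")
    if msg.eta_seconds < 0:
        errors.append("eta_seconds: must be >= 0")
    return errors


msg = parse({"eta_seconds": 240})  # vehicle_type omitted on the wire
print(msg.vehicle_type.name)       # VEHICLE_TYPE_UNKNOWN
print(validate(msg))               # ['vehicle_type: must be set']
```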

ML platform

ML data infra (feature store)

  • systems/lyft-feature-store — "platform of platforms": three ingestion lanes (batch / streaming / direct-CRUD) with strongly consistent reads and uniform metadata across lanes. The wiki's second major-tech-company feature store, after Dropbox Dash.
  • systems/lyft-dsfeatures — the unified online serving layer; wraps DynamoDB (persistent, GSI for GDPR deletion) + ValKey (write-through LRU cache) + OpenSearch (embeddings only). Exposes full CRUD via go-lyft-features + lyft-dsp-features SDKs.
  • systems/amundsen — Lyft's own open-source data-discovery platform; Feature Store DAGs automatically tag feature metadata here so engineers can find existing features before creating duplicates.
  • systems/apache-airflow — Astronomer-hosted; runs the auto-generated feature DAGs.
  • systems/apache-flink — streaming-feature lane, reading from Kafka (or sometimes Kinesis) and writing through the central spfeaturesingest Flink choke-point app.
  • systems/apache-spark + systems/apache-hive — SparkSQL as the batch-feature transformation language; Hive as the offline feature store.
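The dsfeatures read/write path can be sketched as a toy in-memory model, assuming a plain dict in place of DynamoDB, an `OrderedDict` LRU in place of ValKey, and a `user_id → keys` map in place of the GDPR-deletion GSI. Class and method names are hypothetical, and the sketch ignores OpenSearch, TTLs, and concurrency; it only shows why a write-through cache never serves a value older than the table, and how a secondary index turns per-user deletion into a lookup rather than a scan.

```python
from collections import OrderedDict

Key = tuple[str, str]  # (entity_id, feature_name)


class WriteThroughFeatureStore:
    """Toy model: dict 'table' ~ DynamoDB, OrderedDict LRU ~ ValKey,
    by_user ~ a secondary index used for GDPR deletion (all hypothetical)."""

    def __init__(self, cache_capacity: int) -> None:
        self.table: dict[Key, object] = {}
        self.by_user: dict[str, set[Key]] = {}
        self.cache: OrderedDict = OrderedDict()
        self.capacity = cache_capacity
        self.table_reads = 0  # cache misses that fell through to the table

    def _cache_put(self, key: Key, value: object) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def put(self, entity_id: str, feature: str, value: object, user_id: str) -> None:
        key = (entity_id, feature)
        # Write-through: persist first, then update the cache, so a cached
        # value always matches what the table holds.
        self.table[key] = value
        self.by_user.setdefault(user_id, set()).add(key)
        self._cache_put(key, value)

    def get(self, entity_id: str, feature: str) -> object:
        key = (entity_id, feature)
        if key in self.cache:
            self.cache.move_to_end(key)  # refresh recency on a hit
            return self.cache[key]
        self.table_reads += 1
        value = self.table.get(key)
        if value is not None:
            self._cache_put(key, value)  # populate the cache on a miss
        return value

    def delete_user(self, user_id: str) -> int:
        # The secondary index lists every row for the user, so no full scan.
        keys = self.by_user.pop(user_id, set())
        for key in keys:
            self.table.pop(key, None)
            self.cache.pop(key, None)
        return len(keys)


store = WriteThroughFeatureStore(cache_capacity=2)
store.put("rider:1", "trips_30d", 12, user_id="u1")
store.put("rider:2", "trips_30d", 3, user_id="u2")
store.put("rider:3", "trips_30d", 7, user_id="u3")  # evicts rider:1 from the cache
print(store.get("rider:1", "trips_30d"), store.table_reads)          # 12 1
print(store.delete_user("u2"), store.get("rider:2", "trips_30d"))    # 1 None
```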

Key patterns / concepts

Metric governance (2026-06-10 MSL post)

Mapping / pickup UX (2026-04-23 gated-community post)

Protobuf / protocol design

ML feature store (2026-01-06 Feature Store post)

LyftLearn 2.0 (2025-11-18 LyftLearn-evolution post)

Recent articles

  • 2026-06-10 — sources/2026-06-10-lyft-metric-semantic-layer (Rohit Channe & Simran Mirchandani, Lyft Engineering — Lyft's internal Metric Semantic Layer (MSL): a centralized, versioned Python package serving as the single source of truth for every "Golden Metric" definition. YAML configs with Jinja-templated SQL, exposed via Python API, Amundsen integration, self-service UI, and MCP server for AI agents. Governance via dual-owner model (Business Owner + Operational Owner, always teams) with mandatory dual approval for changes. Only metrics with ≥2 use cases qualify. Fifth Lyft source on the wiki and the first focused on metric governance architecture.)
  • 2026-04-23 — sources/2026-04-23-lyft-smarter-pickup-experience-for-gated-communities (Lyft Mapping team — an end-to-end rebuild of the pickup flow for gated communities, which make up 25–30% of Lyft rides in selected markets. Four-piece architecture: (1) gate-area shape generation from OSM + driver feedback; (2) dual inside/outside-gate pickup-spot selection UI with outside-gate spots sourced from historical pickup heatmaps; (3) routing inserts the gate as an invisible intermediate waypoint that doubles as a UX timing anchor; (4) intercom-style numpad for gate-code sharing + scannable banner on driver approach, with gate codes treated as ephemeral sensitive data (never stored between trips, audience of one, screenshot-blocked). ~95% positive rider survey response post-launch; lower rider + driver cancellation rates; less walking, shorter waits, fewer course changes. Named as the first instance of a repeatable playbook for physical-world constraints — generalises to road closures, unsafe curbs, etc. Fourth Lyft source on the wiki and the first with a mapping/routing focus.)
  • 2026-01-06 — sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution (Rohan Varshney, Lyft Engineering — Lyft's Feature Store as a "platform of platforms" with three ingest lanes (batch / streaming / direct-CRUD) converging on a unified online-serving layer dsfeatures that wraps DynamoDB + ValKey write-through LRU cache + OpenSearch embedding store behind two CRUD SDKs. Batch lane: SparkSQL + JSON config → auto-generated Astronomer-hosted Airflow DAGs with built-in data-quality checks and Amundsen metadata tagging. Streaming lane: customer Flink apps read Kafka/Kinesis → central spfeaturesingest Flink ingest app writes to dsfeatures. Uniform metadata + strongly consistent reads invariant across lanes. GSI-for-GDPR on DynamoDB. Second major-tech-co feature-store instance on the wiki after Dropbox Dash.)
  • 2025-11-18 — sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture (Lyft ML Platform team — LyftLearn 2.0: compute-serving split moving training / batch / HPO / JupyterLab off Kubernetes LyftLearn onto SageMaker-based LyftLearn Compute, while real-time model serving stays on EKS LyftLearn Serving. Zero-ML-code-change migration as the hard constraint; achieved via a cross-platform Docker base-image compatibility layer that replicates the Kubernetes environment on SageMaker.)
  • 2024-09-16 — sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices (Roman Kotenko, Lyft Media — distilled two principles + five practices for proto3 protobuf design: clarity + extensibility; reserve 0 as UNKNOWN; prefer oneof over enum-plus-field; name fields with their unit; use optional label / wrapper types for presence; declare validation inline with protoc-gen-validate; cross-entity constants via custom EnumValueOptions extensions. Validators must be invoked manually — they don't run on parse. First Lyft source on the wiki.)
Last updated · 542 distilled / 1,571 read