Lyft

Lyft Engineering blog is a Tier-2 source on the sysdesign-wiki. Lyft runs a large Envoy-fronted ridesharing platform (Envoy itself originated at Lyft), heavy polyglot backend (Python + Go + Java) with iOS + Android mobile clients, and the usual infra surface — service meshes, feature flags, rider/driver matching, trip lifecycle state. The blog's historical strength has been systems-at-scale content (Envoy, Flyte, service-mesh operations) plus mobile networking and protocol design for mobile-to-server + server-to-server communication. The 2020 post by Michael Rebello on Lyft's mobile networking journey documents the company-wide adoption of protobuf for mobile traffic; the 2024-09-16 Lyft Media post that's first ingested here is a direct descendant of that adoption.

The 2025-11-18 and 2026-01-06 posts opened a second Lyft theme on the wiki: ML platform + data infra. LyftLearn 2.0 is the ML platform (split compute from serving — SageMaker for training, Kubernetes/EKS for inference). The Lyft Feature Store is the data substrate underneath both — a "platform of platforms" with three ingestion lanes (batch / streaming / direct-CRUD) converging on dsfeatures, a wrapper over DynamoDB + ValKey + OpenSearch.

Key systems

Protocol + schema design (mobile + backend)

  • systems/protobuf — Lyft Media canonicalises design practices for the shared-schema case (mobile + backend). Used extensively across both mobile-to-server and server-to-server traffic per Rebello's 2020 post.
  • systems/protoc-gen-validate (PGV) / protovalidate — declarative validation layer over protobuf schemas. The plugin is not authored by Lyft, but Lyft Media adopts it as the standard validator.
  • systems/envoy — Lyft-originated L7 proxy; pre-existing wiki reference.

ML platform

ML data infra (feature store)

  • systems/lyft-feature-store — "platform of platforms": three ingestion lanes (batch / streaming / direct-CRUD) with strongly-consistent reads + uniform metadata across lanes. Canonical second major-tech-co feature-store instance on the wiki after Dropbox Dash.
  • systems/lyft-dsfeatures — the unified online serving layer; wraps DynamoDB (persistent, GSI for GDPR deletion) + ValKey (write-through LRU cache) + OpenSearch (embeddings only). Exposes full CRUD via go-lyft-features + lyft-dsp-features SDKs.
  • systems/amundsen — Lyft's own open-source data-discovery platform; Feature Store DAGs automatically tag feature metadata here so engineers can find existing features before creating duplicates.
  • systems/apache-airflow — Astronomer-hosted; runs the auto-generated feature DAGs.
  • systems/apache-flink — streaming-feature lane, reading from Kafka (or sometimes Kinesis) and writing through the central spfeaturesingest Flink choke-point app.
  • systems/apache-spark + systems/apache-hive — SparkSQL as the batch-feature transformation language; Hive as the offline feature store.
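The batch lane above pairs a SparkSQL transform with a JSON config that the platform expands into an auto-generated Airflow DAG. A minimal Python sketch of that expansion step, using plain strings rather than the Airflow API, and with config keys and task names invented for illustration (not Lyft's actual schema):

```python
import json

# Hypothetical feature-definition config: a SparkSQL transform plus
# the metadata the platform needs to generate the batch-lane DAG.
FEATURE_CONFIG = json.loads("""
{
  "feature_group": "rider_engagement",
  "sql": "SELECT rider_id, COUNT(*) AS trips_7d FROM trips GROUP BY rider_id",
  "schedule": "@daily",
  "quality_checks": ["non_null:rider_id", "row_count_min:1000"]
}
""")


def generate_dag_tasks(config: dict) -> list[str]:
    """Expand a feature config into an ordered task list, mirroring the
    batch lane: SparkSQL transform -> data-quality checks ->
    dsfeatures ingest -> Amundsen metadata tag."""
    tasks = [f"spark_sql:{config['feature_group']}"]
    tasks += [f"quality_check:{check}" for check in config["quality_checks"]]
    tasks.append(f"ingest_dsfeatures:{config['feature_group']}")
    tasks.append(f"tag_amundsen:{config['feature_group']}")  # discoverability
    return tasks


print(generate_dag_tasks(FEATURE_CONFIG))
```

In the real system each task would become an Airflow operator in an Astronomer-hosted DAG; the sketch only shows the config-to-task-list shape of the code generation.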

Key patterns / concepts

Protobuf / protocol design

ML feature store (2026-01-06 Feature Store post)

LyftLearn 2.0 (2025-11-18 LyftLearn-evolution post)

Recent articles

  • 2026-01-06 — sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution (Rohan Varshney, Lyft Engineering — Lyft's Feature Store as a "platform of platforms" with three ingest lanes (batch / streaming / direct-CRUD) converging on a unified online-serving layer dsfeatures that wraps DynamoDB + ValKey write-through LRU cache + OpenSearch embedding store behind two CRUD SDKs. Batch lane: SparkSQL + JSON config → auto-generated Astronomer-hosted Airflow DAGs with built-in data-quality checks and Amundsen metadata tagging. Streaming lane: customer Flink apps read Kafka/Kinesis → central spfeaturesingest Flink ingest app writes to dsfeatures. Uniform metadata + strongly consistent reads invariant across lanes. GSI-for-GDPR on DynamoDB. Second major-tech-co feature-store instance on the wiki after Dropbox Dash.)
  • 2025-11-18 — sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture (Lyft ML Platform team — LyftLearn 2.0: compute-serving split moving training / batch / HPO / JupyterLab off Kubernetes LyftLearn onto SageMaker-based LyftLearn Compute, while real-time model serving stays on EKS LyftLearn Serving. Zero-ML-code-change migration as the hard constraint; achieved via cross-platform Docker base image compat layer replicating the Kubernetes environment on SageMaker.)
  • 2024-09-16 — sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices (Roman Kotenko, Lyft Media — distilled two principles + five practices for proto3 protobuf design: clarity + extensibility; reserve 0 as UNKNOWN; prefer oneof over enum-plus-field; name fields with their unit; use optional label / wrapper types for presence; declare validation inline with protoc-gen-validate; cross-entity constants via custom EnumValueOptions extensions. Validators must be invoked manually — they don't run on parse. First Lyft source on the wiki.)
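The LyftLearn 2.0 entry above hinges on a shared Docker base image that lets the same ML code run on both Kubernetes and SageMaker. A hypothetical multi-stage Dockerfile sketch of that compat-layer idea — image names, paths, and the shim script are all invented, not from the post:

```dockerfile
# Hypothetical sketch: one base image, two thin target layers, so the same
# user ML code runs unchanged on Kubernetes and on SageMaker.
FROM python:3.10-slim AS lyftlearn-base
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
COPY ml_code/ /opt/ml_code/            # user ML code, byte-identical in both targets
WORKDIR /opt/ml_code

# Kubernetes target: invoked directly by the LyftLearn job controller.
FROM lyftlearn-base AS k8s
ENTRYPOINT ["python", "train.py"]

# SageMaker target: a shim entrypoint replicates the Kubernetes
# environment (env vars, mount paths) before calling the same train.py.
FROM lyftlearn-base AS sagemaker
COPY compat/sagemaker_shim.py /opt/sagemaker_shim.py
ENTRYPOINT ["python", "/opt/sagemaker_shim.py"]
```

The design point is that only the outermost entrypoint differs per platform; everything the ML code sees is identical, which is what makes the zero-code-change constraint tractable.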
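Several of the practices from the 2024-09-16 post can be shown in one schema. A minimal proto3 sketch — message, enum, and field names are invented for illustration, not taken from the post — with PGV rules declared inline; per the post, these validators must still be invoked explicitly after parsing:

```protobuf
syntax = "proto3";

import "validate/validate.proto";  // protoc-gen-validate (PGV) rules

// Practice: reserve 0 as UNKNOWN so an unset enum never reads as a real value.
enum RideStatus {
  RIDE_STATUS_UNKNOWN = 0;
  RIDE_STATUS_ACTIVE = 1;
  RIDE_STATUS_COMPLETED = 2;
}

message FareEstimate {
  // Practice: name fields with their unit.
  uint64 fare_amount_cents = 1 [(validate.rules).uint64.gt = 0];

  // Practice: the optional label gives explicit field presence in proto3.
  optional string promo_code = 2;

  // Practice: prefer oneof over an enum-plus-field pair.
  oneof payment_method {
    CardPayment card = 3;
    WalletPayment wallet = 4;
  }
}

message CardPayment { string token = 1; }
message WalletPayment { string wallet_id = 1; }
```

The cross-entity-constants practice (custom EnumValueOptions extensions) is omitted here since it needs a second file of option definitions.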
Last updated · 319 distilled / 1,201 read