Lyft¶
The Lyft Engineering blog is a Tier-2 source on the sysdesign-wiki. Lyft runs a large Envoy-fronted ridesharing platform (Envoy itself originated at Lyft), a heavy polyglot backend (Python + Go + Java) with iOS + Android mobile clients, and the usual infra surface: service meshes, feature flags, rider/driver matching, trip lifecycle state. The blog's historical strength has been systems-at-scale content (Envoy, Flyte, service-mesh operations) plus mobile networking and protocol design for mobile-to-server + server-to-server communication. The 2020 post by Michael Rebello on Lyft's mobile networking journey documents the company-wide adoption of protobuf for mobile traffic; the 2024-09-16 Lyft Media post, the first Lyft source ingested here, is a direct descendant of that adoption.
The 2025-11-18 and 2026-01-06 posts opened a second Lyft theme on
the wiki: ML platform + data infra. LyftLearn 2.0 is the ML
platform (split compute from serving — SageMaker for training,
Kubernetes/EKS for inference). The Lyft Feature Store is the data
substrate underneath both — a "platform of platforms" with three
ingestion lanes (batch / streaming / direct-CRUD) converging on
dsfeatures, a wrapper over DynamoDB +
ValKey + OpenSearch.
Key systems¶
Protocol + schema design (mobile + backend)¶
- systems/protobuf — Lyft Media canonicalises design practices for the shared-schema case (mobile + backend). Used extensively across both mobile-to-server and server-to-server traffic per Rebello's 2020 post.
- systems/protoc-gen-validate (PGV) / protovalidate — Lyft Media's declarative validation layer over protobuf schemas. PGV began as a Lyft project and is now maintained by Buf, whose protovalidate is its successor; Lyft Media uses it as the standard validator.
- systems/envoy — Lyft-originated L7 proxy; pre-existing wiki reference.
ML platform¶
- systems/lyftlearn / systems/lyftlearn-serving / systems/lyftlearn-compute — Lyft's ML platform, split after LyftLearn 2.0 into Kubernetes/EKS serving (lyftlearn-serving) and SageMaker-based training/batch/notebooks (lyftlearn-compute).
ML data infra (feature store)¶
- systems/lyft-feature-store — "platform of platforms": three ingestion lanes (batch / streaming / direct-CRUD) with strongly-consistent reads + uniform metadata across lanes. Canonical second-major-co feature-store on the wiki after Dropbox Dash.
- systems/lyft-dsfeatures — the unified online serving layer; wraps DynamoDB (persistent, GSI for GDPR deletion) + ValKey (write-through LRU cache) + OpenSearch (embeddings only). Exposes full CRUD via go-lyft-features + lyft-dsp-features SDKs.
- systems/amundsen — Lyft's own open-source data-discovery platform; Feature Store DAGs automatically tag feature metadata here so engineers can find existing features before creating duplicates.
- systems/apache-airflow — Astronomer-hosted; runs the auto-generated feature DAGs.
- systems/apache-flink — streaming-feature lane, reading from Kafka (or sometimes Kinesis) and writing through the central spfeaturesingest Flink choke-point app.
- systems/apache-spark + systems/apache-hive — SparkSQL as the batch-feature transformation language; Hive as the offline feature store.
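The dsfeatures entry above describes a write-through LRU cache in front of a persistent store. A minimal Python sketch of that shape follows. It is not Lyft's implementation: the class name `WriteThroughFeatureStore`, the key format, and the in-memory dicts standing in for DynamoDB and ValKey are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Dict, Optional


class WriteThroughFeatureStore:
    """Hypothetical dsfeatures-style wrapper; in-memory stand-ins only."""

    def __init__(self, cache_capacity: int = 2) -> None:
        self.persistent: Dict[str, float] = {}                 # stands in for DynamoDB
        self.cache: "OrderedDict[str, float]" = OrderedDict()  # stands in for ValKey
        self.cache_capacity = cache_capacity

    def _cache_put(self, key: str, value: float) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)            # mark as most recently used
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)     # evict the least recently used entry

    def put(self, key: str, value: float) -> None:
        # Write-through: the durable store and the cache are updated in one call.
        self.persistent[key] = value
        self._cache_put(key, value)

    def get(self, key: str) -> Optional[float]:
        if key in self.cache:
            self.cache.move_to_end(key)        # a hit refreshes recency
            return self.cache[key]
        value = self.persistent.get(key)       # miss: fall back to the durable store
        if value is not None:
            self._cache_put(key, value)
        return value

    def delete(self, key: str) -> None:
        # Remove from both layers so a deleted feature cannot be served from cache.
        self.persistent.pop(key, None)
        self.cache.pop(key, None)


store = WriteThroughFeatureStore(cache_capacity=2)
store.put("rider:1:trips_7d", 4.0)
store.put("rider:2:trips_7d", 9.0)
store.put("rider:3:trips_7d", 1.0)   # evicts rider:1 from the cache only
```

Because every write reaches the persistent layer before the call returns, an eviction never loses data; a later read of the evicted key simply repopulates the cache.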
Key patterns / concepts¶
Protobuf / protocol design¶
- concepts/clarity-over-efficiency-in-protocol-design — first of Lyft Media's two named principles for protobuf design
- concepts/extensibility-protocol-design — second principle; prefer structures (oneof, well-known types, string IDs) that admit future additions
- concepts/unknown-zero-enum-value — reserve 0 as UNKNOWN on every enum
- concepts/unit-suffix-field-naming — payload_size_bytes, timestamp_ms_utc, not raw primitives
- concepts/proto3-explicit-optional — use optional label (proto3 ≥ 3.15) or google.protobuf.*Value wrappers for presence semantics on primitives
- patterns/oneof-over-enum-plus-field — model variant messages with oneof, not with discriminator-enum + sibling optional fields
- patterns/protobuf-validation-rules — declarative validation inline in the .proto; generated validators must be invoked explicitly
- patterns/protobuf-cross-entity-constants — custom EnumValueOptions extensions to share literal constants between mobile and backend
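Several of the practices listed above can be mirrored in plain Python to show what they buy in application code. This is a sketch only: it does not use protobuf-generated classes, and every name in it (`MediaType`, `Asset`, `ImagePayload`, `VideoPayload`, `media_type_from_wire`, `validate`) is hypothetical. How a real protobuf runtime surfaces unrecognised enum numbers varies by language; mapping them to UNKNOWN here is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Union


class MediaType(IntEnum):
    UNKNOWN = 0   # zero is reserved, so an unset field never reads as a real type
    IMAGE = 1
    VIDEO = 2


def media_type_from_wire(number: int) -> MediaType:
    """Map numbers this client does not recognise (e.g. added later) to UNKNOWN."""
    try:
        return MediaType(number)
    except ValueError:
        return MediaType.UNKNOWN


@dataclass(frozen=True)
class ImagePayload:
    width_px: int        # unit carried in the field name
    height_px: int


@dataclass(frozen=True)
class VideoPayload:
    duration_ms: int


@dataclass(frozen=True)
class Asset:
    asset_id: str                                  # string ID leaves room to change format
    payload_size_bytes: int
    payload: Union[ImagePayload, VideoPayload]     # oneof analogue: exactly one variant
    expires_at_ms_utc: Optional[int] = None        # None means absent, distinct from 0


def validate(asset: Asset) -> None:
    """Must be called explicitly; constructing an Asset does not run it."""
    if not asset.asset_id:
        raise ValueError("asset_id must be non-empty")
    if asset.payload_size_bytes <= 0:
        raise ValueError("payload_size_bytes must be positive")


asset = Asset(asset_id="a-1", payload_size_bytes=2048,
              payload=VideoPayload(duration_ms=15_000))
validate(asset)   # passes; an invalid Asset would still construct without error
```

The union-typed `payload` field makes the variant and its data one value, so there is no discriminator enum that can disagree with which sibling field is populated.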
ML feature store (2026-01-06 Feature Store post)¶
- concepts/feature-store — Lyft is the second canonical major-tech-co instance.
- concepts/feature-freshness — batch / streaming / on-demand lanes map to different freshness tiers; "ultra-low-latency" cache + "near-real-time" streaming path.
- concepts/write-through-cache — canonical example: ValKey over DynamoDB inside dsfeatures.
- concepts/feature-discoverability — Amundsen as the Feature-Store discovery layer; DAGs tag metadata automatically.
- concepts/training-serving-boundary — feature-store shape as a boundary-crossing discipline (unifies feature values across the training + serving fleets).
- patterns/hybrid-batch-streaming-ingestion — second canonical instance of the pattern (after Dropbox Dash).
- patterns/config-driven-dag-generation — canonical instance: SparkSQL + JSON config → auto-generated Airflow DAG with production-ready data-quality + Amundsen tagging baked in.
- patterns/batch-plus-streaming-plus-ondemand-feature-serving — the "platform of platforms" three-lane serving shape.
- patterns/wrapper-over-heterogeneous-stores-as-serving-layer — canonical instance: dsfeatures wraps DynamoDB + ValKey + OpenSearch behind one CRUD API (exposed through the go-lyft-features and lyft-dsp-features SDKs).
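The config-driven DAG generation pattern above (SparkSQL + JSON config in, Airflow DAG out) can be sketched without Airflow itself by generating a plain task graph. The config keys (`feature_group`, `owner`, `schedule`, `checks`), the task kinds, and the function `build_dag_spec` are assumptions for illustration, not Lyft's actual schema or generator.

```python
import json
from typing import Any, Dict, List


def build_dag_spec(config_json: str, sql: str) -> Dict[str, Any]:
    """Turn a feature config plus a SQL transform into a DAG-like spec.

    Data-quality and catalog-tagging tasks are always appended, so feature
    authors get them without writing any orchestration code.
    """
    cfg = json.loads(config_json)
    name = cfg["feature_group"]

    transform = f"{name}.transform"
    quality = f"{name}.quality_check"

    tasks: List[Dict[str, Any]] = [
        {"id": transform, "kind": "spark_sql", "sql": sql, "upstream": []},
        {"id": quality, "kind": "data_quality",
         "checks": cfg.get("checks", ["non_empty"]), "upstream": [transform]},
        # Publishing and tagging only run once the quality gate has passed.
        {"id": f"{name}.publish", "kind": "publish_features", "upstream": [quality]},
        {"id": f"{name}.tag_metadata", "kind": "catalog_tag",
         "owner": cfg.get("owner", "unowned"), "upstream": [quality]},
    ]
    return {
        "dag_id": f"features_{name}",
        "schedule": cfg.get("schedule", "@daily"),
        "tasks": tasks,
    }


config = '{"feature_group": "rider_stats", "owner": "team-x", "checks": ["non_empty", "no_nulls"]}'
spec = build_dag_spec(config, "SELECT rider_id, COUNT(*) AS trips_7d FROM trips GROUP BY rider_id")
```

The design point is that the author supplies only the SQL and a small config; the quality gate and the metadata tagging are structural parts of every generated graph rather than optional steps.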
LyftLearn 2.0 (2025-11-18 LyftLearn-evolution post)¶
- concepts/hybrid-ml-platform-architecture — compute-serving split (SageMaker for training, EKS/Kubernetes for serving).
- concepts/zero-code-change-migration + patterns/zero-code-change-platform-migration
- concepts/environmental-parity
- concepts/container-entrypoint-compat-layer + patterns/cross-platform-base-image
- patterns/runtime-fetched-credentials-and-config
- patterns/warm-pool-zero-create-path
- patterns/decoupled-compute-and-serving-stacks
- patterns/model-registry-and-object-store-as-hybrid-glue
- concepts/cross-cluster-networking
- concepts/lazy-container-image-loading (systems/amazon-soci)
Recent articles¶
- 2026-01-06 — sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution (Rohan Varshney, Lyft Engineering — Lyft's Feature Store as a "platform of platforms" with three ingest lanes (batch / streaming / direct-CRUD) converging on a unified online-serving layer dsfeatures that wraps DynamoDB + ValKey write-through LRU cache + OpenSearch embedding store behind two CRUD SDKs. Batch lane: SparkSQL + JSON config → auto-generated Astronomer-hosted Airflow DAGs with built-in data-quality checks and Amundsen metadata tagging. Streaming lane: customer Flink apps read Kafka/Kinesis → central spfeaturesingest Flink ingest app writes to dsfeatures. Uniform metadata + strongly consistent reads invariant across lanes. GSI-for-GDPR on DynamoDB. Second major-tech-co feature-store instance on the wiki after Dropbox Dash.)
- 2025-11-18 — sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture (Lyft ML Platform team — LyftLearn 2.0: compute-serving split moving training / batch / HPO / JupyterLab off Kubernetes LyftLearn onto SageMaker-based LyftLearn Compute, while real-time model serving stays on EKS LyftLearn Serving. Zero-ML-code-change migration as the hard constraint; achieved via cross-platform Docker base image compat layer replicating the Kubernetes environment on SageMaker.)
- 2024-09-16 — sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices (Roman Kotenko, Lyft Media — distilled two principles + five practices for proto3 protobuf design: clarity + extensibility; reserve 0 as UNKNOWN; prefer oneof over enum-plus-field; name fields with their unit; use optional label / wrapper types for presence; declare validation inline with protoc-gen-validate; cross-entity constants via custom EnumValueOptions extensions. Validators must be invoked manually — they don't run on parse. First Lyft source on the wiki.)