Lyft Feature Store¶
Lyft's Feature Store is the shared ML substrate every model and
consuming service at Lyft retrieves features from. In Rohan
Varshney's (2026-01-06) characterization, it is a "platform of
platforms" — not a single pipeline but three complementary
ingest/serve lanes (batch, streaming, direct-CRUD) composed on top
of one unified online-serving layer, dsfeatures.
What it is concretely¶
- Batch lane: a Spark SQL query plus a JSON config. A Python cron service auto-generates one Astronomer-hosted Airflow DAG per config; the DAG executes the query, writes to both the offline (Hive) and online (dsfeatures) paths, runs data-quality checks, and tags Amundsen for discoverability. Typical cadence: daily.
- Streaming lane: customer Flink applications read events from Kafka (sometimes Kinesis), transform them, and emit feature payloads to the central spfeaturesingest Flink app, which owns (de)serialization and interaction with the dsfeatures WRITE API.
- Direct-CRUD lane: the go-lyft-features (Go) and lyft-dsp-features (Python) SDKs expose full CRUD on dsfeatures, so internal DAGs and customer services can read and write features ad hoc without going through the ingestion pipelines.
- Online serving layer: dsfeatures, an optimized wrapper over DynamoDB (backing store), ValKey (write-through LRU cache), and OpenSearch (embeddings).
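The cache-over-backing-store arrangement in the online serving layer can be illustrated with a minimal write-through LRU sketch. This is not the dsfeatures API: the class name `WriteThroughFeatureStore`, the `(entity_id, feature_name)` key shape, and the in-memory dicts standing in for DynamoDB and ValKey are all assumptions made for illustration, and the OpenSearch embeddings path is left out.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str]  # hypothetical key shape: (entity_id, feature_name)


class WriteThroughFeatureStore:
    """Toy write-through LRU cache over a backing store (illustrative only)."""

    def __init__(self, cache_capacity: int = 2) -> None:
        self.backing: Dict[Key, Any] = {}  # stand-in for the DynamoDB table
        self.cache: "OrderedDict[Key, Any]" = OrderedDict()  # stand-in for ValKey
        self.cache_capacity = cache_capacity
        self.backing_reads = 0  # counts reads that missed the cache

    def _cache_put(self, key: Key, value: Any) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)  # mark as most recently used
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def write(self, entity_id: str, feature: str, value: Any) -> None:
        key = (entity_id, feature)
        self.backing[key] = value     # durable write first
        self._cache_put(key, value)   # cache updated in the same call: write-through

    def read(self, entity_id: str, feature: str) -> Optional[Any]:
        key = (entity_id, feature)
        if key in self.cache:
            self.cache.move_to_end(key)  # a hit refreshes recency
            return self.cache[key]
        self.backing_reads += 1          # miss: fall back to the backing store
        value = self.backing.get(key)
        if value is not None:
            self._cache_put(key, value)
        return value


store = WriteThroughFeatureStore(cache_capacity=2)
store.write("ride_1", "eta_minutes", 7)
store.write("ride_2", "eta_minutes", 9)
store.write("ride_3", "eta_minutes", 4)  # cache is full, so ride_1 is evicted
```

Because every write updates the cache in the same call as the backing store, a read that hits the cache never returns a value older than the last write made through this wrapper. In this toy model, that is what keeps cached reads in step with the backing store.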
Design invariants¶
- Uniform metadata + strongly-consistent reads across all ingestion paths. "Regardless of the ingestion method (batch, streaming, or on-demand), the Feature Store maintains uniform metadata and strongly consistent reads." The streaming-lane choke-point (spfeaturesingest) is the enforcement mechanism: you can't have uniform metadata if every producer writes its own way.
- Feature definition = Spark SQL query + JSON config. No DSL. The platform absorbs the boilerplate (DAG generation, data-quality checks, offline/online double-write, metadata tagging); customers own only feature-specific SQL + metadata.
- Metadata-driven governance. Each JSON config carries ownership, urgency tier, carryover / rollup logic, explicit naming and data-typing, versioning semantics, and lineage. The versioning rule is stated explicitly: "If the SQL or expected feature behavior undergoes business logic changes, a version bump is expected."
- Amundsen-first discoverability. Generated DAGs automatically tag feature metadata in Amundsen, so finding an existing feature is a search, not a tribal-knowledge exercise.
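The config-as-definition and version-bump invariants above can be sketched in a few lines. Everything named here is hypothetical: the source does not publish the JSON schema, so the field names in `REQUIRED_FIELDS`, the helper functions, and the task names are assumptions. Comparing whitespace-normalized SQL is only a crude proxy for "business logic changes", which in practice needs human judgment.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema; the real config fields are not given in the source.
REQUIRED_FIELDS = ("name", "version", "owner", "urgency_tier", "dtype", "sql")


def load_config(raw: str) -> Dict[str, Any]:
    """Parse a feature config and reject it if governance metadata is missing."""
    config = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError(f"feature config missing fields: {missing}")
    return config


def _normalize_sql(sql: str) -> str:
    # Collapse whitespace so pure reformatting does not count as a change.
    return " ".join(sql.split())


def missing_version_bump(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """True when the SQL text changed but the version was not incremented."""
    sql_changed = _normalize_sql(old["sql"]) != _normalize_sql(new["sql"])
    return sql_changed and new["version"] <= old["version"]


def plan_tasks(config: Dict[str, Any]) -> List[str]:
    """Ordered task names a generated DAG might run (names are illustrative)."""
    prefix = f"{config['name']}_v{config['version']}"
    steps = [
        "run_spark_sql",
        "write_offline_hive",
        "write_online_dsfeatures",
        "data_quality_checks",
        "tag_amundsen",
    ]
    return [f"{prefix}.{step}" for step in steps]


example = load_config(json.dumps({
    "name": "rides_7d",
    "version": 1,
    "owner": "team-example",
    "urgency_tier": "tier-2",
    "dtype": "int",
    "sql": "SELECT rider_id, COUNT(*) AS rides_7d FROM rides GROUP BY rider_id",
}))
tasks = plan_tasks(example)
```

A check like `missing_version_bump` would fit naturally in whatever review or CI step sits in front of DAG generation, but where Lyft enforces the rule is not stated in the source.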
Related patterns on this wiki¶
- Feature store: the concept; Lyft is the second major instance on the wiki alongside Dropbox Dash.
- Hybrid batch + streaming ingestion: canonical instance; the three-lanes shape maps one-to-one to Dropbox Dash's batch + streaming + direct-write lanes.
- Config-driven DAG generation: canonical instance.
- patterns/wrapper-over-heterogeneous-stores-as-serving-layer: canonical instance via dsfeatures.
- patterns/batch-plus-streaming-plus-ondemand-feature-serving: the three-lanes-with-unified-online-surface shape.
Relationship to LyftLearn¶
The Feature Store sits adjacent to
LyftLearn / systems/lyftlearn-serving /
systems/lyftlearn-compute — Lyft's ML training + serving
platform. The Feature Store is the data substrate; LyftLearn
is the compute substrate. Models train on feature-store
output (Hive offline tables) and serve against dsfeatures
online reads.
Seen in¶
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution: canonical wiki introduction; Rohan Varshney's Lyft Engineering post describing the three-lane architecture, dsfeatures online layer, governance model, and Amundsen integration.
Related¶
- systems/lyft-dsfeatures
- systems/lyftlearn
- systems/apache-airflow
- systems/apache-flink
- systems/apache-spark
- systems/apache-hive
- systems/kafka
- systems/amazon-kinesis-data-streams
- systems/amundsen
- companies/lyft
- concepts/feature-store
- concepts/feature-freshness
- concepts/write-through-cache
- concepts/feature-discoverability
- patterns/hybrid-batch-streaming-ingestion
- patterns/config-driven-dag-generation
- patterns/batch-plus-streaming-plus-ondemand-feature-serving
- patterns/wrapper-over-heterogeneous-stores-as-serving-layer