
SYSTEM Cited by 1 source

Lyft Feature Store

Lyft's Feature Store is the shared ML substrate from which every model and consuming service at Lyft retrieves features. As Rohan Varshney characterizes it (2026-01-06), it is a "platform of platforms": not a single pipeline but three complementary ingest/serve lanes (batch, streaming, direct-CRUD) composed on top of one unified online-serving layer, dsfeatures.

What it is concretely

  • Batch lane: Spark SQL query + JSON config; a Python cron service auto-generates an Astronomer-hosted Airflow DAG per config that executes the query, writes to both offline (Hive) and online (dsfeatures) paths, runs data-quality checks, and tags Amundsen for discoverability. Typical cadence: daily.
  • Streaming lane: customer Flink applications read events from Kafka (sometimes Kinesis), transform, and emit feature payloads to the central spfeaturesingest Flink app, which owns (de)serialization and dsfeatures WRITE API interaction.
  • Direct-CRUD lane: go-lyft-features (Go) and lyft-dsp-features (Python) SDKs expose full CRUD on dsfeatures so internal DAGs and customer services can read and write features ad-hoc without going through the ingestion pipelines.
  • Online serving layer: dsfeatures, an optimized wrapper over DynamoDB (backing), ValKey (write-through LRU cache), and OpenSearch (embeddings).
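
The write-through LRU cache named in the online serving layer can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class WriteThroughLRUStore, its method names, and the use of plain Python containers as stand-ins for the durable store and the cache. It is not the dsfeatures API and does not talk to DynamoDB or ValKey.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional


class WriteThroughLRUStore:
    """Hypothetical stand-in for a cache-fronted feature store (not dsfeatures)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Plain dict standing in for the durable backing store.
        self.backing: Dict[Hashable, Any] = {}
        # OrderedDict standing in for the cache; oldest entry sits first.
        self.cache: "OrderedDict[Hashable, Any]" = OrderedDict()
        # Counts cache misses that had to consult the backing store.
        self.backing_reads = 0

    def _remember(self, key: Hashable, value: Any) -> None:
        # Insert or refresh the entry, then evict least recently used keys.
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def write(self, key: Hashable, value: Any) -> None:
        # Write-through: update the backing store and the cache in one call,
        # so a later cache hit never returns a value older than the last write.
        self.backing[key] = value
        self._remember(key, value)

    def read(self, key: Hashable) -> Optional[Any]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        # Cache miss: fall back to the backing store and repopulate the cache.
        self.backing_reads += 1
        value = self.backing.get(key)
        if value is not None:
            self._remember(key, value)
        return value
```

In this sketch an evicted key is still served correctly because every write reached the backing store first; the cache only decides how often that store has to be consulted.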

Design invariants

  • Uniform metadata + strongly-consistent reads across all ingestion paths. "Regardless of the ingestion method (batch, streaming, or on-demand), the Feature Store maintains uniform metadata and strongly consistent reads." The streaming-lane choke-point (spfeaturesingest) is the enforcement mechanism: you can't have uniform metadata if every producer writes its own way.
  • Feature definition = Spark SQL query + JSON config. No DSL. The platform absorbs the boilerplate (DAG generation, data-quality checks, offline/online double-write, metadata tagging); customers own only feature-specific SQL + metadata.
  • Metadata-driven governance. Each JSON config carries ownership, urgency tier, carryover / rollup logic, explicit naming and data-typing, versioning semantics, and lineage. The versioning rule is stated explicitly: "If the SQL or expected feature behavior undergoes business logic changes, a version bump is expected."
  • Amundsen-first discoverability. Generated DAGs automatically tag feature metadata in Amundsen, so finding an existing feature is a search, not a tribal-knowledge exercise.
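
The config-plus-SQL definition and the version-bump rule above could be checked mechanically along the following lines. This is a sketch under stated assumptions: the field names (name, version, owner, urgency_tier, dtype, sql), the sample config, and the helper functions are all hypothetical, since the source does not publish the actual JSON schema. Comparing whitespace-normalized SQL text is only a rough proxy for "business logic changes".

```python
import json
from dataclasses import dataclass

# Hypothetical field names; the real Lyft config schema is not given in the source.
REQUIRED_FIELDS = ("name", "version", "owner", "urgency_tier", "dtype", "sql")

# Illustrative config only; table and column names are made up.
SAMPLE_CONFIG = json.dumps({
    "name": "rider_trip_count_7d",
    "version": 1,
    "owner": "example-team",
    "urgency_tier": "tier-2",
    "dtype": "int",
    "sql": "SELECT rider_id, COUNT(*) AS n FROM trips GROUP BY rider_id",
})


@dataclass(frozen=True)
class FeatureConfig:
    name: str
    version: int
    owner: str
    urgency_tier: str
    dtype: str
    sql: str


def load_config(raw: str) -> FeatureConfig:
    """Parse a JSON config and reject it if governance metadata is missing."""
    data = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"config is missing fields: {missing}")
    return FeatureConfig(**{field: data[field] for field in REQUIRED_FIELDS})


def _normalize(sql: str) -> str:
    # Collapse whitespace so pure reformatting does not count as a change.
    return " ".join(sql.split())


def missing_version_bump(old: FeatureConfig, new: FeatureConfig) -> bool:
    """Return True when the SQL text changed but the version did not increase."""
    sql_changed = _normalize(old.sql) != _normalize(new.sql)
    return sql_changed and new.version <= old.version
```

A check like missing_version_bump could run when a config change is reviewed, though a changed expected behavior with unchanged SQL would still need a human to notice it.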

Relationship to LyftLearn

The Feature Store sits adjacent to LyftLearn (with its LyftLearn Serving and LyftLearn Compute systems), Lyft's ML training + serving platform. The Feature Store is the data substrate; LyftLearn is the compute substrate. Models train on feature-store output (Hive offline tables) and serve against dsfeatures online reads.

