PATTERN Cited by 2 sources
Batch + streaming + on-demand feature serving¶
Definition¶
Batch + streaming + on-demand feature serving is the feature-platform shape in which three complementary ingestion lanes feed a single unified online-serving surface:
- Batch — periodic jobs (Spark SQL against a data lake / warehouse, via Airflow or equivalent) computing features on hour-to-day cadence and writing to both offline + online stores.
- Streaming — stateful stream processors (Flink, Kafka Streams, Spark Streaming) reading from Kafka / Kinesis and writing features to the online store in near-real-time (seconds).
- On-demand / direct-CRUD — an SDK-level entry point that lets callers directly create, read, update, and delete features on the online store without going through any pipeline. Escape hatch for features computed by adjacent systems or needing ad-hoc management.
All three lanes terminate at the same online-serving layer — one API, one consistency story, one metadata schema — and the serving layer is responsible for enforcing uniform metadata and consistent reads across lanes.
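The shape can be sketched in a few lines. This is a hypothetical minimal model, not any real system's API: `OnlineStore`, `FeatureRecord`, and the lane names are illustrative; the point is that all three lanes call the same write path, which is where the uniform metadata schema gets enforced.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FeatureRecord:
    """The uniform metadata schema enforced at the single write path."""
    entity_id: str
    name: str
    value: object
    source_lane: str  # "batch" | "streaming" | "on_demand"
    written_at: float = field(default_factory=time.time)


class OnlineStore:
    """One serving surface; every lane terminates here."""
    LANES = {"batch", "streaming", "on_demand"}

    def __init__(self):
        self._data = {}

    def write(self, record: FeatureRecord):
        # Metadata uniformity is enforced here, not in each pipeline.
        if record.source_lane not in self.LANES:
            raise ValueError(f"unknown lane: {record.source_lane}")
        self._data[(record.entity_id, record.name)] = record

    def read(self, entity_id: str, name: str) -> FeatureRecord:
        # One consistency story: readers never care which lane wrote.
        return self._data[(entity_id, name)]


store = OnlineStore()
# Batch lane: output of a periodic historical-aggregation job.
store.write(FeatureRecord("user:42", "rides_90d", 17, "batch"))
# Streaming lane: near-real-time interaction signal.
store.write(FeatureRecord("user:42", "last_open_doc", "doc-9", "streaming"))
# On-demand lane: an adjacent system writing directly via the SDK.
store.write(FeatureRecord("user:42", "llm_eval_score", 0.83, "on_demand"))
```

Note that the lane tag lives in the record's metadata, not in the API: a consumer reading `last_open_doc` uses the same `read` call regardless of which lane produced it.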
Why three lanes and a unified surface¶
- Batch is the only lane that economically runs heavy historical joins / aggregations. Trying to stream everything loses on cost at scale.
- Streaming is the only lane that hits the freshness SLO for interaction signals ("user opened document X 2s ago").
- On-demand CRUD is the only lane that lets an adjacent system (an LLM evaluation pipeline, a score producer, a trigger-driven write) participate without pretending to be a pipeline.
- A unified surface is what makes the pattern survive contact with real users. If each lane had its own API, consumers would have to pick one — and would pick wrong, because the right lane for a feature changes as the feature evolves.
This is a strict superset of the underlying hybrid batch + streaming ingestion pattern: that pattern is about ingestion paths; this pattern is about the serving surface. They compose — hybrid-batch-streaming describes how features arrive; this pattern describes how they leave.
Canonical instances¶
- Lyft Feature Store (2026-01-06 Varshney): Feature Store is called a "platform of platforms" with three lanes (batch, streaming, on-demand) converging on `dsfeatures`. Batch lane is Airflow + SparkSQL + Hive. Streaming lane is Flink reading from Kafka/Kinesis with a central `spfeaturesingest` choke point. On-demand lane is exposed via the `go-lyft-features`/`lyft-dsp-features` SDKs' full CRUD API. Uniform metadata + strongly consistent reads are invariant across lanes.
- Dropbox Dash Feature Store (2025-12-18): batch (medallion + change-detection), streaming, direct-write (from an LLM evaluation pipeline). Same three-lane shape, different naming.
Design properties worth naming¶
- Unified metadata surface. The online serving layer is the enforcement point for metadata uniformity — not the batch DAGs, not the streaming apps, not the direct-CRUD callers. This is usually implemented via a central ingest choke point for the streaming lane (Lyft's `spfeaturesingest`, which owns (de)serialization and `dsfeatures` WRITE calls for all customer streaming apps).
- Store routing is internal to the serving layer. Callers don't pick a backing store. The feature type drives the routing — embeddings to OpenSearch, meta/value data to DynamoDB-backed ValKey in Lyft's case. See patterns/wrapper-over-heterogeneous-stores-as-serving-layer.
- SDK-first, not REST-first. Both canonical instances ship SDKs (Lyft: Go + Python; Dropbox: internal Go client) rather than raw HTTP contracts, so batching, typing, and consistency guarantees can be expressed in the client library, not re-implemented per caller.
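The SDK-first and store-routing properties combine naturally in the client library. The sketch below is hypothetical (the class and store names are invented, and neither Lyft's nor Dropbox's real client looks like this): a CRUD-shaped SDK where the declared feature type, not the caller, selects the backing store.

```python
class _Store:
    """Stand-in for a concrete backend (e.g. a vector store for
    embeddings, a key-value store for meta/value data)."""

    def __init__(self):
        self._d = {}

    def put(self, key, value):
        self._d[key] = value

    def get(self, key):
        return self._d[key]

    def drop(self, key):
        self._d.pop(key)


class FeatureClient:
    """SDK-level CRUD surface. Routing by feature type is internal:
    callers never name a backing store."""

    def __init__(self):
        # One logical surface over heterogeneous stores.
        self._stores = {"embedding": _Store(), "value": _Store()}
        self._types = {}  # feature name -> feature type

    def create(self, name, feature_type, entity_id, value):
        self._types[name] = feature_type
        self._stores[feature_type].put((entity_id, name), value)

    def read(self, name, entity_id):
        return self._stores[self._types[name]].get((entity_id, name))

    def update(self, name, entity_id, value):
        self._stores[self._types[name]].put((entity_id, name), value)

    def delete(self, name, entity_id):
        self._stores[self._types[name]].drop((entity_id, name))


client = FeatureClient()
client.create("doc_embedding", "embedding", "user:42", [0.1, 0.2, 0.3])
client.create("ride_count", "value", "user:42", 17)
client.update("ride_count", "user:42", 18)
```

Putting routing inside the client (or the serving layer behind it) is what lets a feature migrate backends later without any caller changing code.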
Comparison with simpler shapes¶
- Batch-only feature store — easy to build, misses freshness SLO, blocks real-time use cases.
- Streaming-only feature store — uneconomical for heavy history joins; pushes batch workload into streaming infra.
- Cache-aside feature access — every caller computes its own features on the read path. Maximum duplication; no governance.
The three-lanes-plus-unified-surface pattern is the default shape once all three problems matter at the same time: freshness, heavy history, and adjacent-producer integration.
Related¶
- patterns/hybrid-batch-streaming-ingestion — the ingestion half of this pattern (pre-existing in the wiki from the Dropbox Dash ingest).
- patterns/wrapper-over-heterogeneous-stores-as-serving-layer — the serving-layer internal composition that makes the unified surface feasible.
- concepts/feature-store
- concepts/feature-freshness
- systems/lyft-feature-store
- systems/lyft-dsfeatures
- systems/dash-feature-store
Seen in¶
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution — Lyft's "platform of platforms" instance; three lanes + a unified `dsfeatures` surface.
- sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash — Dropbox Dash instance; batch + streaming + direct-write.