
PATTERN Cited by 1 source

Hybrid batch + streaming + direct-write ingestion

Definition

Hybrid batch + streaming + direct-write ingestion is the architectural pattern of splitting feature (or more generally data) ingestion into three complementary lanes — each matching a different freshness/cost/complexity trade-off — instead of forcing all features through a single pipeline.

The three lanes:

  1. Batch: periodic, heavy joins/aggregations over historical windows, typically Spark-backed; hours-to-minutes freshness. Absorbs the high-volume transformations that can't run online.
  2. Streaming: unbounded near-real-time processing of events; seconds-to-minutes freshness. Handles signals that must reflect what users are doing right now.
  3. Direct writes: application-level writes straight into the online store, bypassing the ingestion pipeline entirely; freshness in seconds. Escape hatch for lightweight or precomputed features.
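
To make the batch lane's "hours-to-minutes" range concrete, the arithmetic below estimates worst-case staleness for a scheduled batch job. It is a rough sketch under stated assumptions, not a description of any particular system: `BatchSchedule`, its fields, and the two example schedules are hypothetical, and the model assumes each run snapshots its input at start and publishes results only when it finishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSchedule:
    """Hypothetical description of a fixed-interval batch job."""
    interval_s: float  # seconds between the start of consecutive runs
    run_s: float       # seconds a run needs before its output is visible online


def worst_case_staleness_s(schedule: BatchSchedule) -> float:
    """Worst case under the assumed model: an event arriving just after a run
    snapshots its input waits one full interval for the next run to start,
    then waits for that run to finish writing."""
    return schedule.interval_s + schedule.run_s


# Illustrative numbers only; neither schedule is taken from the source.
hourly_slow = BatchSchedule(interval_s=3600, run_s=3600)
frequent_fast = BatchSchedule(interval_s=600, run_s=300)

print(worst_case_staleness_s(hourly_slow) / 60)    # 120.0 (minutes)
print(worst_case_staleness_s(frequent_fast) / 60)  # 15.0 (minutes)
```

Under this model, shortening the run time matters as much as shortening the interval, since a run cannot usefully be scheduled more often than it completes.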

Why three lanes and not one

Every single-path design eventually fails somewhere:

  • Batch-only misses the freshness SLO for interaction signals ("user opened doc 2s ago should surface in next search").
  • Streaming-only makes heavy joins/aggregations over historical windows expensive or impractical at scale.
  • Direct-write-only requires the application to compute every feature itself, which defeats the purpose of a shared feature store.

The hybrid approach lets each feature hit the lane that matches its shape:

  • Signals that need hours of history but minutes of freshness → batch with change detection.
  • Signals that need seconds of freshness → streaming.
  • Signals computed by an adjacent pipeline (e.g. an LLM evaluation service) that just need to land in the store → direct writes.
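
The three mappings above amount to a per-feature assignment rule. A minimal sketch of such a rule follows, assuming a hypothetical `FeatureSpec` descriptor and an illustrative 60-second cutoff between "seconds" and "minutes" freshness; none of these names or thresholds come from the source.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    BATCH = "batch"
    STREAMING = "streaming"
    DIRECT_WRITE = "direct_write"


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical per-feature descriptor used only for this illustration."""
    name: str
    freshness_slo_s: float      # maximum acceptable staleness, in seconds
    precomputed_upstream: bool  # value is already produced by an adjacent pipeline


# Assumed boundary between the streaming and batch lanes; tune per system.
STREAMING_FRESHNESS_CUTOFF_S = 60.0


def assign_lane(spec: FeatureSpec) -> Lane:
    """Pick the lane that matches the feature's shape."""
    # A value that already exists upstream only needs a place to land.
    if spec.precomputed_upstream:
        return Lane.DIRECT_WRITE
    # Seconds-level freshness cannot wait for a batch cycle.
    if spec.freshness_slo_s <= STREAMING_FRESHNESS_CUTOFF_S:
        return Lane.STREAMING
    # Everything else tolerates batch latency and can use historical joins.
    return Lane.BATCH


specs = [
    FeatureSpec("doc_opens_30d", freshness_slo_s=600, precomputed_upstream=False),
    FeatureSpec("doc_opened_recently", freshness_slo_s=5, precomputed_upstream=False),
    FeatureSpec("llm_relevance_score", freshness_slo_s=3600, precomputed_upstream=True),
]
for spec in specs:
    print(spec.name, assign_lane(spec).value)
```

Keeping the rule in one function makes the per-feature assignment explicit and reviewable, which is part of the complexity cost the pattern carries.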

Dropbox Dash realization

Canonical instance from the 2025-12-18 Dash feature-store post:

  • Batch ingestion on a medallion architecture (raw → refined layers), with intelligent change detection — see patterns/change-detection-ingestion. "Reduced write volumes from hundreds of millions to under one million records per run and cut update times from more than an hour to under five minutes."
  • Streaming ingestion "captures fast-moving signals such as collaboration activity or content interactions. By processing unbounded datasets in near-real time, it ensures features stay aligned with what users are doing in the moment."
  • Direct writes "handle lightweight or precomputed features by bypassing batch pipelines entirely. For example, relevance scores produced by a separate LLM evaluation pipeline can be written directly to the online store in seconds instead of waiting for the next batch cycle."

(Source: sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash)
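
The source does not spell out how its change detection works. One common way to get the same effect, shown here purely as an assumed technique, is to fingerprint each record and write only the records whose fingerprint differs from the previous run. The function names and the sample feature `opens_7d` are hypothetical.

```python
import hashlib
import json
from typing import Dict, Tuple


def fingerprint(features: dict) -> str:
    """Stable content hash of one record's feature values."""
    # sort_keys makes the hash independent of dict insertion order.
    payload = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(
    current: Dict[str, dict],
    previous_fingerprints: Dict[str, str],
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Return (records that need writing, fingerprints to keep for the next run)."""
    to_write: Dict[str, dict] = {}
    new_fingerprints: Dict[str, str] = {}
    for key, features in current.items():
        fp = fingerprint(features)
        new_fingerprints[key] = fp
        # New keys and keys whose values changed are written; the rest are skipped.
        if previous_fingerprints.get(key) != fp:
            to_write[key] = features
    return to_write, new_fingerprints


run1 = {"doc:1": {"opens_7d": 3}, "doc:2": {"opens_7d": 0}}
run2 = {"doc:1": {"opens_7d": 4}, "doc:2": {"opens_7d": 0}}

writes1, fps = changed_records(run1, {})
writes2, fps = changed_records(run2, fps)
print(len(writes1), len(writes2))  # 2 1
```

This sketch ignores deletions (keys present in the previous run but absent from the current one); a real pipeline would need a policy for those.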

Relation to classic Lambda / Kappa architectures

  • Lambda architecture is the original "hot path + cold path" pattern (real-time + batch) reconciling to the same serving layer. Hybrid batch + streaming is the direct descendant.
  • Kappa architecture argued that the batch path was unnecessary and that everything should go through the stream. In practice, for heavy historical aggregations at the exabyte-adjacent scale of a ranking feature store, the batch path returns because streaming those transformations is uneconomical.
  • The direct-write lane is the third lane, named as such by Dropbox. It is not a hot-vs-cold distinction but an escape that skips the pipeline entirely for features whose producer is already downstream of it.

When to use it

Apply this pattern when:

  • Feature / data freshness requirements vary across features on the same substrate.
  • Heavy historical joins/aggregations can't fit in the streaming lane's cost/latency envelope.
  • An adjacent pipeline already produces some features and just needs a place to land them.

Don't apply it when:

  • All features have uniform freshness requirements; in that case, pick one lane.
  • The online store can't absorb direct writes from multiple producers without coordination (contention, quota, consistency issues).
  • The ingestion complexity tax (three pipelines + assignment rule per feature) isn't justified by the freshness/cost diversity.
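
The coordination concern in the second bullet can be made concrete. When several lanes may write the same key, the store needs some rule for which write survives. The sketch below uses last-write-wins on event timestamp, which is an assumption for illustration and not something the source describes; `OnlineStore`, `FeatureValue`, and their fields are hypothetical, and the in-memory dict stands in for a real store's conditional-write primitive.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class FeatureValue:
    value: Any
    event_ts: float  # when the underlying signal occurred, not when it was written
    lane: str        # which lane produced the value, kept for debugging


class OnlineStore:
    """Single-process stand-in for an online store (no real concurrency control)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], FeatureValue] = {}

    def upsert(self, entity_id: str, feature: str, new: FeatureValue) -> bool:
        """Apply the write only if it is newer than what is stored.
        Returns True when the write was applied."""
        key = (entity_id, feature)
        existing = self._rows.get(key)
        if existing is not None and existing.event_ts >= new.event_ts:
            return False  # stale write from a slower lane; drop it
        self._rows[key] = new
        return True

    def get(self, entity_id: str, feature: str) -> Optional[FeatureValue]:
        return self._rows.get((entity_id, feature))


store = OnlineStore()
# A direct write lands first with a recent event time.
applied_direct = store.upsert("doc:1", "relevance", FeatureValue(0.9, 200.0, "direct_write"))
# A later-arriving batch write carries an older event time and is rejected.
applied_batch = store.upsert("doc:1", "relevance", FeatureValue(0.4, 100.0, "batch"))
print(applied_direct, applied_batch, store.get("doc:1", "relevance").value)  # True False 0.9
```

If the store offers no equivalent of this compare-then-write step, or producers cannot agree on a timestamp source, that is a sign the direct-write lane needs coordination before it is safe to open to multiple producers.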

Seen in
