PATTERN Cited by 1 source

Three-stage ingest-fusion-index pipeline

Problem

You have raw ML-model output (annotations, detections, embeddings) that needs to be searchable at second granularity across many models of different modalities at media-catalog scale. Direct indexing of raw per-model output into a search engine fails because:

  • Every model's output shape is different; a flat index can't express cross-modality queries.
  • Intersection of continuous-interval annotations across modalities is O(M × N) — not tractable at query time.
  • Re-running a model must not create duplicate search documents; the index must stay a single source of truth.
  • Heavy intersection computation must not bottleneck real-time ingest throughput.
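
The O(M × N) blow-up is the naive join a flat index would force at query time. A minimal sketch (the interval tuples and model names are illustrative, not from the source):

```python
def overlaps(a, b):
    # Half-open intervals (start, end) overlap iff each starts
    # before the other ends.
    return a[0] < b[1] and b[0] < a[1]

def naive_intersections(model_a, model_b):
    # O(M x N): every interval from one modality is compared against
    # every interval from the other. At catalog scale this is exactly
    # the query-time cost the pipeline is designed to avoid.
    return [(a, b) for a in model_a for b in model_b if overlaps(a, b)]
```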

Solution

Split the pipeline into three decoupled stages glued by an event bus:

  1. Transactional persistence — raw per-model annotations land in a write-optimised store (e.g. Cassandra) through a dedicated annotation service. Its only job is data integrity and high-throughput writes.
  2. Offline data fusion — an event from stage 1 triggers an asynchronous job that discretizes annotations into fixed-size time buckets (concepts/temporal-bucket-discretization) and computes cross-model intersections (concepts/multimodal-annotation-intersection). Enriched records go back to the durable store.
  3. Indexing for search — a subsequent event triggers ingestion of enriched bucket records into a search engine (e.g. Elasticsearch) as nested documents (concepts/nested-document-indexing) via composite-key upsert (concepts/composite-key-upsert) on (asset_id, time_bucket).
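
A minimal sketch of the stage-2 fusion step. The 1-second bucket size and the (asset_id, model, start, end, label) annotation shape are assumptions; the source specifies neither:

```python
import math
from collections import defaultdict

BUCKET_SECONDS = 1.0  # assumed bucket size

def buckets_for(start, end, size=BUCKET_SECONDS):
    """Fixed-size buckets touched by the half-open interval [start, end)."""
    return range(math.floor(start / size), math.ceil(end / size))

def fuse(annotations):
    """Discretize continuous intervals into buckets and group annotations
    from all models under the same (asset_id, bucket) key, so cross-model
    intersection becomes a hash-key lookup instead of an interval join.

    annotations: iterable of (asset_id, model, start, end, label) tuples.
    Returns {(asset_id, bucket): {model: [labels]}} -- the enriched
    per-bucket records written back to the durable store.
    """
    fused = defaultdict(lambda: defaultdict(list))
    for asset_id, model, start, end, label in annotations:
        for bucket in buckets_for(start, end):
            fused[(asset_id, bucket)][model].append(label)
    return fused
```

A vision annotation over [0.0, 2.5) and an audio annotation over [1.0, 2.0) on the same asset end up together in bucket 1, which is what makes "dog AND bark in the same second" cheap to answer.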

Each stage uses the tool optimised for its workload:

  Stage        Store                                  Optimised for
  1 — persist  Cassandra                              high write throughput, durability
  2 — fuse     Cassandra (re-written enriched rows)   bulk batch compute
  3 — index    Elasticsearch                          low-latency multimodal search

Why it works

  • Ingest stays fast — stage 1 doesn't wait for the heavy intersection math in stage 2. Netflix's design posture: "Cleanly decoupling these intensive processing tasks from the ingestion pipeline guarantees that complex data intersections never bottleneck real-time intake" (Source: sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search).
  • Each stage is resilient to downstream failure — stage 1 can accept writes while stage 2 is down; stage 2 can reprocess backlogs when stage 3 is down. The Kafka events queued between stages buffer work across these transient outages.
  • Re-runs are safe — stage 3's composite-key upsert guarantees each (asset, bucket) Elasticsearch document is updated in place, not duplicated.
  • Query altitude matches storage — the search index carries the fused / denormalised shape optimised for multimodal queries; the transactional store carries the raw per-model shape optimised for ingest + re-processing.
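
The re-run safety of the composite-key upsert can be sketched with a deterministic document id. The dict stands in for the search engine, and the id format is an assumption:

```python
def composite_id(asset_id, bucket):
    # Deterministic _id derived from the composite key (asset_id,
    # time_bucket): re-indexing the same bucket overwrites the
    # existing document instead of appending a duplicate.
    return f"{asset_id}:{bucket}"

index = {}  # stand-in for the search engine, keyed by _id

def upsert_bucket(asset_id, bucket, enriched_record):
    index[composite_id(asset_id, bucket)] = enriched_record

# Re-running fusion for the same (asset, bucket) leaves one document,
# carrying the latest enriched record.
upsert_bucket("a1", 7, {"vision": ["dog"]})
upsert_bucket("a1", 7, {"vision": ["dog"], "audio": ["bark"]})
```

Had the id been auto-generated per write, the second run would have produced a duplicate document and broken the single-source-of-truth property.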

Canonical instance

Netflix Search's multimodal video-search pipeline (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search):

Adjacent patterns

Caveats

  • Three stages is three sources of eventual consistency; search results lag ingest by (fusion-job-delay + index-refresh-lag).
  • Multi-stage pipelines are harder to debug than monolithic ingest paths — observability must cover the event bus, each stage's health, and end-to-end traces across all three.
  • Netflix doesn't disclose the fusion-job scheduling cadence, back-pressure strategy when stage 3 falls behind, or the behaviour when a model re-emits annotations for a historical bucket.
  • Model-version-aware semantics in stage 2 are open — does a new version of a model replace or augment prior annotations for an (asset, bucket)?