PATTERN Cited by 1 source
Offline fusion via event bus¶
Problem¶
A write-heavy transactional store persists data that needs complex cross-record fusion (temporal intersection, aggregation, enrichment) before it can be indexed for search or query. Doing the fusion synchronously on the ingest hot path violates the write-path's latency budget:
- Intersection across many related records is O(N²) in the worst case; the ingest store isn't optimised for that shape.
- Ingest throughput collapses if a single write waits for fusion of many prior records.
- Transient fusion failures cascade back to producers and reject new writes.
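The quadratic cost in the first bullet can be made concrete with the naive pairwise approach. This is a minimal sketch, not code from any cited system; the `Interval` type and `temporal_intersections` helper are illustrative names:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Interval:
    start: float
    end: float

def temporal_intersections(intervals):
    """Naive pairwise overlap: N*(N-1)/2 comparisons, i.e. O(N^2) in N records."""
    out = []
    for a, b in combinations(intervals, 2):
        lo, hi = max(a.start, b.start), min(a.end, b.end)
        if lo < hi:
            out.append(Interval(lo, hi))
    return out
```

Run inline on the write path, every new record pays a cost proportional to all prior related records, which is exactly the latency-budget violation the pattern avoids.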
Solution¶
Emit an event on every ingest completion and let an independent consumer service perform fusion out-of-band. The event bus (e.g. Kafka) provides:
- Decoupling — ingest acknowledges durability once the event is published; fusion runs later.
- Back-pressure absorption — the bus holds events while fusion is slow or down.
- Re-processing — the bus can replay events for backfill or fusion-logic change.
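The control flow can be sketched with an in-memory stand-in for the bus; `EventBus`, `store`, `fused`, and the trivial "fusion" step are all hypothetical placeholders, assumed here only to show the decoupling:

```python
from collections import deque

class EventBus:
    """Minimal stand-in for Kafka: an append-only queue a consumer drains later."""
    def __init__(self):
        self.log = deque()
    def publish(self, event):
        self.log.append(event)
    def poll(self):
        return self.log.popleft() if self.log else None

store = {}   # transactional ingest store (hypothetical)
fused = {}   # derived, fused view (hypothetical)
bus = EventBus()

def ingest(record_id, payload):
    store[record_id] = payload   # 1. durable persist
    bus.publish(record_id)       # 2. emit event
    return "ack"                 # 3. acknowledge -- no fusion on the hot path

def fusion_consumer():
    """Runs out-of-band; if it is slow or down, events simply accumulate on the bus."""
    while (rid := bus.poll()) is not None:
        fused[rid] = store[rid].upper()   # placeholder for the real fusion work
```

The key property is that `ingest` returns after step 3 regardless of whether `fusion_consumer` ever runs in the same time window.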
Canonical instance¶
Netflix's multimodal video-search pipeline (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search):
"Once the annotation service securely persists the raw data, the system publishes an event via Apache Kafka to trigger an asynchronous processing job. Serving as the architecture's central logic layer, this offline pipeline handles the heavy computational lifting out-of-band … decoupling these intensive processing tasks from the ingestion pipeline guarantees that complex data intersections never bottleneck real-time intake."
- Producer: systems/netflix-marken (annotation service) writes to systems/apache-cassandra → publishes a Kafka event.
- Consumer: offline fusion job discretizes annotations into time buckets and computes cross-model intersections.
- Output: enriched bucket records go back to Cassandra as distinct entities, then a second event triggers indexing into Elasticsearch.
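The consumer's bucket-and-intersect step can be sketched as follows. The post does not disclose Netflix's bucket granularity or record schema, so `BUCKET`, the tuple layout, and both function names are assumptions for illustration:

```python
from collections import defaultdict

BUCKET = 5.0  # seconds per time bucket (hypothetical granularity)

def discretize(annotations):
    """Map (model, start, end, label) spans onto fixed time buckets."""
    buckets = defaultdict(lambda: defaultdict(set))
    for model, start, end, label in annotations:
        b = int(start // BUCKET)
        while b * BUCKET < end:
            buckets[b][model].add(label)
            b += 1
    return buckets

def cross_model(buckets):
    """Keep buckets where more than one model contributed -- a cross-model
    intersection that can be written back as an enriched record."""
    return {b: models for b, models in buckets.items() if len(models) > 1}
```

Discretization turns the quadratic span-vs-span intersection into a bucket join, which is the shape a search index can serve directly.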
Why it works¶
- Ingest latency stays bounded by durable persistence, not by any downstream compute.
- Fusion can be horizontally scaled independently of ingest capacity.
- Fusion failures don't cascade to ingest; they manifest as consumer lag on the bus, visible and bounded.
- Idempotent consumers (via composite-key upsert at the downstream index, plus event-key deduplication in the consumer) make replays safe.
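The event-key deduplication half of that last point can be sketched as a wrapper; `make_deduped_handler` is a hypothetical name, and a production consumer would persist `seen` rather than hold it in memory:

```python
def make_deduped_handler(handle):
    """Wrap a handler so redelivered events (at-least-once bus) are applied once."""
    seen = set()
    def deduped(event_key, payload):
        if event_key in seen:
            return False     # duplicate delivery: skip, replay stays safe
        seen.add(event_key)
        handle(payload)
        return True
    return deduped
```

With this in place, a bus replay for backfill re-delivers old events without re-applying their side effects.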
Sibling patterns¶
- patterns/fire-and-forget-rollup-trigger — structurally identical; the write path posts a lightweight trigger to the rollup tier instead of computing inline. Instance: Netflix Distributed Counter.
- patterns/three-stage-ingest-fusion-index — the larger pipeline shape this pattern sits inside.
Caveats¶
- The event bus is now a critical-path dependency — Kafka's availability gates fusion's timeliness.
- Consumer lag = staleness of downstream derived data. Query-time freshness contracts must reflect this.
- Netflix doesn't disclose the Kafka topology (partition count, retention, ordering guarantees) or fusion-consumer scaling characteristics in the 2026-04-04 post.
- Event semantics (at-least-once vs exactly-once) matter for consumer idempotency; Netflix's reliance on composite-key upsert at the index tier is the corresponding at-least-once-safe strategy.
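Why composite-key upsert is at-least-once-safe can be shown in a few lines; the `index` dict, `upsert_bucket` name, and document shape are assumptions standing in for the real index tier:

```python
index = {}  # downstream search index, keyed by composite key (hypothetical)

def upsert_bucket(entity_id, bucket, doc):
    """Composite key (entity, bucket): replaying the same fusion event
    rewrites the same document instead of creating a duplicate."""
    index[(entity_id, bucket)] = doc

# an at-least-once bus may deliver the same event twice; the result is identical
upsert_bucket("video-42", 3, {"labels": ["closeup"]})
upsert_bucket("video-42", 3, {"labels": ["closeup"]})
```

The write is idempotent by construction, so the consumer needs no exactly-once machinery from the bus itself.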