
PATTERN Cited by 1 source

Offline fusion via event bus

Problem

A write-heavy transactional store persists data that needs complex cross-record fusion (temporal intersection, aggregation, enrichment) before it can be indexed for search or query. Doing the fusion synchronously on the ingest hot path violates the write-path's latency budget:

  • Intersection across many related records is O(N²) in the worst case; the ingest store isn't optimised for that shape.
  • Ingest throughput collapses if a single write waits for fusion of many prior records.
  • Transient fusion failures cascade back to producers and reject new writes.
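
The first bullet can be made concrete with a minimal sketch (hypothetical records and intervals, not any real ingest store): when fusion runs inline on the write path, every new write scans all prior records, so per-write work grows linearly and cumulative work quadratically.

```python
def overlaps(a, b):
    """Temporal intersection of two (start, end) intervals."""
    return a[0] < b[1] and b[0] < a[1]

def ingest_sync(store, record):
    """Write path that fuses inline: the producer waits on a scan of every prior record."""
    scanned = len(store)                                 # work done before acking
    matches = [p for p in store if overlaps(p, record)]  # the inline "fusion"
    store.append(record)
    return scanned

store = []
work = [ingest_sync(store, (i, i + 2)) for i in range(1000)]
# The Nth write scans N-1 prior records before acknowledging; cumulative
# work across 1000 writes is 999*1000/2 = 499500 scans.
```

This is exactly the O(N²) shape the bullet names, and why the pattern moves fusion off the ack path.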

Solution

Emit an event on every ingest completion and let an independent consumer service perform fusion out-of-band. The event bus (e.g. Kafka) provides:

  • Decoupling — ingest acknowledges durability once the event is published; fusion runs later.
  • Back-pressure absorption — the bus holds events while fusion is slow or down.
  • Re-processing — the bus can replay events for backfill or fusion-logic change.
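
The three properties above can be sketched with a plain append-only list standing in for a Kafka topic partition (names like `EventLog` and `run_consumer` are illustrative, not Netflix's API): ingest acknowledges as soon as the event is appended, the consumer drains the log later, and replay is just re-reading from offset 0.

```python
class EventLog:
    """Append-only log standing in for a Kafka topic partition."""
    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)      # durable append is the ack point

    def read_from(self, offset):
        return self.events[offset:]    # replay is just re-reading

def ingest(log, record_id):
    # Persist raw data (elided), then publish and return immediately.
    log.publish({"record_id": record_id})
    return "acked"

def run_consumer(log, offset, fuse):
    """Out-of-band fusion: processes whatever the log holds, at its own pace."""
    for event in log.read_from(offset):
        fuse(event["record_id"])
    return len(log.events)             # new committed offset

log = EventLog()
for rid in range(5):
    ingest(log, rid)                   # ingest never waits on fusion

fused = []
offset = run_consumer(log, 0, fused.append)    # decoupled fusion
replayed = []
run_consumer(log, 0, replayed.append)          # backfill: replay from offset 0
```

Back-pressure absorption falls out of the same structure: if `run_consumer` is slow or down, events simply accumulate in the log until it catches up.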

Canonical instance

Netflix's multimodal video-search pipeline (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search):

"Once the annotation service securely persists the raw data, the system publishes an event via Apache Kafka to trigger an asynchronous processing job. Serving as the architecture's central logic layer, this offline pipeline handles the heavy computational lifting out-of-band … decoupling these intensive processing tasks from the ingestion pipeline guarantees that complex data intersections never bottleneck real-time intake."

Why it works

  • Ingest latency stays bounded by durable persistence, not by any downstream compute.
  • Fusion can be horizontally scaled independently of ingest capacity.
  • Fusion failures don't cascade to ingest; they manifest as consumer lag on the bus, visible and bounded.
  • Idempotent consumers (via composite-key upsert at the downstream index, plus event-key deduplication in the consumer) make replays safe.
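
The last bullet's two mechanisms can be sketched together under at-least-once delivery (key and field names are illustrative; the post doesn't give schemas): a composite-key upsert at the index makes redelivery converge to the same state, and an event-key dedup set in the consumer skips work on duplicates.

```python
index = {}    # downstream search index: composite key -> fused document
seen = set()  # consumer-side dedup of event keys

def handle(event):
    """Replay-safe consumption: dedup on the event key, upsert on the composite key."""
    key = (event["record_id"], event["version"])   # composite key
    if key in seen:                                # duplicate delivery: skip
        return False
    seen.add(key)
    index[key] = {"fused": event["payload"]}       # upsert: last write wins
    return True

events = [
    {"record_id": "r1", "version": 1, "payload": "a"},
    {"record_id": "r1", "version": 1, "payload": "a"},  # redelivered by the bus
    {"record_id": "r1", "version": 2, "payload": "b"},
]
applied = [handle(e) for e in events]
```

Replaying the whole list again leaves `index` unchanged, which is what makes backfill and fusion-logic reruns safe.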

Sibling patterns

Caveats

  • The event bus is now a critical-path dependency — Kafka's availability gates fusion's timeliness.
  • Consumer lag = staleness of downstream derived data. Query-time freshness contracts must reflect this.
  • Netflix doesn't disclose the Kafka topology (partition count, retention, ordering guarantees) or fusion-consumer scaling characteristics in the 2026-04-04 post.
  • Event semantics (at-least-once vs exactly-once) matter for consumer idempotency; Netflix's reliance on composite-key upsert at the index tier is the corresponding at-least-once-safe strategy.