PATTERN Cited by 1 source
Kafka broadcast for shared state¶
When every node in a service cluster must converge on a shared, eventually-consistent view of state — and consensus or bespoke gossip would be overkill — publish each node's local updates to a Kafka topic that all cluster nodes consume. Every node thereby receives every other node's updates.
Shape¶
Node A        Node B        Node C        Node D
  |             |             |             |
  v             v             v             v
+-------- Kafka broadcast topic (all nodes subscribe) --------+
  ^             ^             ^             ^
  |             |             |             |
Node A        Node B        Node C        Node D
    (each node consumes from the topic, including its own writes)
Each node:
- Receives some slice of work from upstream load balancing.
- Derives shared state from its slice of the work.
- Publishes deltas (or complete state fragments) to the broadcast topic.
- Consumes the topic and merges everyone's updates — including its own — into its local in-memory view.
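The loop above can be sketched in a few lines. This is a minimal, illustrative simulation: an in-memory list stands in for the Kafka broadcast topic, and the `Node` class, its field names, and the max-merge rule are all assumptions for demonstration, not anything from the source. A real deployment would use a Kafka client with every node producing to and consuming from the same topic.

```python
# In-memory stand-in for the broadcast topic: an append-only shared log.
# All names here are illustrative; a real node would use a Kafka client.

class Node:
    def __init__(self, name, topic):
        self.name = name
        self.topic = topic   # shared "topic" (append-only list)
        self.offset = 0      # this node's consumer position
        self.state = {}      # local in-memory view: key -> merged value

    def publish(self, key, value):
        """Derive a delta from local work and broadcast it."""
        self.topic.append((key, value))

    def poll(self):
        """Consume everything since the last poll -- including this
        node's own writes -- and merge it into the local view."""
        for key, value in self.topic[self.offset:]:
            # The merge must be idempotent and order-independent;
            # here we simply keep the maximum value seen per key.
            self.state[key] = max(self.state.get(key, value), value)
        self.offset = len(self.topic)

topic = []
a, b = Node("A", topic), Node("B", topic)
a.publish("x", 1)
b.publish("x", 3)
b.publish("y", 2)
a.poll(); b.poll()
assert a.state == b.state == {"x": 3, "y": 2}
```

Note that each node merges its own writes back from the topic rather than applying them locally first; that keeps every node on the single code path of "state = fold of the topic".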
When this is the right choice¶
- Convergence, not consensus. You want eventual consistency (merge the union of statements), not a totally-ordered log.
- Low update rate. Broadcast cost scales with O(updates × nodes).
- Existing Kafka. Reuse the org's Kafka rather than operating a separate gossip mesh / Raft cluster.
- State is idempotently mergeable. Each update is a self-contained statement that's safe to apply multiple times or out of order (e.g. a heartbeat time range).
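The heartbeat-time-range case makes the "idempotently mergeable" requirement concrete: because the statement is self-contained, a duplicate delivery (Kafka is at-least-once) lands on the same state. The `merge` function and field names below are an illustrative sketch, not code from the source.

```python
# A self-contained, idempotently mergeable statement: a heartbeat time
# range keyed by (ip, workload_id). Applying it twice converges to the
# same state. Field names are illustrative.

def merge(state, stmt):
    """Fold one broadcast statement into the local view."""
    ip, workload, t_start, t_end = stmt
    key = (ip, workload)
    if key in state:
        lo, hi = state[key]
        state[key] = (min(lo, t_start), max(hi, t_end))
    else:
        state[key] = (t_start, t_end)
    return state

stmt = ("10.0.0.7", "svc-a", 100, 160)
once = merge({}, stmt)
twice = merge(merge({}, stmt), stmt)  # duplicate delivery is harmless
assert once == twice == {("10.0.0.7", "svc-a"): (100, 160)}
```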
When it's the wrong choice¶
- High update rate or large per-node state. Broadcast becomes a bandwidth tax.
- Strict consistency needed. Kafka is at-least-once; nodes may see different state briefly. Consensus is required when atomic transitions matter.
- No shared Kafka infrastructure. The "simple" framing depends on operational Kafka already being cheap.
Canonical instance¶
systems/netflix-flowcollector — each of 30 c7i.2xlarge nodes
sees some subset of the ~5M flows/sec and publishes learned
(ip, workload_id, t_start, t_end) time ranges to a broadcast
Kafka topic. Every node subscribes and merges peer time ranges
into its local per-IP ownership map. Verbatim acknowledgement from
the post: "Although more efficient broadcasting implementations
exist, the Kafka-based approach is simple and has worked well for
us."
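The tuple shape `(ip, workload_id, t_start, t_end)` is from the post; the data structure and lookup below are an illustrative reconstruction of what a per-IP ownership map might look like, with hypothetical names throughout.

```python
# Sketch of the per-IP ownership map a node maintains after merging peer
# time ranges. Structure and lookup are assumptions, not Netflix's code.
from collections import defaultdict

ownership = defaultdict(list)  # ip -> [(t_start, t_end, workload_id), ...]

def merge_range(ip, workload_id, t_start, t_end):
    """Fold one learned time range (local or from a peer) into the map."""
    ownership[ip].append((t_start, t_end, workload_id))

def attribute(ip, t):
    """Attribute a flow observed at time t on ip to a workload, if any
    learned time range covers it."""
    for t_start, t_end, workload_id in ownership[ip]:
        if t_start <= t <= t_end:
            return workload_id
    return None

merge_range("10.1.2.3", "wl-frontend", 1000, 2000)
merge_range("10.1.2.3", "wl-batch", 2500, 3000)
assert attribute("10.1.2.3", 1500) == "wl-frontend"
assert attribute("10.1.2.3", 2250) is None
```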
Trade-offs explicitly chosen¶
- Bandwidth over latency. Every node sees every other node's write, so total message deliveries grow as O(N²) with cluster size; still operationally cheaper than replicated consensus.
- Simplicity over optimality. "We implemented a broadcasting mechanism using Kafka, where each node publishes learned time ranges to all other nodes." Netflix explicitly names the tradeoff.
- Kafka's ordering guarantees (per-partition) are sufficient when merges are order-independent.
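The last point can be checked directly: when the merge is commutative and idempotent, any interleaving of partition deliveries converges to the same state, so Kafka's per-partition-only ordering costs nothing. The `merge_all` function is an illustrative sketch.

```python
# Order-independence check: shuffled delivery orders (and a duplicate,
# standing in for at-least-once redelivery) all converge to one state.
import random

def merge_all(updates):
    """Fold (key, lo, hi) range statements into a state dict."""
    state = {}
    for key, lo, hi in updates:
        cur = state.get(key)
        state[key] = (lo, hi) if cur is None else (min(cur[0], lo),
                                                   max(cur[1], hi))
    return state

updates = [("a", 0, 5), ("a", 3, 9), ("b", 1, 2), ("a", 3, 9)]  # duplicate
baseline = merge_all(updates)
for _ in range(10):
    random.shuffle(updates)
    assert merge_all(updates) == baseline
```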
Alternatives¶
- Custom gossip (SWIM, Corrosion, etc.). Purpose-built, lower overhead, but an operational burden.
- Raft / Paxos log. Over-strong for convergent state.
- Shared write-through cache (Redis, etc.). Centralises the state instead of replicating it to each node, and introduces a single-point dependency.
Seen in¶
- sources/2025-04-08-netflix-how-netflix-accurately-attributes-ebpf-flow-logs — canonical wiki instance.