CONCEPT
Discrete event vs heartbeat attribution¶
Two structural shapes for maintaining an owner mapping that
changes over time in a distributed system:
Discrete-event attribution¶
A control plane emits a stream of assign(owner, resource, t) and
unassign(resource, t) events. Consumers apply each event to their
in-memory resource → owner map and query the map at attribution
time.
What it requires:
- Complete event delivery.
- Correct ordering.
- Accurate timestamps.
Failure modes:
- Delayed events keep stale entries live → misattribution.
- Lost events → misattribution or the resource disappears from the map.
- Out-of-order events → an earlier assign applied after a later unassign resurrects a stale entry → misattribution.
- Inaccurate event timestamps → even with ordering, wrong window.
- Mitigations such as hold-back buffers trade freshness for less misattribution, but do not eliminate it.
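The hold-back mitigation can be sketched as a buffer that re-sorts events within a bounded window before applying them; this is a generic illustration, not Netflix's implementation:

```python
import heapq

class HoldBackBuffer:
    """Holds events for `delay` seconds so late or out-of-order arrivals
    can be re-sorted by event timestamp before being applied downstream.
    Events that arrive more than `delay` late are still misordered, so
    misattribution is reduced, not eliminated."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.heap: list[tuple[float, int, object]] = []  # (event_time, seq, event)
        self.seq = 0  # tie-breaker so equal timestamps never compare events

    def offer(self, event_time: float, event) -> None:
        heapq.heappush(self.heap, (event_time, self.seq, event))
        self.seq += 1

    def drain(self, now: float):
        """Yield, in timestamp order, every event older than the window."""
        while self.heap and self.heap[0][0] <= now - self.delay:
            yield heapq.heappop(self.heap)[2]
```

The cost is visible in the signature: nothing can be applied until it has aged past `delay`, which is exactly the freshness traded away.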
Canonical wiki instance: Sonar in Netflix's original eBPF flow-log pipeline; ~40% of Zuul's reported dependencies were misattributed.
Heartbeat attribution¶
Every data-plane observation is simultaneously a heartbeat
(owner, t_start, t_end) for a resource. The consumer builds a
per-resource time-range map; attribution at time t is a
time-range lookup.
What it requires:
- Heartbeats happen frequently enough to cover meaningful intervals.
- Attribution-query timestamps are reliable (e.g. Amazon Time Sync).
Failure modes, and why they're benign:
- Missed heartbeats leave unattributed intervals, not misattributed ones.
- Out-of-order arrival is fine — each heartbeat is a self-contained statement.
- Consumer restart is fine — state rebuilds from new heartbeats within minutes.
Canonical wiki instance: systems/netflix-flowcollector in Netflix's 2025 redesign. Validation against Zuul over two weeks: zero misattribution vs 40% before.
The core tradeoff¶
Discrete-event attribution privileges coverage (every event produces an update) at the cost of correctness (delivery / ordering / timestamps must be right). Heartbeat attribution privileges correctness (stale or missing data is visible as gaps) at the cost of coverage (a short window of each resource's life may be uncovered).
When a single misattribution is more expensive than many unattributed records — which is true for service-dependency mapping, security analysis, and most compliance use cases — heartbeat attribution is the right shape. Netflix's verbatim statement: "For our use cases, it is acceptable to leave a small percentage of flows unattributed, but any misattribution is unacceptable." (See patterns/accept-unattributed-flows.)
When discrete-event is still right¶
Heartbeat attribution is only possible when the data plane produces heartbeats. Some endpoints (AWS ELBs, managed cloud services) cannot run a heartbeat-producing agent; for them a discrete-event fallback is the only option, and the delay/ordering caveats are tolerable when changes are rare. Netflix's hybrid approach keeps Sonar in the pipeline for ELB IPs only.
Seen in¶
- sources/2025-04-08-netflix-how-netflix-accurately-attributes-ebpf-flow-logs — canonical wiki instance; the redesign is the before/after A/B for the two approaches, with the 40% → 0 misattribution datum as the load-bearing evidence.
Related¶
- concepts/heartbeat-based-ownership — the heartbeat-side data structure.
- concepts/ip-attribution — the domain.
- systems/netflix-sonar · systems/netflix-flowcollector
- patterns/heartbeat-derived-ip-ownership-map