
PATTERN

Heartbeat-derived IP ownership map

Maintain a per-IP list of non-overlapping (workload_id, t_start, t_end) time ranges populated entirely from data-plane heartbeats. Attribution of a flow's remote IP is a time-range lookup keyed on the flow's start timestamp. No event-stream from a control plane is required for the workload-IP axis.

Structure

class IPOwnershipMap {
    Map<IPAddr, SortedList<TimeRange>> map;   // in-memory, per-node
    struct TimeRange {
        WorkloadId owner;
        Timestamp t_start;
        Timestamp t_end;    // extended by each heartbeat
    }
}

Invariants:

  • Ranges per IP are sorted ascending by t_start and non-overlapping (an IP can only belong to one workload at a time).
  • Every arriving heartbeat either extends the trailing range (same owner) or appends a new range (different owner).
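The structure and invariants above can be sketched in a few lines of Python. The class and method names (`record_heartbeat`, `attribute`) are illustrative, not from any real implementation; the lookup is a binary search for the rightmost range whose t_start precedes the flow's timestamp.

```python
import bisect
from dataclasses import dataclass

@dataclass
class TimeRange:
    owner: str       # WorkloadId
    t_start: float
    t_end: float     # extended by each heartbeat

class IPOwnershipMap:
    """Per-IP sorted, non-overlapping ownership ranges fed by heartbeats."""

    def __init__(self):
        self._map: dict[str, list[TimeRange]] = {}

    def record_heartbeat(self, ip: str, owner: str, t_start: float, t_end: float) -> None:
        ranges = self._map.setdefault(ip, [])
        if ranges and ranges[-1].owner == owner:
            # Same owner: extend the trailing range.
            ranges[-1].t_end = max(ranges[-1].t_end, t_end)
        else:
            # Different owner: close the previous range, append a new one.
            if ranges:
                ranges[-1].t_end = min(ranges[-1].t_end, t_start)
            ranges.append(TimeRange(owner, t_start, t_end))

    def attribute(self, ip: str, ts: float):
        """Return the workload that owned `ip` at time `ts`, or None."""
        ranges = self._map.get(ip, [])
        # Rightmost range whose t_start <= ts.
        i = bisect.bisect_right([r.t_start for r in ranges], ts) - 1
        if i >= 0 and ranges[i].t_start <= ts <= ranges[i].t_end:
            return ranges[i].owner
        return None
```

A lookup against a gap between ranges (an interval no heartbeat covered) returns None rather than guessing, which is the point of the pattern: misattribution is the failure mode being designed out.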

When to use

  • You can observe or produce a steady stream of (resource, owner, t_start, t_end) tuples from the data plane — i.e. every business operation already implies ownership at its moment of capture.
  • Misattribution is more costly than leaving a flow unattributed.
  • You want the attribution service to be stateless enough to cold-start without persistent storage.

When not to use

  • Heartbeat frequency is too low to produce useful ownership coverage (e.g. ownership changes faster than heartbeats).
  • The resource cannot be observed at the endpoint that owns it (e.g. AWS ELBs from outside the ELB layer). For these, keep an event-based fallback.

Canonical example

systems/netflix-flowcollector in Netflix's 2025 eBPF flow-log attribution redesign. Every TCP flow close emitted by systems/netflix-flowexporter carries (local_ip, local_workload_id, t_start, t_end) — each such record is simultaneously a business flow log and a heartbeat extending the local IP's current-owner time range in FlowCollector's map. Remote IPs are attributed by looking up the map for the remote IP and selecting the time range whose interval contains the flow's t_start. 5M flows/sec processed on 30 c7i.2xlarge instances with no persistent storage; a 2-week Zuul validation window showed zero misattribution vs. ~40% under the prior event-based design.

Trade-offs

  • Latency: attribution cannot happen until heartbeats covering the lookup window have arrived. Netflix buffers flows for 1 minute on disk to wait for the remote FlowExporter's next batch; the pre-redesign discrete-event system had a 15-minute holdback.
  • Coverage gaps: an IP's very first flow heartbeat is unattributed on the receiving node until a peer broadcasts a time range that covers it. Netflix retries the lookup after a delay before giving up.
  • Cost: in-memory state scales linearly with active IPs × recent time window; Netflix processes 5M flows/sec on 30 c7i.2xlarge instances.
  • Cold start: disposable — new node rebuilds its map from incoming flows + Kafka-broadcast backlog within minutes.
