
CONCEPT Cited by 1 source

IP attribution

IP attribution is the problem of mapping a packet's or flow's IP address (and the time at which that IP was in use) to the workload identity that owned it at that moment. It is load-bearing for service-topology observability, security analysis, incident triage, and dependency auditing in cloud environments where IPs are reassigned as workloads come and go.

Why the problem exists

In cloud environments, IP addresses are ephemeral: an IP assigned to workload X at time t₀ may be reassigned to workload Y at time t₁. Flow logs captured at the packet layer only carry IP addresses — not workload identities — so correct attribution requires a separate mapping plus correct handling of the time dimension.

A single misattributed flow produces an incorrect workload dependency. At fleet scale this compounds: Netflix observed that roughly 40% of the dependencies reported for Zuul were misattributed under an event-based attribution pipeline.

Two architectural approaches

Event-based (discrete-event) attribution

Subscribe to a stream of IP assignment and unassignment events, apply each event to an in-memory ip → workload map, and consult that map at attribution time. Failure mode: any delay, loss, or reordering in the event stream leaves stale entries in the map, and a lookup against a stale entry misattributes the flow. Mitigations (e.g. a hold-back buffer) trade freshness for reduced misattribution but cannot eliminate it. Canonical wiki instance: Sonar.
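
The failure mode is easy to reproduce in miniature. The sketch below is illustrative only: `IpEvent`, `EventBasedAttributor`, and the workload names are hypothetical and do not reflect Sonar's actual interface. It shows a delayed reassignment event causing a flow to be attributed to the previous owner of the IP.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IpEvent:
    """Hypothetical assignment event; field names are illustrative."""
    ip: str
    workload: Optional[str]  # None means the IP was unassigned
    ts: float                # when the change actually happened


class EventBasedAttributor:
    """In-memory ip -> workload map driven purely by arriving events."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def apply(self, event: IpEvent) -> None:
        # Events are applied in arrival order, which may lag behind
        # the moment the change really took effect.
        if event.workload is None:
            self._owner.pop(event.ip, None)
        else:
            self._owner[event.ip] = event.workload

    def attribute(self, ip: str) -> Optional[str]:
        # Returns whatever the map holds right now; the flow's own
        # timestamp plays no part in the lookup.
        return self._owner.get(ip)


attributor = EventBasedAttributor()
attributor.apply(IpEvent("10.0.0.5", "workload-x", ts=100.0))

# At ts=200 the IP moved to workload-y, but that event is still in flight.
delayed = IpEvent("10.0.0.5", "workload-y", ts=200.0)

# A flow observed at ts=210 is attributed before the delayed event lands.
stale_answer = attributor.attribute("10.0.0.5")

attributor.apply(delayed)
fresh_answer = attributor.attribute("10.0.0.5")
```

Here `stale_answer` names the previous owner even though the flow happened after the reassignment. A hold-back buffer would narrow that window by delaying lookups, but any event that arrives later than the buffer allows still produces the same result.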

Heartbeat-based (time-range) attribution

Every data-plane observation is simultaneously a heartbeat: "this IP belonged to this workload from t_start to t_end." Maintain a per-IP list of time ranges. Look up attribution by the flow's timestamp. Missed heartbeats leave an IP unattributed during the missed window but cannot misattribute it.
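
A minimal sketch of the per-IP time-range list follows. The class and method names are hypothetical, and the sketch assumes heartbeats for a given IP arrive in time order; a production structure would also need expiry and out-of-order handling, which are omitted here.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# (t_start, t_end, workload)
TimeRange = Tuple[float, float, str]


class HeartbeatOwnershipMap:
    """Per-IP list of time ranges, each asserting ownership by one workload."""

    def __init__(self) -> None:
        self._ranges: DefaultDict[str, List[TimeRange]] = defaultdict(list)

    def heartbeat(self, ip: str, workload: str,
                  t_start: float, t_end: float) -> None:
        # Assumption: heartbeats for one IP arrive in time order.
        ranges = self._ranges[ip]
        if ranges and ranges[-1][2] == workload and t_start <= ranges[-1][1]:
            # Same owner, contiguous or overlapping: extend the last range.
            last_start, last_end, _ = ranges[-1]
            ranges[-1] = (last_start, max(last_end, t_end), workload)
        else:
            ranges.append((t_start, t_end, workload))

    def attribute(self, ip: str, ts: float) -> Optional[str]:
        ranges = self._ranges.get(ip, [])
        # Find the last range that starts at or before the flow timestamp.
        idx = bisect_right([r[0] for r in ranges], ts) - 1
        if idx >= 0 and ranges[idx][0] <= ts <= ranges[idx][1]:
            return ranges[idx][2]
        return None  # no range covers ts: unattributed, never misattributed


ownership = HeartbeatOwnershipMap()
ownership.heartbeat("10.0.0.5", "workload-x", 100.0, 160.0)
ownership.heartbeat("10.0.0.5", "workload-x", 160.0, 220.0)
# Heartbeats between 220 and 300 were missed.
ownership.heartbeat("10.0.0.5", "workload-y", 300.0, 360.0)

during_x = ownership.attribute("10.0.0.5", 150.0)
in_gap = ownership.attribute("10.0.0.5", 250.0)
during_y = ownership.attribute("10.0.0.5", 310.0)
```

The lookup in the missed window returns `None` rather than guessing at either neighbour, which is the property that separates this approach from the event-based map.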

See concepts/discrete-event-vs-heartbeat-attribution for the detailed tradeoff framing, concepts/heartbeat-based-ownership for the data structure, and patterns/heartbeat-derived-ip-ownership-map for the canonical architecture.

Local vs. remote attribution

On any host, attributing the local IP is easier than attributing the remote one: the host knows its own identity, so local attribution can happen at capture time, in-kernel, from an eBPF map or a provisioned cert. Remote attribution requires consulting a shared data structure populated by other hosts. Canonical split: see FlowExporter (local) + FlowCollector (remote) in the Netflix 2025 redesign.
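
The two-stage split can be sketched as two small functions. Everything here is an assumption made for illustration: the `Flow` record, `stamp_local`, `resolve_remote`, and the toy `SHARED_RANGES` table are hypothetical and are not the FlowExporter or FlowCollector interfaces.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Flow:
    local_ip: str
    remote_ip: str
    ts: float
    local_workload: Optional[str] = None
    remote_workload: Optional[str] = None


def stamp_local(flow: Flow, host_identity: str) -> Flow:
    # Capture side: the host already knows which workload it is running,
    # so no lookup against shared state is needed.
    return replace(flow, local_workload=host_identity)


def resolve_remote(flow: Flow,
                   lookup: Callable[[str, float], Optional[str]]) -> Flow:
    # Collector side: the remote owner comes from shared state that
    # other hosts populated, queried at the flow's own timestamp.
    return replace(flow, remote_workload=lookup(flow.remote_ip, flow.ts))


# Toy stand-in for the shared structure: ip -> [(t_start, t_end, workload)].
SHARED_RANGES: Dict[str, List[Tuple[float, float, str]]] = {
    "10.0.1.7": [(100.0, 200.0, "workload-b")],
}


def shared_lookup(ip: str, ts: float) -> Optional[str]:
    for t_start, t_end, workload in SHARED_RANGES.get(ip, []):
        if t_start <= ts <= t_end:
            return workload
    return None


captured = Flow(local_ip="10.0.0.5", remote_ip="10.0.1.7", ts=150.0)
exported = stamp_local(captured, host_identity="workload-a")
attributed = resolve_remote(exported, shared_lookup)
```

After both stages, `attributed` carries `workload-a` as the local side and `workload-b` as the remote side; a remote IP with no covering range would simply stay `None`.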

Special cases

  • NAT'd sockets. Shared-IP translation schemes (e.g. IPv6-to-IPv4 without NAT64) need (IP, port) as the attribution key instead of IP alone.
  • Non-workload endpoints. Addresses where no host can run a flow-exporter (cloud-provider-managed LBs, external dependencies) cannot be heartbeat-attributed; hybrid pipelines keep an event-based fallback for those.
  • Cross-regional flows. The remote IP may belong to a workload in a different region. Options: broadcast time ranges globally (expensive), or forward the attribution request to the peer region that owns the IP, found by looking the remote IP up in a CIDR trie.
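
For the cross-regional case, a CIDR trie gives a longest-prefix match from remote IP to owning region. The sketch below is a plain binary trie over IPv4 address bits, built on Python's standard `ipaddress` module. The class name, the region names, and the CIDR blocks are hypothetical; the sketch handles IPv4 only, so a real deployment would keep a separate trie per address family.

```python
import ipaddress
from typing import List, Optional


class _Node:
    __slots__ = ("children", "region")

    def __init__(self) -> None:
        self.children: List[Optional["_Node"]] = [None, None]
        self.region: Optional[str] = None


class CidrRegionTrie:
    """Binary trie keyed on address bits; IPv4 only in this sketch."""

    def __init__(self) -> None:
        self._root = _Node()

    def insert(self, cidr: str, region: str) -> None:
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self._root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (net.max_prefixlen - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.region = region

    def lookup(self, ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(ip)
        bits = int(addr)
        node = self._root
        best = node.region
        for i in range(addr.max_prefixlen):
            bit = (bits >> (addr.max_prefixlen - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.region is not None:
                best = node.region  # remember the longest match so far
        return best


def attribution_target(trie: CidrRegionTrie, remote_ip: str,
                       local_region: str) -> Optional[str]:
    """Return the region that should attribute this IP, or None if unknown."""
    region = trie.lookup(remote_ip)
    if region is None:
        return None  # not in any known block, e.g. an external dependency
    return local_region if region == local_region else region


trie = CidrRegionTrie()
trie.insert("10.0.0.0/8", "region-a")   # hypothetical allocations
trie.insert("10.2.0.0/16", "region-b")

same_region = attribution_target(trie, "10.9.9.9", local_region="region-a")
peer_region = attribution_target(trie, "10.2.3.4", local_region="region-a")
unknown = attribution_target(trie, "192.168.1.1", local_region="region-a")
```

The more specific `/16` wins over the enclosing `/8`, so `10.2.3.4` is routed to the peer region while `10.9.9.9` stays local. An address outside every block yields `None`, which lines up with the non-workload-endpoint case in the list above.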

