PATTERN Cited by 1 source

Network intermediary flow resolution

Network intermediary flow resolution is the pattern of combining the inbound and outbound flow records of a network intermediary (load balancer, NAT gateway, API gateway, proxy) to reconstruct the direct application-to-application edge that engineers want to see in a service dependency graph — instead of recording two separate edges App A → Intermediary and Intermediary → App B.

The wiki's canonical instance is Stage 2 of Netflix Service Topology's three-stage flow aggregation pipeline.

The shape

Raw flow records (network capture):
   App A ─→ Load Balancer    [ flow record 1 ]
              └→ App B       [ flow record 2 ]

After resolution (in the topology graph):
   App A ─→ App B            [ single edge ]

The two raw flow records are joined by the resolver, the intermediary is erased from the engineer-visible graph, and a single application-to-application edge is stored.

When to apply

  • The capture substrate is network-level (eBPF flow logs, NetFlow, sFlow, packet capture) — i.e. it sees individual hops rather than full request paths.
  • The graph being built is a service-dependency graph intended for engineer-facing dependency questions, not network-engineer questions.
  • The fleet contains a small number of intermediaries serving a large number of service pairs, i.e. traffic is heavily skewed toward a few intermediaries.

When NOT to apply

  • The intermediary is itself a service of interest. (Resolving it away would hide a dependency engineers need to see.)
  • The substrate is request-level (distributed tracing) — traces already encode application-to-application call paths and don't need this resolution step.

Mechanics

Identifying intermediaries

The resolver needs a registry / classifier that says "IP X is a network intermediary, not an application." Likely sources:

  • AWS metadata (ELB / NAT GW IP ranges / API GW endpoints).
  • Internal-LB / proxy registry maintained by the platform team.
  • Heartbeat-based ownership (concepts/heartbeat-based-ownership) — applications that emit heartbeats are known to be applications; IPs that don't are candidates for being intermediaries (combined with positive identification from the registry).

The Netflix post enumerates four intermediary classes — load balancers, NAT gateways, API gateways, proxies — but does not decompose the identification mechanism.
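Since the source does not describe the identification mechanism, the following is only a minimal sketch of how the sources listed above could be combined. The class name `IntermediaryClassifier`, the CIDR ranges, and the class labels are all hypothetical. It assumes heartbeating IPs are treated as applications first, and that an IP is only called an intermediary on a positive registry match.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class IntermediaryClassifier:
    # Hypothetical inputs: a registry mapping CIDR ranges to an intermediary
    # class, and the set of IPs currently emitting application heartbeats.
    registry: dict[str, str]
    heartbeat_ips: set[str] = field(default_factory=set)

    def classify(self, ip: str) -> str:
        # An IP that emits heartbeats is known to be an application.
        if ip in self.heartbeat_ips:
            return "application"
        # Otherwise require positive identification from the registry.
        addr = ip_address(ip)
        for cidr, kind in self.registry.items():
            if addr in ip_network(cidr):
                return kind
        # No heartbeat and no registry match: do not guess.
        return "unknown"


clf = IntermediaryClassifier(
    registry={"10.0.8.0/24": "load_balancer", "10.0.9.0/28": "nat_gateway"},
    heartbeat_ips={"10.0.1.5"},
)
print(clf.classify("10.0.8.17"))
```

Returning "unknown" rather than defaulting to "intermediary" keeps a missing heartbeat from silently erasing a real application from the graph.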

Joining inbound and outbound flows

Once an intermediary is identified, the resolver must correlate each inbound flow to the intermediary with the corresponding outbound flow from it. Possible join keys:

  • Connection state from the intermediary itself (if the intermediary exports its own flow logs with cross-side correlation IDs).
  • 5-tuple matching with timing constraints.
  • Session affinity / load-balancer-pinning state.

The Netflix post does not name the join algorithm directly. The 2025-04-08 sibling post on FlowExporter / FlowCollector discloses related attribution mechanics (heartbeat-based IP ownership; CIDR-trie cross-region forwarding) but at the attribution layer rather than the edge-collapse layer.
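As an illustration of the timing-constrained option only (not the algorithm Netflix uses, which the source does not disclose), the sketch below pairs each inbound flow with the closest-in-time unused outbound flow leaving the same intermediary IP. The `Flow` type, the `join_flows` function, and the 50 ms window are assumptions made for the example. Under heavy concurrency this heuristic is ambiguous, which is why correlation IDs exported by the intermediary would be the stronger join key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    start_ms: int  # flow start time in milliseconds


def join_flows(inbound, outbound, window_ms=50):
    """Pair inbound flows (app -> intermediary) with outbound flows
    (intermediary -> app) that share the intermediary IP and start
    within window_ms after the inbound flow."""
    pairs = []
    used = set()  # indices of outbound flows already matched
    for inflow in sorted(inbound, key=lambda f: f.start_ms):
        best = None  # (outbound index, delay in ms)
        for i, out in enumerate(outbound):
            if i in used or out.src_ip != inflow.dst_ip:
                continue
            delay = out.start_ms - inflow.start_ms
            if 0 <= delay <= window_ms and (best is None or delay < best[1]):
                best = (i, delay)
        if best is not None:
            used.add(best[0])
            pairs.append((inflow, outbound[best[0]]))
    return pairs


inbound = [Flow("10.0.1.5", "10.0.8.17", 100_000), Flow("10.0.1.6", "10.0.8.17", 100_200)]
outbound = [
    Flow("10.0.8.17", "10.0.2.9", 100_010),
    Flow("10.0.8.17", "10.0.2.10", 100_210),
    Flow("10.0.8.17", "10.0.2.11", 105_000),  # too late to match anything
]
edges = [(i.src_ip, o.dst_ip) for i, o in join_flows(inbound, outbound)]
```

Each outbound flow is consumed at most once, so one busy intermediary does not produce a cross product of every client with every backend.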

Output: a single edge

The output of resolution is a single direct edge in the topology graph, with the intermediary erased from the visible graph but typically retained as edge metadata (so a query "how does App A actually reach App B?" can still return "via Load Balancer X" if the engineer asks for path detail).
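One way to picture this output is an edge record that carries the erased intermediaries as metadata. The sketch below is a hypothetical in-memory structure (the names `Edge`, `TopologyGraph`, `add_resolved`, and `path_detail` are invented for illustration), not the storage model described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    src_app: str
    dst_app: str
    # Intermediaries erased from the visible graph, retained as metadata.
    via: set[str] = field(default_factory=set)


class TopologyGraph:
    def __init__(self):
        self._edges: dict[tuple[str, str], Edge] = {}

    def add_resolved(self, src_app: str, dst_app: str, intermediary: str) -> None:
        # One edge per application pair, however many intermediaries carry it.
        edge = self._edges.setdefault((src_app, dst_app), Edge(src_app, dst_app))
        edge.via.add(intermediary)

    def dependencies(self, src_app: str) -> list[str]:
        # Default view: application-to-application edges only.
        return sorted(e.dst_app for e in self._edges.values() if e.src_app == src_app)

    def path_detail(self, src_app: str, dst_app: str) -> list[str]:
        # On request, reveal which intermediaries sit on the edge.
        edge = self._edges.get((src_app, dst_app))
        return sorted(edge.via) if edge else []


graph = TopologyGraph()
graph.add_resolved("app-a", "app-b", "lb-x")
graph.add_resolved("app-a", "app-b", "lb-y")
graph.add_resolved("app-a", "app-c", "lb-x")
```

Here `graph.dependencies("app-a")` shows only applications, while `graph.path_detail("app-a", "app-b")` answers the "how does App A actually reach App B?" question.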

Why this stage is hot-spot-prone

A handful of intermediaries see "100x more traffic than others" (verbatim from the source). A naive per-intermediary aggregator becomes the bottleneck. The patterns/three-stage-flow-aggregation-pipeline pattern responds by bracketing this stage between cheaper Stages 1 and 3 with independent partitioning, so the hot intermediary's load doesn't dictate the whole pipeline's partitioning scheme.
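A small arithmetic sketch makes the skew concrete. The flow counts below are invented for illustration (20 ordinary intermediaries plus one seeing 100x their traffic) and the helper `hottest_share` is hypothetical. It only shows how much work lands on a single worker if aggregation is keyed per intermediary, and makes no claim about how Netflix partitions its stages.

```python
from collections import Counter


def hottest_share(flows_per_intermediary):
    """Return the busiest intermediary and its fraction of all flows."""
    total = sum(flows_per_intermediary.values())
    name, count = max(flows_per_intermediary.items(), key=lambda kv: kv[1])
    return name, count / total


# Hypothetical fleet: 20 ordinary intermediaries and one with 100x the traffic.
load = Counter({f"lb-{i}": 1_000 for i in range(20)})
load["lb-hot"] = 100_000

name, share = hottest_share(load)
print(f"{name} handles {share:.0%} of flows")
```

With these numbers a per-intermediary aggregator sends roughly five sixths of all flows to one worker, which is the bottleneck the surrounding stages are meant to keep from shaping the whole pipeline.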

Sibling: the attribution-layer intermediary problem

Netflix has a related-but-distinct intermediary problem at the flow-attribution layer: ELB IPs cannot be heartbeat-attributed because FlowExporter can't run on an ELB. The attribution layer's response is to fall back to a discrete-event source (Sonar) for those IPs. (sources/2025-04-08-netflix-how-netflix-accurately-attributes-ebpf-flow-logs)

The two intermediary problems compose:

  1. Attribution layer: "who is this intermediary IP?", answered via Sonar fallback.
  2. Topology layer (this pattern): "how do I erase this intermediary from the engineer-visible graph?", answered via intermediary flow resolution.

The attribution layer decides what the intermediary is; this pattern decides how to erase it.

Generalisation: collapse middlemen on logical-layer graphs

The general shape: graph captured at network layer, displayed at application layer, with N hops collapsed into 1. The pattern is not specific to load balancers — it applies to any pass-through component whose role in the application graph is "be invisible":

  • Forward proxies in egress paths.
  • Sidecars (Envoy, Linkerd) that double the hop count.
  • API gateways in front of microservice clusters.
  • VPN concentrators in cross-region traffic.

In all cases, the engineer wants to see "App A → App B"; the network capture sees "App A → middleman → App B".
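The N-hops-into-1 collapse can be sketched as a pass over an already-joined hop chain. This is a minimal illustration that assumes the per-request chain of nodes has been reconstructed upstream; the function `collapse_path` and the node names are hypothetical.

```python
def collapse_path(path, is_middleman):
    """Collapse a hop-by-hop chain into application-to-application edges.

    Returns a list of (src_app, dst_app, via) tuples, where via holds the
    middlemen that were skipped between the two applications.
    """
    edges = []
    src, via = None, []
    for node in path:
        if is_middleman(node):
            # Remember middlemen only once an application source is known.
            if src is not None:
                via.append(node)
            continue
        if src is not None:
            edges.append((src, node, tuple(via)))
        # The application just reached becomes the source of the next edge.
        src, via = node, []
    return edges


middlemen = {"envoy-a", "envoy-b", "lb-x", "egress-proxy"}
chain = ["app-a", "envoy-a", "lb-x", "envoy-b", "app-b", "egress-proxy", "app-c"]
edges = collapse_path(chain, middlemen.__contains__)
```

For the chain above this yields two edges, app-a to app-b and app-b to app-c, with the sidecars, load balancer, and proxy kept only in the `via` field.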

Seen in
