PATTERN Cited by 1 source

Three-stage flow aggregation pipeline

Three-stage flow aggregation pipeline is the architectural pattern of decomposing a high-volume flow-log aggregation into three sequential stages, each separately partitioned, where the middle stage applies the most expensive (hot-spot-prone) logic and the outer stages absorb the bulk of throughput. Because each stage is partitioned independently, the middle stage's hot-node load is spread across several points in the pipeline instead of concentrating on the nodes of a single operator.

The wiki's canonical instance is Netflix's Service Topology — ingesting "millions of flow records per second" from multi-region Kafka, with Stage 2 doing network-intermediary resolution (the operation most prone to hot spots when a single load balancer fronts thousands of services).

Pattern shape

Multi-region Kafka  ─────►  Stage 1  ─────►  Stage 2  ─────►  Stage 3  ─────►  Storage
(flow records)              initial            intermediary    final
                            aggregation        resolution      aggregation +
                            from Kafka         (LBs / NATs /   health-status
                                                API GWs /      integration
                                                proxies →
                                                direct edges)
                            ↑──── Pekko Streams (Reactive Streams backpressure) ────↑
                            ↑──── Auto-scaling-group partitioned ──────────────────↑

Three stages, independently partitioned, with explicit backpressure between them.

Why three stages, not one

A single-stage aggregator that does "consume Kafka + resolve intermediaries + integrate health + write graph" in one operator suffers from two problems:

  1. Hot-spot load. A handful of network intermediaries (load balancers, NAT gateways) see "100x more traffic than others". The intermediary-resolution operator becomes the bottleneck. With a single-stage pipeline, the bottleneck holds back the whole pipeline.
  2. Coupled scaling. Initial Kafka aggregation, intermediary resolution, and health-status integration have very different resource profiles (CPU vs memory vs network) and very different throughput envelopes. A single-stage operator has to be provisioned and scaled as one unit, so it is sized for the most demanding of the three even when the other two are idle.

The Netflix post names the hot-spot diffusion explicitly:

"This graduated approach also prevents hot spots by distributing load across multiple points even when specific applications or network intermediaries see 100x more traffic than others." (sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map)

Stage decomposition

Stage 1 — initial aggregation from Kafka

Consumes raw flow records from Kafka. Performs first-pass aggregation (deduplication, time-window batching, schema canonicalisation). Output is a stream of partial aggregates keyed by something close to flow-shape (e.g. (local_workload, remote_ip, time_window)).

This stage's load is dominated by Kafka consumption and scales roughly linearly with flow record rate.
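
The first-pass aggregation described above can be sketched in plain Python. This is a minimal illustration, not Netflix's implementation: the `FlowRecord` fields, the `record_id` used for deduplication, the 60-second window, and the byte-count metric are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowRecord(NamedTuple):
    record_id: str        # hypothetical unique id, used only for deduplication
    local_workload: str
    remote_ip: str
    timestamp: float      # seconds since epoch
    bytes_sent: int

WINDOW_SECONDS = 60  # assumed window size

def stage1_aggregate(records):
    """Drop duplicates and sum bytes per (local_workload, remote_ip, time_window)."""
    seen_ids = set()
    totals = defaultdict(int)
    for rec in records:
        if rec.record_id in seen_ids:
            continue  # duplicate delivery of the same record
        seen_ids.add(rec.record_id)
        # Round the timestamp down to the start of its window.
        window = int(rec.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        totals[(rec.local_workload, rec.remote_ip, window)] += rec.bytes_sent
    return dict(totals)

records = [
    FlowRecord("r1", "checkout", "10.0.0.5", 120.0, 500),
    FlowRecord("r1", "checkout", "10.0.0.5", 120.0, 500),  # duplicate of r1
    FlowRecord("r2", "checkout", "10.0.0.5", 150.0, 300),
    FlowRecord("r3", "checkout", "10.0.0.5", 185.0, 200),
]
partials = stage1_aggregate(records)
```

Here `partials` holds two partial aggregates: 800 bytes for the window starting at 120 and 200 bytes for the window starting at 180. The work per record is constant, which matches the roughly linear scaling with record rate.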

Stage 2 — network-intermediary resolution

The semantic-translation stage. Identifies network intermediaries (load balancers, NAT gateways, API gateways, proxies) and combines their incoming and outgoing flows to reconstruct the direct application-to-application path engineers actually want to see in the topology graph. (concepts/network-intermediary-resolution)

This stage's load is dominated by intermediary state and scales with the number of distinct intermediary endpoints and the complexity of the join logic. It is the most hot-spot-prone stage, which is why it sits in the middle rather than first or last.
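
A rough sketch of the resolution step follows. It assumes flows have already been reduced to `(source, destination)` pairs and that the set of intermediaries is known up front. It also makes a strong simplification that the source does not state: every caller of an intermediary is linked to every backend behind it. A real system would need connection-level correlation to pair inbound and outbound flows. The function name and sample service names are hypothetical.

```python
from collections import defaultdict

def resolve_intermediaries(edges, intermediaries):
    """Replace caller -> intermediary -> backend hops with direct caller -> backend edges."""
    inbound = defaultdict(set)    # intermediary -> callers seen on its inbound side
    outbound = defaultdict(set)   # intermediary -> backends seen on its outbound side
    direct = set()
    for src, dst in edges:
        if dst in intermediaries:
            inbound[dst].add(src)
        elif src in intermediaries:
            outbound[src].add(dst)
        else:
            direct.add((src, dst))  # already application-to-application
    # Simplifying assumption: join every caller with every backend.
    # Chains of intermediaries are not handled in this sketch.
    for node in intermediaries:
        for caller in inbound[node]:
            for backend in outbound[node]:
                direct.add((caller, backend))
    return direct

edges = [
    ("web", "lb-1"),
    ("mobile", "lb-1"),
    ("lb-1", "orders"),
    ("orders", "inventory"),
]
direct_edges = resolve_intermediaries(edges, {"lb-1"})
```

The `inbound` and `outbound` maps are the per-intermediary state this stage has to hold. A load balancer that fronts thousands of services makes both maps large for a single key, which is where the hot spot comes from.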

Stage 3 — final aggregation + enrichment

The pre-persistence stage. Performs final aggregation across Stage 2 outputs and integrates health-status data before graph persistence. The graph that gets written carries health overlay, ready for consumption.

This stage's load is dominated by enrichment lookups and is typically I/O-bound (health-status sources). Its partitioning is independent of Stage 2's, so a hot intermediary in Stage 2 doesn't sink Stage 3.
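
The final stage can be sketched as a sum over resolved edges followed by a health lookup. The shape of the health data is not described in the source, so the `health_by_service` dictionary, the `"unknown"` default, and the output record layout are assumptions for illustration only.

```python
from collections import defaultdict

def stage3_finalize(partial_edges, health_by_service):
    """Sum byte counts per edge and attach the destination's health status."""
    totals = defaultdict(int)
    for src, dst, byte_count in partial_edges:
        totals[(src, dst)] += byte_count
    graph = []
    for (src, dst), total in sorted(totals.items()):
        graph.append({
            "src": src,
            "dst": dst,
            "bytes": total,
            # In a real pipeline this lookup would likely be a remote call.
            "dst_health": health_by_service.get(dst, "unknown"),
        })
    return graph

partial_edges = [
    ("web", "orders", 800),
    ("web", "orders", 200),
    ("mobile", "orders", 50),
    ("orders", "inventory", 10),
]
graph = stage3_finalize(partial_edges, {"orders": "degraded"})
```

Each output record carries both the aggregated traffic and the health overlay, so a reader of the stored graph does not need a second lookup.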

Distribution substrate

Netflix uses Apache Pekko Streams (Akka fork) running on Auto Scaling Groups:

"We use Apache Pekko Streams (a fork of Akka) to process these flows in a distributed, fault-tolerant pipeline. The system automatically partitions work across our Auto Scaling Groups to handle the volume and provides natural backpressure handling."

The Pekko / Reactive-Streams choice is load-bearing because:

  • Backpressure — every stage signals demand to its upstream stage; a slow Stage 2 throttles Stage 1 without unbounded buffering.
  • Auto-partitioning — work distribution across ASG nodes is handled by the framework rather than by application code.
  • Fault tolerance — failures in one ASG instance don't stall the whole pipeline.
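
The backpressure property can be illustrated without Pekko. The sketch below is a stand-in that uses a bounded `queue.Queue` between two threads: a blocking `put` plays the role that demand signalling plays in Reactive Streams. It is an analogy for the behaviour, not the Pekko Streams API, and the buffer size and sleep duration are arbitrary.

```python
import queue
import threading
import time

SENTINEL = object()  # marks end of stream

def run_pipeline(items, buffer_size=2):
    """Run a fast producer and a slow consumer joined by a bounded buffer."""
    buf = queue.Queue(maxsize=buffer_size)
    results = []
    max_depth = 0

    def stage1():
        for item in items:
            buf.put(item)      # blocks while the buffer is full
        buf.put(SENTINEL)

    def stage2():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, buf.qsize())
            item = buf.get()
            if item is SENTINEL:
                break
            time.sleep(0.001)  # simulate the slower, more expensive stage
            results.append(item * 2)

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, max_depth

results, max_depth = run_pipeline(range(20), buffer_size=2)
```

However fast the producer is, the observed queue depth never exceeds `buffer_size`: the slow stage sets the pace and the fast stage waits rather than accumulating an unbounded backlog.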

Why the middle stage is the most expensive

Two structural properties:

  1. It has the most state per key. Joining inbound and outbound flows of an intermediary requires holding per-intermediary connection state until the matching flow on the other side arrives.
  2. The keys are highly skewed. A few intermediaries see millions of flows; most "intermediaries" see one. The skew compounds the state pressure.

The pattern's response is to bracket the expensive stage with cheaper stages on different partition keys, so the expensive stage's hot keys don't cascade out to the rest of the pipeline.
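
The effect of choosing a different partition key per stage can be checked with a small simulation. The traffic below is synthetic and made up for the example (1,000 callers behind one hot load balancer, plus 10 quiet intermediaries), and `partition_load` is a hypothetical helper that hashes keys with CRC32. None of this reflects Netflix's actual partitioning scheme.

```python
import zlib
from collections import Counter

def partition_load(flows, key_fn, num_partitions):
    """Count how many flows land on each partition for a given key function."""
    load = Counter()
    for flow in flows:
        key = key_fn(flow).encode("utf-8")
        # CRC32 gives a hash that is stable across runs, unlike built-in hash().
        load[zlib.crc32(key) % num_partitions] += 1
    return load

# Synthetic traffic: (caller, intermediary) pairs.
flows = [(f"svc-{i}", "lb-hot") for i in range(1000)]
flows += [(f"svc-{i}", f"lb-{i}") for i in range(1000, 1010)]

# Stage 2 style key: the intermediary. All hot traffic shares one key.
by_intermediary = partition_load(flows, lambda f: f[1], 8)
# Outer-stage style key: the caller. The same traffic spreads out.
by_caller = partition_load(flows, lambda f: f[0], 8)

hottest_by_intermediary = max(by_intermediary.values())
hottest_by_caller = max(by_caller.values())
```

Keyed by intermediary, one partition receives at least the 1,000 flows of `lb-hot`. Keyed by caller, the same flows are divided among the eight partitions. This is why the hot key only has to be tolerated in the one stage that needs the intermediary as its key.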

Generalisation: bracket the expensive operator

The general shape:

  • The expensive operator sits at stage N, in the interior of the pipeline (N=2 in a three-stage pipeline; it could be N=3 in a five-stage one).
  • The upstream stages absorb the raw input rate and shape it for the expensive operator.
  • The downstream stages absorb the expensive operator's output and prepare it for storage.
  • Each stage is independently partitioned, so the expensive operator's hot keys don't dictate the whole pipeline's partitioning scheme.
  • Backpressure between stages bounds the buffering / queue depth.

Last updated · 542 distilled / 1,571 read