Netflix FlowCollector

FlowCollector is Netflix's regional backend service that consumes TCP flow records from the FlowExporter sidecar fleet, attributes the remote IP of each flow to a Netflix workload identity, and forwards attributed flows downstream to Data Mesh for stream + batch processing. It processes ~5 million flows per second for the entire Netflix fleet on 30 c7i.2xlarge instances with no persistent storage — in-memory state is rebuilt from incoming heartbeats.

Attribution architecture

FlowCollector's load-bearing contract is: for every flow, produce either a correct (local_workload, remote_workload) attribution or mark the flow unattributed: "a small percentage unattributed is acceptable, any misattribution is unacceptable." (patterns/accept-unattributed-flows)

Inputs already carry the resolved local_workload_id from FlowExporter (see systems/netflix-flowexporter for the in-kernel local-identity resolution). FlowCollector's job is the remote side.

In-memory time-range map

Each node maintains a hashmap from IP → sorted list of non-overlapping (workload_id, t_start, t_end) tuples. In Go:

// netip.Addr is from net/netip; per-IP ranges are kept ascending and non-overlapping.
type IPAddressTracker struct {
    ipToTimeRanges map[netip.Addr]timeRanges
}
type timeRanges []timeRange // sorted ascending by start
type timeRange struct {
    workloadID string
    start      time.Time
    end        time.Time
}

Every incoming flow (local_ip, local_workload, t_start, t_end) extends the entry for local_ip. Time ranges are ascending-sorted and non-overlapping (an IP cannot belong to two workloads at once). This is the canonical concepts/heartbeat-based-ownership data structure.

Remote attribution by time-range lookup

When a flow arrives, FlowCollector looks up the remote IP in the map and gets back that IP's list of time ranges. It selects the range containing the flow's t_start timestamp and returns the associated workload identity. If t_start falls outside every range, FlowCollector retries after a delay and eventually gives up — delivering the flow unattributed rather than misattributed.

Timestamp reliability depends on Amazon Time Sync: Netflix's fleet is sub-millisecond clock-synced, so wall-clock time ranges are reliable attribution keys.

Cross-node Kafka broadcast

Each FlowCollector node only sees flows routed to it by the load balancer, but every node must attribute arbitrary remote IPs. To share learned time ranges, FlowCollector uses a Kafka broadcast: every node publishes learned ranges to a topic consumed by all other nodes. "Although more efficient broadcasting implementations exist, the Kafka-based approach is simple and has worked well for us."

1-minute disk buffer for remote attribution

FlowCollector cannot attribute a remote IP until the latest time ranges from the corresponding remote FlowExporter's next batch have arrived. Because FlowExporter reports every minute, FlowCollector buffers incoming flows on disk for 1 minute before running remote attribution. This replaces the prior 15-minute event-based holdback.

Regional partitioning + cross-regional forwarding

Netflix runs a FlowCollector cluster per major AWS region. FlowExporters send flows to their regional cluster. Broadcasting is limited to same-region peers — cross-region broadcast would be wasteful given cross-regional traffic is ~1% of flows.

Cross-regional flows (where the remote IP is in a different region) are forwarded to the peer region's FlowCollector. Region resolution uses a CIDR trie built from all Netflix VPC CIDRs — O(IP-length) lookup, far cheaper than global broadcast. (patterns/regional-forwarding-on-cidr-trie)

ELB + non-workload fallback via Sonar

Not all remote IPs belong to Netflix workloads — AWS ELB IPs cannot be heartbeat-attributed because FlowExporter can't run on an ELB. For these, FlowCollector falls back to the Sonar discrete-event stream. ELB reassignment is rare enough that Sonar's delay/ordering caveats produce tolerable accuracy.

Capacity + cost

  • 30 c7i.2xlarge instances total across regions.
  • 5M flows/sec aggregate.
  • No persistent storage. In-memory state rebuilt from incoming heartbeats + Kafka broadcast on cold start within minutes — a direct payoff of the heartbeat design.

Validation against Zuul

Two-week validation: compare FlowCollector-reported Zuul dependencies against Zuul's routing configuration (ground truth). Before the redesign, ~40% of Zuul's reported dependencies were misattributed. After the redesign: zero misattribution over the two-week window.
