Netflix FlowCollector

FlowCollector is Netflix's regional backend service that consumes TCP flow records from the FlowExporter sidecar fleet, attributes the remote IP of each flow to a Netflix workload identity, and forwards attributed flows downstream to Data Mesh for stream + batch processing. It processes ~5 million flows per second for the entire Netflix fleet on 30 c7i.2xlarge instances with no persistent storage — in-memory state is rebuilt from incoming heartbeats.

Attribution architecture

FlowCollector's load-bearing contract is: for every flow, produce either a correct (local_workload, remote_workload) attribution or mark the flow unattributed: "a small percentage unattributed is acceptable, any misattribution is unacceptable." (patterns/accept-unattributed-flows)

Inputs already carry the resolved local_workload_id from FlowExporter (see systems/netflix-flowexporter for the in-kernel local-identity resolution). FlowCollector's job is the remote side.

In-memory time-range map

Each node maintains a hashmap from IP → sorted list of non-overlapping (workload_id, t_start, t_end) tuples. In Go:

// netip.Addr is from net/netip; per-IP ranges are kept ascending and non-overlapping.
type IPAddressTracker struct {
    ipToTimeRanges map[netip.Addr]timeRanges
}
type timeRanges []timeRange // sorted ascending by start
type timeRange struct {
    workloadID string
    start      time.Time
    end        time.Time
}

Every incoming flow (local_ip, local_workload, t_start, t_end) extends the entry for local_ip. Time ranges are ascending-sorted and non-overlapping (an IP cannot belong to two workloads at once). This is the canonical concepts/heartbeat-based-ownership data structure.

Remote attribution by time-range lookup

When a flow arrives, FlowCollector looks up the remote IP in the map and gets back that IP's list of time ranges. It selects the range containing the flow's t_start timestamp and returns the associated workload identity. If t_start falls outside every range, FlowCollector retries after a delay and eventually gives up — delivering the flow unattributed rather than misattributed.

Timestamp reliability depends on Amazon Time Sync: Netflix's fleet is sub-millisecond clock-synced, so wall-clock time ranges are reliable attribution keys.

Cross-node Kafka broadcast

Each FlowCollector node only sees flows routed to it by the load balancer, but every node must attribute arbitrary remote IPs. To share learned time ranges, FlowCollector uses a Kafka broadcast: every node publishes learned ranges to a topic consumed by all other nodes. "Although more efficient broadcasting implementations exist, the Kafka-based approach is simple and has worked well for us."

1-minute disk buffer for remote attribution

FlowCollector cannot attribute a remote IP until the latest time ranges from the corresponding remote FlowExporter's next batch have arrived. Because FlowExporter reports every minute, FlowCollector buffers incoming flows on disk for 1 minute before running remote attribution. This replaces the prior 15-minute event-based holdback.

Regional partitioning + cross-regional forwarding

Netflix runs a FlowCollector cluster per major AWS region. FlowExporters send flows to their regional cluster. Broadcasting is limited to same-region peers — cross-region broadcast would be wasteful given cross-regional traffic is ~1% of flows.

Cross-regional flows (where the remote IP is in a different region) are forwarded to the peer region's FlowCollector. Region resolution uses a CIDR trie built from all Netflix VPC CIDRs — O(IP-length) lookup, far cheaper than global broadcast. (patterns/regional-forwarding-on-cidr-trie)

ELB + non-workload fallback via Sonar

Not all remote IPs belong to Netflix workloads — AWS ELB IPs cannot be heartbeat-attributed because FlowExporter can't run on an ELB. For these, FlowCollector falls back to the Sonar discrete-event stream. ELB reassignment is rare enough that Sonar's delay/ordering caveats produce tolerable accuracy.

Capacity + cost

  • 30 c7i.2xlarge instances total across regions.
  • 5M flows/sec aggregate.
  • No persistent storage. In-memory state rebuilt from incoming heartbeats + Kafka broadcast on cold start within minutes — a direct payoff of the heartbeat design.

Validation against Zuul

Two-week validation: compare FlowCollector-reported Zuul dependencies against Zuul's routing configuration (ground truth). Before the redesign, ~40% of Zuul's reported dependencies were misattributed. After the redesign: zero misattribution over the two-week window.
