CONCEPT Cited by 1 source
RTNL lock contention
The Routing Netlink (RTNL) lock is Linux's single-writer mutex for network-configuration modifications (creating tap/veth devices, moving devices between namespaces, adding routes, attaching programs to interfaces). It guarantees that only one thread at a time can mutate the kernel's network-state tables.
Under normal desktop and server workloads the lock is invisible. Under highly concurrent network setup it becomes a serialization bottleneck.
The pathology
When N operations that each need the RTNL lock are issued in parallel, they queue behind each other. Throughput is capped at one operation per lock-hold time (1/op-latency), regardless of how many parallel callers there are. Work that costs O(1) per network slot in isolation therefore takes O(N) wall-clock time under parallel provisioning, and the last caller in the queue waits behind all the others.
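The arithmetic behind that claim can be sketched with a small model. It is an illustration only: the function names are hypothetical, the 10 ms hold time and slot counts are made-up numbers rather than measurements from the source, and the model assumes every lock-holding operation takes the same time and that nothing else competes for the lock.

```python
def serialized_makespan_ms(slots: int, lock_ops_per_slot: int, hold_ms: int) -> int:
    """Wall-clock time to provision every slot when each lock-holding
    operation runs alone, no matter how many callers are waiting."""
    return slots * lock_ops_per_slot * hold_ms


def wait_before_start_ms(position: int, hold_ms: int) -> int:
    """Time the caller at `position` (0-based) in the queue spends blocked
    before it acquires the lock."""
    return position * hold_ms


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the curve.
    for slots in (10, 1_000, 4_000):
        total = serialized_makespan_ms(slots, lock_ops_per_slot=2, hold_ms=10)
        print(f"{slots:>5} slots -> {total / 1000:.1f} s")
```

Adding more parallel callers does not appear anywhere in the formula. Under this model only the number of lock-holding operations and the hold time of each one affect the total.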
AWS Lambda's disclosure
When the Lambda networking team attempted to create thousands of network devices and namespaces in parallel during worker boot, the RTNL lock turned work that "should take seconds" into work that "stretched to minutes." (Source: sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network.)
Werner's metaphor: "a bit like when a car breaks down on a bridge in Amsterdam (a city that is not designed for cars). First the car behind it gets stuck, then the car behind that one, then a tram, and on-and-on until the entire city is gridlocked."
Fix shapes
You cannot remove the RTNL lock — it's a correctness primitive. You can reduce the number of lock acquisitions per network slot and reorder operations so lock holders run as briefly as possible:
- Pool namespaces first — create the namespaces before the devices that will live inside them.
- Create veth pairs inside the target namespace — avoid the "create in root, then move" pattern that costs two RTNL-held operations per device.
- Batch eBPF program attachments — attach programs to all veth devices in a single operation instead of one lock-reacquire per device.
Lambda applied all three. Queuing disappeared. This is a concrete instance of patterns/pre-create-all-network-slots-at-boot and a specialization of the broader warm-pool / zero-work create path family.
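The first two items can be sketched as a command plan. This is not Lambda's implementation, which the source does not show. It is a minimal illustration that builds `ip` (iproute2) argument lists without executing them. The function names and the `slot`, `vh` and `vg` naming scheme are hypothetical, and counting `ip link` commands is only a rough proxy for RTNL-held operations.

```python
def create_then_move(ns: str, host_if: str, guest_if: str) -> list[list[str]]:
    """The pattern the text advises against: build the pair in the root
    namespace, then move one end. Two link-level commands per device."""
    return [
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", guest_if],
        ["ip", "link", "set", guest_if, "netns", ns],
    ]


def create_in_namespace(ns: str, host_if: str, guest_if: str) -> list[list[str]]:
    """Create the peer end directly inside the target namespace.
    One link-level command per device."""
    return [
        ["ip", "link", "add", host_if, "type", "veth",
         "peer", "name", guest_if, "netns", ns],
    ]


def boot_plan(slots: int) -> list[list[str]]:
    """Namespaces first, then one in-namespace veth creation per slot."""
    names = [f"slot{i}" for i in range(slots)]
    plan = [["ip", "netns", "add", ns] for ns in names]
    for i, ns in enumerate(names):
        plan += create_in_namespace(ns, f"vh{i}", f"vg{i}")
    return plan


if __name__ == "__main__":
    for cmd in boot_plan(2):
        print(" ".join(cmd))
```

Running the plan for real would require root privileges and something like `subprocess.run(cmd, check=True)` per entry. The batched eBPF attachment is left out because the source does not describe the mechanism used.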
Related kernel-state limits
The RTNL lock is one of several single-writer / shared-state bottlenecks that hit at high density. See concepts/kernel-state-capacity-limit for the broader category: conntrack table size, socket-hash collisions, and routing-table depth are analogous resources with their own per-kernel-subsystem locking models.
Seen in
- sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network — canonical wiki disclosure of the RTNL-lock-at-Lambda-density pathology; the three-step fix (namespace-first pooling, in-namespace veth creation, batched eBPF attach) is named explicitly.