PATTERN Cited by 1 source
Per-slot iptables in namespace¶
Shape: when a multi-tenant Linux host accumulates O(tenants × rules-per-tenant) iptables rules in the root network namespace, move the per-tenant rules into each tenant's own network namespace. Root-namespace rule count drops to a static, slot-agnostic set; per-tenant rule traversal cost becomes O(1) in tenant count.
Pre-fix pathology: linear per-packet traversal cost¶
iptables evaluates rules in sequence per packet. When a rule set has:
- R rules per tenant (routing, NAT, filtering; typically ~30 for a Lambda-sized isolation surface)
- N tenants
- G global fixed rules
→ the root namespace holds R × N + G rules. A packet destined for
slot k walks through the rules for slots 1..k−1, roughly R × (k − 1)
of them, before reaching its own.
At R ≈ 30, N = 4,000, G ≈ 144, the formula gives roughly 120,000
rules; Lambda's reported root-namespace count was over 125,000.
A packet for slot 4,000 walks through ~120,000 rules before
matching its own. Measured cost at AWS Lambda: up to
~1 ms of connection setup latency from rule traversal alone.
(Source:
sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network.)
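The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not Lambda's tooling: the function names are hypothetical, and it assumes per-slot rule blocks are contiguous and ordered by slot number, so a packet for slot k first passes the blocks of the k − 1 slots ahead of it.

```python
def root_rule_count(rules_per_slot: int, slots: int, global_rules: int) -> int:
    """Total rules in the root namespace before the fix: R * N + G."""
    return rules_per_slot * slots + global_rules


def rules_ahead_of_slot(rules_per_slot: int, slot: int) -> int:
    """Rules owned by slots 1..slot-1, which a packet for `slot` walks first.

    Assumption: per-slot blocks are contiguous and ordered by slot number.
    """
    if slot < 1:
        raise ValueError("slot numbers start at 1")
    return rules_per_slot * (slot - 1)


if __name__ == "__main__":
    R, N, G = 30, 4000, 144  # figures quoted in the text; R is approximate
    print(root_rule_count(R, N, G))   # 120144
    print(rules_ahead_of_slot(R, 1))  # 0
    print(rules_ahead_of_slot(R, N))  # 119970
```

The gap between the first and last slot (0 versus roughly 120,000 rules walked) is the per-slot performance skew the pattern removes.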
Werner Vogels' framing: "This wasn't accumulated cruft or a discipline issue, but a density problem."
The fix¶
Move the R per-slot rules into each slot's own network namespace,
leaving only the G global rules in the root namespace. For each
slot:
- Slot's own namespace holds R slot-specific rules (only traversed by packets already routed into that namespace).
- Root namespace holds G slot-agnostic rules (traversed once per packet at the root boundary).
Total rules on the host: still R × N + G, but no single packet
traverses more than R + G — a constant, not scaling with N.
Lambda's disclosed result: root namespace went from 125,000+ rules to 144 static, slot-agnostic rules; the performance skew between slots disappeared. Every packet now traverses the same 144 rules regardless of slot assignment.
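To make the before/after difference concrete, the toy model below represents each chain as a list of rule owners and counts how many rules a packet evaluates until it has passed the last rule belonging to its slot (a worst case). It is a sketch under stated assumptions: the layout (global rules first, then per-slot blocks in slot order), the helper names, and the worst-case counting are illustrative choices, not details from the source.

```python
from typing import Optional


def build_flat_chain(r: int, n: int, g: int) -> list[Optional[int]]:
    """Pre-fix root chain: g global rules (owner None), then r rules per slot."""
    chain: list[Optional[int]] = [None] * g
    for slot in range(1, n + 1):
        chain.extend([slot] * r)
    return chain


def rules_evaluated(chain: list[Optional[int]], slot: int) -> int:
    """Rules checked, in order, up to and including the last one owned by `slot`."""
    last_index = max(i for i, owner in enumerate(chain) if owner == slot)
    return last_index + 1


def flat_cost(r: int, n: int, g: int, slot: int) -> int:
    """Worst-case traversal when every rule lives in the root namespace."""
    return rules_evaluated(build_flat_chain(r, n, g), slot)


def split_cost(r: int, g: int, slot: int) -> int:
    """Worst-case traversal after the split: all g root rules, then the slot's own r."""
    root_chain = [None] * g
    slot_chain = [slot] * r
    return len(root_chain) + rules_evaluated(slot_chain, slot)


if __name__ == "__main__":
    R, N, G = 30, 4000, 144
    for slot in (1, N):
        print(slot, flat_cost(R, N, G, slot), split_cost(R, G, slot))
```

In this model the flat layout costs G + R × k for slot k, while the split layout costs G + R for every slot, which is the skew-free behaviour described above.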
Why Linux lets this work¶
- iptables rules are per-network-namespace, not global to the host kernel. Each namespace has its own netfilter tables.
- Routing into a namespace (via veth pair, tap interface, etc.) causes the per-namespace tables to evaluate, not the root-namespace tables.
- Namespace creation cost is paid at boot time anyway for density reasons, so the per-namespace rule installation piggybacks on existing setup.
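As an illustration of per-namespace netfilter tables, the sketch below builds (but does not run) the commands that would create a namespace for one slot and install rules inside it via `ip netns exec`. The namespace naming scheme, the `eth0` interface name, and the three example rules are all hypothetical placeholders; the source does not disclose Lambda's actual rule set, and running the commands would require root.

```python
import shlex


def slot_namespace_commands(slot: int, uplink: str = "eth0") -> list[list[str]]:
    """Return argv lists that set up one slot's namespace and its own iptables rules.

    Everything here is illustrative: the "slotN" name, the uplink interface,
    and the rules themselves are assumptions, not Lambda's configuration.
    """
    ns = f"slot{slot}"
    in_ns = ["ip", "netns", "exec", ns]  # run the following command inside the namespace
    return [
        ["ip", "netns", "add", ns],
        # Accept return traffic for connections the slot already opened.
        in_ns + ["iptables", "-A", "INPUT", "-m", "conntrack",
                 "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Default-deny everything else arriving in this namespace.
        in_ns + ["iptables", "-P", "INPUT", "DROP"],
        # NAT traffic leaving through the namespace's uplink interface.
        in_ns + ["iptables", "-t", "nat", "-A", "POSTROUTING",
                 "-o", uplink, "-j", "MASQUERADE"],
    ]


if __name__ == "__main__":
    for argv in slot_namespace_commands(7):
        print(shlex.join(argv))
```

Because each `iptables` invocation runs inside the slot's namespace, none of these rules appear in, or lengthen, the root namespace's chains.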
Prerequisites¶
- Per-tenant network namespaces already exist (Lambda's micro-VM isolation requirement gives this for free).
- The per-tenant rules are meaningful only in the tenant's own namespace — if a rule must fire before namespace routing decides the packet's destination, it must stay in root.
- The global rule set is genuinely slot-agnostic — if global rules reference slot-specific data, the split doesn't simplify them.
Generalizes beyond iptables¶
The core insight is per-slot policy belongs in per-slot state containers, not in a shared linear-traversal structure. The same shape applies to:
- Per-tenant nftables / BPF programs — attach the program to the tenant's veth or namespace, not to a shared egress hook.
- Per-tenant cgroup hierarchies — one cgroup per tenant rather than a single flat hierarchy with tenant-discriminated rules.
- Per-tenant route tables — move the routes into the tenant-specific route table (policy routing) instead of a growing main table.
Seen in¶
- sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network — canonical wiki disclosure of the 125,000 → 144 root-rules compression at AWS Lambda.