PATTERN Cited by 1 source

Per-slot iptables in namespace

Shape: when a multi-tenant Linux host accumulates O(tenants × rules-per-tenant) iptables rules in the root network namespace, move the per-tenant rules into each tenant's own network namespace. Root-namespace rule count drops to a static, slot-agnostic set; per-tenant rule traversal cost becomes O(1) in tenant count.

Pre-fix pathology: linear per-packet traversal cost

iptables evaluates rules in sequence per packet. When a rule set has:

  • R rules per tenant (routing, NAT, filtering — typically ~30 for a Lambda-sized isolation surface)
  • N tenants
  • G global fixed rules

→ the root namespace holds R × N + G rules. If the per-tenant blocks sit in slot order in a single chain, a packet destined for slot k walks through the rules for slots 1..k−1 before reaching its own.

At R ≈ 30, N ≈ 4,000, G ≈ 144, R × N + G comes to roughly 120,000 rules, the same order as the 125,000+ rules Lambda reported in its root namespace. A packet for the last slot walks through nearly all of them before matching its own. Measured cost at AWS Lambda: up to ~1 ms of connection setup latency from rule traversal alone. (Source: sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network.)
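The arithmetic above can be checked with a small model. This is only a sketch under a simplifying assumption: all per-slot blocks sit in one flat chain in slot order, and only the per-slot rules ahead of a packet's own block are counted. The function names are illustrative and do not come from the source.

```python
def root_rule_count(rules_per_slot: int, slots: int, global_rules: int) -> int:
    """Rules held in the root namespace before the fix: R * N + G."""
    return rules_per_slot * slots + global_rules


def slot_rules_walked_first(slot: int, rules_per_slot: int) -> int:
    """Per-slot rules a packet for `slot` (1-based) passes before its own block.

    Assumes a single flat chain with slot blocks laid out in slot order.
    """
    if slot < 1:
        raise ValueError("slot numbers are 1-based")
    return (slot - 1) * rules_per_slot


# Approximate figures from the text; the total Lambda reported was somewhat higher.
R, N, G = 30, 4_000, 144

print(root_rule_count(R, N, G))        # 120144
print(slot_rules_walked_first(1, R))   # 0
print(slot_rules_walked_first(N, R))   # 119970
```

The gap between the first and last slot is the per-slot performance skew: the cost a packet pays depends on where its slot happens to sit in the chain.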

Werner Vogels' framing: "This wasn't accumulated cruft or a discipline issue, but a density problem."

The fix

Move the R per-slot rules into each slot's own network namespace, leaving only the G global rules in the root namespace. For each slot:

  1. Slot's own namespace holds R slot-specific rules (only traversed by packets already routed into that namespace).
  2. Root namespace holds G slot-agnostic rules (traversed once per packet at root boundary).

Total rules on the host: still R × N + G, but no single packet traverses more than R + G — a constant, not scaling with N.
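The bound can be made concrete by comparing worst-case traversal before and after the split as the tenant count grows. This is a minimal sketch that assumes every rule is a linear-scan entry and that a packet crosses the root rules once and one slot's rules once. The function names and the sample values of N are illustrative.

```python
def worst_case_before(slots: int, rules_per_slot: int, global_rules: int) -> int:
    """Flat root chain: a packet for the last slot may walk every rule, G + R * N."""
    return global_rules + rules_per_slot * slots


def worst_case_after(rules_per_slot: int, global_rules: int) -> int:
    """Split layout: root rules once, then one slot's own rules, G + R."""
    return global_rules + rules_per_slot


R, G = 30, 144  # approximate figures from the text

for n in (1, 1_000, 4_000):
    print(n, worst_case_before(n, R, G), worst_case_after(R, G))
# 1 174 174
# 1000 30144 174
# 4000 120144 174
```

The second column grows linearly with N while the third stays fixed, which is the sense in which the per-packet cost becomes constant in tenant count.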

Lambda's disclosed result: root namespace went from 125,000+ rules to 144 static, slot-agnostic rules; the performance skew between slots disappeared. Every packet now traverses the same 144 rules regardless of slot assignment.

Why Linux lets this work

  • iptables rules are per-network-namespace, not global to the host kernel. Each namespace has its own netfilter tables.
  • A packet that enters a namespace (via veth pair, tap interface, etc.) is evaluated against that namespace's tables; the root-namespace tables see it only while it is traversing the root namespace.
  • Namespace creation cost is paid at boot time anyway for density reasons, so the per-namespace rule installation piggybacks on existing setup.
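Because each namespace has its own tables, installing a slot's rules amounts to running `iptables` inside that namespace, for example through `ip netns exec`. The sketch below only builds and prints the command lines; it executes nothing. The namespace naming scheme (`slot-<k>`), the interface name, and the three rules are hypothetical placeholders, not Lambda's actual rule set, and the namespace is assumed to exist already.

```python
import shlex
from typing import List


def per_slot_rule_commands(slot: int) -> List[List[str]]:
    """Build argv lists that install one slot's rules inside its own namespace."""
    ns = f"slot-{slot}"    # hypothetical namespace name
    uplink = "veth-up"     # hypothetical interface name inside the namespace
    in_ns = ["ip", "netns", "exec", ns]
    return [
        # Default-deny forwarding inside the slot's namespace.
        in_ns + ["iptables", "-P", "FORWARD", "DROP"],
        # Allow return traffic for connections the slot already opened.
        in_ns + ["iptables", "-A", "FORWARD", "-m", "conntrack",
                 "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Source-NAT traffic leaving through the uplink interface.
        in_ns + ["iptables", "-t", "nat", "-A", "POSTROUTING",
                 "-o", uplink, "-j", "MASQUERADE"],
    ]


for argv in per_slot_rule_commands(7):
    print(shlex.join(argv))
```

Every command is scoped to one namespace, so adding a slot adds no rules to the root namespace, and the number of commands per slot does not depend on how many slots exist.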

Prerequisites

  1. Per-tenant network namespaces already exist (Lambda's micro-VM isolation requirement gives this for free).
  2. The per-tenant rules are meaningful only in the tenant's own namespace — if a rule must fire before namespace routing decides the packet's destination, it must stay in root.
  3. The global rule set is genuinely slot-agnostic — if global rules reference slot-specific data, the split doesn't simplify them.

Generalizes beyond iptables

The core insight is that per-slot policy belongs in per-slot state containers, not in a shared linear-traversal structure. The same shape applies to:

  • Per-tenant nftables / BPF programs — attach the program to the tenant's veth or namespace, not to a shared egress hook.
  • Per-tenant cgroup hierarchies — one cgroup per tenant rather than a single flat hierarchy with tenant-discriminated rules.
  • Per-tenant route tables — move the routes into the tenant-specific route table (policy routing) instead of a growing main table.
