
CONCEPT Cited by 1 source

Double NAT

Double NAT is the configuration where a packet undergoes two stateful Network Address Translation stages on the same host before leaving it (and, on the return path, passes back through both stages in reverse order). Each stateful stage allocates its own kernel conntrack-table entry per flow and costs a conntrack lookup for every packet.

On Linux, each NAT stage is implemented with iptables rules on top of netfilter's conntrack subsystem, which maintains a per-flow state table.
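To make the per-stage cost concrete, here is a toy Python model of a single stateful source-NAT stage. It is a simplification, not netfilter's implementation: the class and field names, the ephemeral-port counter, and the addresses (taken from documentation ranges) are all hypothetical. Like conntrack, it keeps one entry per flow, indexed by both the original and the reply tuple so that return traffic can be translated back, and it pays a table lookup on every packet in either direction.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    proto: str
    src: str
    sport: int
    dst: str
    dport: int

    def key(self) -> tuple:
        return (self.proto, self.src, self.sport, self.dst, self.dport)


class StatefulSnat:
    """Toy model of one stateful source-NAT stage (not netfilter's code).

    One entry per flow, indexed by its original and its reply tuple, so that
    return traffic can be translated back. Every packet costs one lookup.
    """

    def __init__(self, outside_ip: str, first_port: int = 40000):
        self.outside_ip = outside_ip
        self.next_port = first_port  # hypothetical port allocator
        self.by_orig = {}            # original tuple -> translated source port
        self.by_reply = {}           # reply tuple -> (original src, sport)
        self.lookups = 0

    @property
    def entries(self) -> int:
        return len(self.by_orig)

    def egress(self, p: Packet) -> Packet:
        self.lookups += 1
        port = self.by_orig.get(p.key())
        if port is None:  # first packet of the flow: allocate state
            port = self.next_port
            self.next_port += 1
            self.by_orig[p.key()] = port
            reply = (p.proto, p.dst, p.dport, self.outside_ip, port)
            self.by_reply[reply] = (p.src, p.sport)
        return replace(p, src=self.outside_ip, sport=port)

    def ingress(self, p: Packet) -> Packet:
        self.lookups += 1
        # KeyError here means no state for this flow; a real stage would drop it
        src, sport = self.by_reply[p.key()]
        return replace(p, dst=src, dport=sport)


nat = StatefulSnat("198.51.100.7")
flow = Packet("tcp", "10.0.0.2", 5555, "203.0.113.9", 443)
for _ in range(3):
    wire = nat.egress(flow)
back = nat.ingress(Packet("tcp", "203.0.113.9", 443, wire.src, wire.sport))
print(wire, back, nat.entries, nat.lookups)
```

Three outbound packets and one reply on the same flow leave a single entry in the table but cost four lookups; a second stateful stage on the same host repeats both costs.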

AWS Lambda's pre-fix configuration

Lambda's original packet path for a micro-VM:

  1. Packet egresses the VM's network namespace → first NAT (inside the VM's namespace).
  2. Packet egresses the host's eth0 interface → second NAT.

At high densities (thousands of VMs on one host processing traffic simultaneously), every packet from every flow on every VM passes through the kernel's conntrack table twice, making the table a point of contention. This adds significant per-packet latency and CPU cost. (Source: sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network.)
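The pre-fix path can be sketched as a counting exercise. This hypothetical Python model assumes, as described above, that both NAT stages of every VM record their flows in the host's one kernel conntrack table; it tracks only how many entries are allocated and how many lookups are paid, not real packet rewriting. The stage labels and parameters are illustrative.

```python
from collections import Counter

STAGES = ("vm-netns", "host-eth0")  # pre-fix path: two stateful NAT stages


def simulate_double_nat(vms: int, flows_per_vm: int, packets_per_flow: int) -> Counter:
    """Count conntrack work on one host for the pre-fix egress path.

    Hypothetical model: every stage of every VM records its flows in the
    single shared table, keyed by (stage, vm, flow).
    """
    table = set()
    stats = Counter()
    for vm in range(vms):
        for flow in range(flows_per_vm):
            for _ in range(packets_per_flow):
                for stage in STAGES:       # namespace NAT, then eth0 NAT
                    stats["lookups"] += 1  # paid by every packet at every stage
                    key = (stage, vm, flow)
                    if key not in table:   # first packet allocates an entry
                        table.add(key)
                        stats["inserts"] += 1
    stats["entries"] = len(table)
    return stats


stats = simulate_double_nat(vms=1000, flows_per_vm=4, packets_per_flow=10)
print(stats["entries"], stats["lookups"])
```

With 1,000 VMs each running four flows of ten packets, the table holds 8,000 entries and serves 80,000 lookups: twice what a single NAT stage would cost, all concentrated on one shared structure.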

Why double NAT emerges

A stateful NAT design for tenant isolation tends to acquire a second stage because two different translations are needed:

  • The VM-namespace NAT translates between the tenant's private address range and the host-internal range.
  • The host's eth0 NAT translates between the host-internal range and the cloud provider's outbound range.

Each is legitimate on its own; together they double the conntrack pressure: two entries per flow and two lookups per packet.
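A sketch of the two address translations, using Python's standard ipaddress module. Every range here is hypothetical, and for illustration it assumes each VM reuses the same small tenant range, which a 1:1 per-VM prefix mapping into the host-internal range can tell apart; the second stage then collapses all host-internal addresses onto one outbound address. Only the address arithmetic is shown; in the pre-fix design both stages were stateful conntrack NATs.

```python
import ipaddress

TENANT_NET = ipaddress.ip_network("10.0.0.0/30")        # hypothetical, same in every VM
HOST_INTERNAL = ipaddress.ip_network("169.254.0.0/16")  # hypothetical host-internal range
OUTBOUND_IP = ipaddress.ip_address("198.51.100.7")      # hypothetical provider address


def host_block(vm: int) -> ipaddress.IPv4Network:
    """Carve a tenant-sized block for VM `vm` out of the host-internal range."""
    base = HOST_INTERNAL.network_address + TENANT_NET.num_addresses * vm
    return ipaddress.ip_network(f"{base}/{TENANT_NET.prefixlen}")


def stage1_vm_netns(vm: int, tenant_addr: str) -> ipaddress.IPv4Address:
    """Tenant range -> host-internal range, keeping the host offset (1:1)."""
    addr = ipaddress.ip_address(tenant_addr)
    if addr not in TENANT_NET:
        raise ValueError(f"{addr} is outside the tenant range")
    offset = int(addr) - int(TENANT_NET.network_address)
    return host_block(vm).network_address + offset


def stage2_host_eth0(host_addr: ipaddress.IPv4Address) -> ipaddress.IPv4Address:
    """Host-internal range -> outbound address (many-to-one).

    Because many inside addresses share one outside address, a real stage 2
    must also rewrite ports and remember them per flow to route replies back.
    """
    if host_addr not in HOST_INTERNAL:
        raise ValueError(f"{host_addr} is outside the host-internal range")
    return OUTBOUND_IP


for vm in (0, 3):
    inner = stage1_vm_netns(vm, "10.0.0.2")
    print(vm, inner, stage2_host_eth0(inner))
```

The same tenant address 10.0.0.2 lands on distinct host-internal addresses (169.254.0.2 for VM 0, 169.254.0.14 for VM 3), and both then share the single outbound address.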

Why it hurts at density

  • Conntrack is a stateful table: every active flow consumes an entry; lookup cost grows with table fullness.
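  • Insertions into and removals from the table are serialized by hashed per-bucket locks (lookups themselves run locklessly under RCU), so a multi-tenant burst of new flows makes those locks contend.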
  • Both NAT stages draw on the same kernel resource on the same host, so the load does not spread across independent resources.
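A back-of-envelope sketch of why fullness matters, assuming a chained hash table with uniform hashing (the textbook model, not a measurement of the kernel): with load factor alpha = entries / buckets, a successful lookup compares roughly 1 + alpha/2 entries. The flow and bucket counts below are hypothetical.

```python
def conntrack_load(flows: int, nat_stages: int, buckets: int) -> dict:
    """Estimate table load when each stateful NAT stage adds one entry per flow.

    Uses the uniform-hashing estimate for chained tables: a successful
    lookup scans about 1 + alpha/2 entries, where alpha = entries / buckets.
    """
    entries = flows * nat_stages
    alpha = entries / buckets
    return {"entries": entries, "load_factor": alpha,
            "expected_compares": 1 + alpha / 2}


single = conntrack_load(flows=262_144, nat_stages=1, buckets=65_536)
double = conntrack_load(flows=262_144, nat_stages=2, buckets=65_536)
print(single)
print(double)
```

For the same 262,144 flows, the second stage doubles the entries and raises the expected scan from about 3 to about 5 entries per lookup, on top of doubling the number of lookups.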

The fix: stateless NAT via eBPF

Lambda replaced stateful NAT with stateless packet mangling: an eBPF program rewrites packet headers based on predetermined mappings instead of tracking connection state. No conntrack entry is allocated; no per-flow state is maintained.
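A minimal Python sketch of stateless address rewriting, assuming a fixed mapping table (the addresses and the STATIC_SNAT and REVERSE names are hypothetical). It operates on a raw 20-byte IPv4 header: egress rewrites the source address, the reply direction rewrites the destination using the inverted map, and the header checksum is patched incrementally with RFC 1624's HC' = ~(~HC + ~m + m'). Nothing is remembered between packets. A real datapath would also patch the TCP/UDP checksum, whose pseudo-header covers both addresses; tc eBPF programs have the kernel helpers bpf_l3_csum_replace and bpf_l4_csum_replace for that purpose.

```python
import ipaddress
import struct

# Hypothetical mapping, fixed when a VM is assigned to its slot.
STATIC_SNAT = {ipaddress.IPv4Address("169.254.0.2"): ipaddress.IPv4Address("10.1.0.2")}
REVERSE = {v: k for k, v in STATIC_SNAT.items()}  # reply direction


def fold(total: int) -> int:
    """Fold carries back into 16 bits (ones' complement end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Full IPv4 header checksum, with the checksum field treated as zero."""
    h = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:20])
    total = sum(word for (word,) in struct.iter_unpack("!H", h))
    return ~fold(total) & 0xFFFF


def csum_update(csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    return ~fold((~csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF


def rewrite_addr(header: bytearray, offset: int, mapping: dict) -> bool:
    """Rewrite the IPv4 address at `offset` (12 = source, 16 = destination)
    from a fixed mapping, without any per-flow state."""
    old = ipaddress.IPv4Address(bytes(header[offset:offset + 4]))
    new = mapping.get(old)
    if new is None:
        return False  # not a mapped address: leave the packet alone
    csum = struct.unpack_from("!H", header, 10)[0]
    for i in (0, 2):  # an IPv4 address spans two 16-bit checksum words
        csum = csum_update(csum,
                           struct.unpack_from("!H", old.packed, i)[0],
                           struct.unpack_from("!H", new.packed, i)[0])
    header[offset:offset + 4] = new.packed
    struct.pack_into("!H", header, 10, csum)
    return True


def make_header(src: str, dst: str) -> bytearray:
    """Build a header-only IPv4 packet (no payload) with a valid checksum."""
    h = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0x1234, 0, 64, 6, 0,
                              ipaddress.IPv4Address(src).packed,
                              ipaddress.IPv4Address(dst).packed))
    struct.pack_into("!H", h, 10, ipv4_checksum(h))
    return h


out = make_header("169.254.0.2", "203.0.113.9")
rewrite_addr(out, 12, STATIC_SNAT)   # egress: rewrite the source address
reply = make_header("203.0.113.9", "10.1.0.2")
rewrite_addr(reply, 16, REVERSE)     # return path: rewrite the destination
```

Both directions are pure functions of the packet and the fixed table, which is what lets this approach skip conntrack entirely.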

NAT setup latency dropped 100×.

The replacement is viable because the mappings are predetermined: Lambda knows, at slot-assignment time, which internal address corresponds to which VM. This is the key difference from general-purpose NAT (e.g., a consumer ISP's masquerade), where the mapping must be built dynamically because the set of inside addresses and flows is open-ended and not known in advance.
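Because the mapping depends only on slot assignment, it can be computed in full before any traffic arrives, and its inverse (needed for the return path) exists whenever the mapping is one-to-one. The sketch below assumes a hypothetical scheme in which slot s maps internal address INTERNAL_BASE + s to EXTERNAL_BASE + s. For contrast, a masquerade-style map sends every inside address to one outside address, so it cannot be inverted from addresses alone and needs per-flow state.

```python
import ipaddress

INTERNAL_BASE = ipaddress.IPv4Address("169.254.0.0")  # hypothetical host-internal base
EXTERNAL_BASE = ipaddress.IPv4Address("10.1.0.0")     # hypothetical outbound base


def slot_mapping(slots: int) -> dict:
    """Build the complete NAT map at slot-assignment time, before any packet."""
    return {INTERNAL_BASE + s: EXTERNAL_BASE + s for s in range(slots)}


def invert(mapping: dict) -> dict:
    """Reverse-path map; exists only if no two inside addresses share an outside one."""
    reverse = {outside: inside for inside, outside in mapping.items()}
    if len(reverse) != len(mapping):
        raise ValueError("many-to-one mapping: replies need per-flow state to route")
    return reverse


static = slot_mapping(4096)
reverse = invert(static)

# Masquerade-style: every inside address shares one outside address.
masquerade = {INTERNAL_BASE + s: ipaddress.IPv4Address("198.51.100.7") for s in range(4)}
try:
    invert(masquerade)
except ValueError as err:
    print("masquerade:", err)
```

The static map and its inverse are plain lookup tables with no per-flow entries, which is the property a stateless datapath relies on.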

Seen in

  • sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network
