
CONCEPT Cited by 1 source

Kernel state capacity limit

Kernel state capacity limit is the observation that kernel subsystems holding per-object state have a practical ceiling: below it, performance is fine; above it, things degrade sharply, often with a kernel-panic risk rather than a mere slowdown.

User-space stores (an RDBMS, SQLite, a file on disk) do not share this property: they scale essentially to disk capacity. The kernel is different because it:

  • Holds state in non-pageable memory, with different allocation and freeing costs than user-space.
  • Often exposes codepaths that are O(N) in the number of held objects for a single configuration step (enumerate-all-peers, reload-on-boot, namespace-teardown), so bulk operations go quadratic.
  • Runs in a context where bugs-under-pressure surface as panics, not application-level errors.
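
The second bullet is the quiet killer. A toy cost model (my own sketch, not kernel code) shows why: if each configuration step scans the objects already installed, reinstalling all N objects one at a time costs O(N²) total.

```python
# Hypothetical cost model: each install step costs O(installed) because
# the kernel walks existing state (e.g. enumerate-all-peers) per step.

def reload_cost(n_objects: int) -> int:
    """Total cost to reinstall n_objects one at a time."""
    total = 0
    installed = 0
    for _ in range(n_objects):
        total += installed + 1  # each step scans what's already there
        installed += 1
    return total

# 10x the objects -> ~100x the reload time: the "slow reboot" failure mode.
ratio = reload_cost(10_000) / reload_cost(1_000)
```

Under this model a gateway with 550,000 stale peers pays roughly 300,000x the reload cost of one with 1,000 active peers, which is the shape of the reboot problem described below.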

Canonical instance — WireGuard peers on Linux

Fly.io's 2024-03-12 post narrates this limit explicitly. Stale WireGuard peers accumulated in the Linux kernel on their gateways into the hundreds of thousands per host (chart topline just under 550,000). Resulting failure modes:

  • "The high stale peer count made kernel WireGuard operations very slow — especially loading all the peers back into the kernel after a gateway server reboot..."
  • "...as well as some kernel panics."

(Source: sources/2024-03-12-flyio-jit-wireguard-peers)

The design response is to keep the kernel's hot set of state small: install peers only on demand, evict aggressively. The kernel stays below its capacity wall and everything works. The user-space authoritative store (SQLite) is the "this isn't big data" tier that happily holds every peer ever used.
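
A minimal sketch of that split, under assumptions of my own (the `PeerGateway` class, `KERNEL_CAP`, and the dict standing in for kernel state are illustrative, not Fly.io's code): SQLite holds every peer ever registered; the "kernel" holds only a small, LRU-evicted hot set, populated on demand.

```python
import sqlite3
from collections import OrderedDict

KERNEL_CAP = 3  # tiny cap for illustration; real kernel ceilings are far higher

class PeerGateway:
    def __init__(self):
        # Authoritative store: scales essentially to disk.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE peers (pubkey TEXT PRIMARY KEY, endpoint TEXT)")
        # Stand-in for in-kernel peer state; insertion order doubles as LRU order.
        self.kernel = OrderedDict()

    def register(self, pubkey, endpoint):
        # Cheap: registering a peer touches only user-space.
        self.db.execute("INSERT OR REPLACE INTO peers VALUES (?, ?)",
                        (pubkey, endpoint))

    def handshake(self, pubkey):
        # JIT install: pull from SQLite only when traffic actually arrives.
        if pubkey in self.kernel:
            self.kernel.move_to_end(pubkey)   # refresh LRU position
            return self.kernel[pubkey]
        row = self.db.execute("SELECT endpoint FROM peers WHERE pubkey = ?",
                              (pubkey,)).fetchone()
        if row is None:
            raise KeyError(pubkey)
        if len(self.kernel) >= KERNEL_CAP:
            self.kernel.popitem(last=False)   # evict least-recently-used peer
        self.kernel[pubkey] = row[0]
        return row[0]
```

The invariant the design buys: the SQLite table can grow without bound while the kernel-resident set never exceeds the cap, so the capacity wall is simply never approached.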

Framed in Fly's own language:

"Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn't 'big data'. ... you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can't do is store them all in the Linux kernel." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)

Other kernel state with similar shape

The Fly.io post names only WireGuard, but the shape recurs (uncited, for wiki reference):

  • Conntrack tables. Linux netfilter connection tracking becomes slow and unreliable above tens of millions of entries on commodity kernels; once nf_conntrack_count reaches nf_conntrack_max, new flows are dropped on insert ("nf_conntrack: table full, dropping packet").
  • Routing tables. Kernel routing state at FIB-large scale is an ongoing subject of kernel work (FIB trie, FIB6 trie).
  • eBPF maps. max_entries is fixed when the map is created; runtime behaviour at the cap depends on the map type (plain hash maps fail inserts, LRU maps evict).
  • Network namespaces / cgroups. Per-namespace-teardown costs are O(linked-state); fleets that churn millions of namespaces hit wall-time ceilings on node shutdown.
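
The conntrack failure mode generalizes to a simple behavioral model (a sketch of the drop-on-insert pattern, not netfilter code): a fixed-capacity table that silently refuses new entries once full, while updates to existing entries still succeed.

```python
class CappedTable:
    """Fixed-capacity table that drops new entries on insert when full,
    modeling conntrack at nf_conntrack_count == nf_conntrack_max."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = {}
        self.dropped = 0

    def insert(self, key, value) -> bool:
        if key in self.entries:
            self.entries[key] = value  # updating existing state still works
            return True
        if len(self.entries) >= self.max_entries:
            self.dropped += 1          # "table full, dropping packet"
            return False
        self.entries[key] = value
        return True
```

The nasty property is that established flows keep working while every new flow fails, so the failure looks like selective, inexplicable connectivity loss rather than an outright outage.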

Design response menu

  1. JIT install. Keep kernel-resident only what's active. Aggressive eviction, pull-from-userspace on demand. Canonical instance: concepts/jit-peer-provisioning.
  2. Offload to userspace. Move the data plane itself into user-space (DPDK, netmap, user-space TCP/IP stacks). Bypasses the kernel ceiling but trades for a different set of engineering challenges.
  3. Shard across hosts. Accept the per-host kernel wall, grow horizontally. Trade is control-plane routing complexity.
  4. Upstream fix. Work with kernel maintainers on the specific slow path. Expensive and slow, but sometimes the only viable long-term option.
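
Option 3 can be sketched in a few lines (hypothetical names; a real control plane also handles host failure and rebalancing): assign each peer to a gateway host by stable hash, so per-host kernel state grows at roughly 1/len(hosts) of the fleet total and stays under the per-host wall.

```python
import hashlib

def host_for(pubkey: str, hosts: list[str]) -> str:
    """Deterministically shard a peer onto one gateway host."""
    digest = hashlib.sha256(pubkey.encode()).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
```

The trade named above is visible here: the kernel problem is gone, but now the control plane must route every peer's traffic to its assigned host, and naive modulo hashing reshuffles most assignments whenever the host list changes (consistent hashing is the usual fix).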
