
CONCEPT Cited by 1 source

Kernel panic from scale

A kernel panic from scale is the production failure mode in which kernel code paths that work correctly at small or moderate state sizes trigger a panic once per-subsystem state grows beyond any size the code was ever exercised at in production. The code didn't change; the state distribution did.

Signature

  • Panic traces consistently point to the same subsystem.
  • Panic cadence correlates with state size (peer count, connection count, table size), not with load spikes.
  • Post-mortem reveals slow codepaths in that subsystem that are quadratic in the state size, or that iterate over data structures without limits tuned for production scale.
  • Hosts with less state don't hit it; only the subset of the fleet with the largest state sizes does.
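The second point in the signature can be checked with a rough triage script. The sketch below is illustrative only: the per-host samples are invented, and the names `pearson`, `scale_signature`, and the `margin` threshold are assumptions rather than anything from the source. It compares how strongly panic counts track state size versus peak load.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(vx * vy)


# Invented per-host samples: (peer_count, peak_load_pct, panics_in_30_days).
# Real input would come from fleet telemetry.
HOSTS = [
    (5_000, 90, 0),
    (20_000, 35, 0),
    (80_000, 70, 0),
    (250_000, 40, 2),
    (400_000, 85, 5),
    (600_000, 30, 9),
]


def scale_signature(hosts, margin=0.3):
    """Return (r_state, r_load, looks_like_scale).

    looks_like_scale is True when panics correlate with state size
    more strongly than with load by at least `margin` (an arbitrary,
    hypothetical threshold).
    """
    peers = [h[0] for h in hosts]
    load = [h[1] for h in hosts]
    panics = [h[2] for h in hosts]
    r_state = pearson(peers, panics)
    r_load = pearson(load, panics)
    return r_state, r_load, (r_state - abs(r_load)) > margin


r_state, r_load, looks_like_scale = scale_signature(HOSTS)
print(f"state: {r_state:.2f}  load: {r_load:.2f}  scale-shaped: {looks_like_scale}")
```

A correlation over a handful of hosts is only a hint about where to look; it does not replace reading the panic traces.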

Canonical wiki instance

Fly.io's WireGuard gateways ran into kernel panics as stale peer counts approached hundreds of thousands per host:

"The high stale peer count made kernel WireGuard operations very slow — especially loading all the peers back into the kernel after a gateway server reboot — as well as some kernel panics." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)

The panics are a symptom of concepts/kernel-state-capacity-limit — the kernel holding more of a thing than anyone ever tested.

Design response is the same as for kernel-state-capacity limits

Keep the hot set of kernel state small. See concepts/jit-peer-provisioning for the WireGuard-specific worked example. Alongside removing the delivery-guarantee problem on the push path, the JIT rewrite had a secondary benefit: the panics stopped, because the kernel no longer held enough peers to hit the pathological codepaths.
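As a minimal sketch of the idea, assume user space keeps the full peer set and installs a peer into the kernel only when that peer shows up, evicting the least recently used peer once a cap is reached. This is not Fly.io's implementation: `FakeKernel`, `HotSet`, `on_handshake`, and the LRU cap `max_hot` are all hypothetical names and choices made for illustration, and `FakeKernel` stands in for whatever interface actually configures the kernel.

```python
from collections import OrderedDict


class FakeKernel:
    """Stand-in for the kernel's peer table; not a real WireGuard API."""

    def __init__(self):
        self.peers = set()

    def add_peer(self, pubkey):
        self.peers.add(pubkey)

    def remove_peer(self, pubkey):
        self.peers.discard(pubkey)


class HotSet:
    """Keeps at most `max_hot` peers installed in the kernel at once."""

    def __init__(self, kernel, known_peers, max_hot):
        if max_hot < 1:
            raise ValueError("max_hot must be at least 1")
        self.kernel = kernel
        self.known = set(known_peers)  # full peer set lives in user space
        self.max_hot = max_hot
        self.hot = OrderedDict()       # pubkey -> None, least recently used first

    def on_handshake(self, pubkey):
        """Install the peer on demand; return False for unknown peers."""
        if pubkey not in self.known:
            return False
        if pubkey in self.hot:
            self.hot.move_to_end(pubkey)  # refresh recency, no kernel call
            return True
        self.kernel.add_peer(pubkey)
        self.hot[pubkey] = None
        while len(self.hot) > self.max_hot:
            victim, _ = self.hot.popitem(last=False)  # evict the oldest
            self.kernel.remove_peer(victim)
        return True


kernel = FakeKernel()
hot_set = HotSet(kernel, (f"peer{i}" for i in range(1000)), max_hot=3)
for i in range(10):
    hot_set.on_handshake(f"peer{i}")
print(len(hot_set.known), "known peers,", len(kernel.peers), "in the kernel")
```

The point of the design is that kernel state is bounded by `max_hot` regardless of how many peers exist in total, so the kernel stays inside the size range where its codepaths are known to behave.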

Contrast

  • Kernel panic from a code bug: fixed by a code change; unrelated to load.
  • Kernel panic from hardware: ECC memory errors, machine-check exceptions (MCE), or drivers; not correlated with state size.
  • Kernel panic from scale: fixed by keeping the data distribution inside the range the code actually works for, usually through design changes in the user-space system that feeds the kernel.

Seen in
