
PATTERN Cited by 1 source

JIT provisioning on first packet

Intent

Install per-peer state in a kernel / data-plane primitive lazily, on the arrival of the first packet from that peer — not proactively, not pushed. Evict stale state aggressively; the next packet re-installs.

When to use

  • The primitive has a state-capacity wall (see concepts/kernel-state-capacity-limit) — typically a Linux kernel subsystem — and your legitimate-peer population is much larger than the subset active at any moment.
  • Your push-based provisioning has delivery-guarantee problems (dropped RPCs, network partitions between control plane and data plane) that force either expensive reconciliation or visible failures.
  • The first-packet arrival is reliably detectable via a native event API or via packet sniffing.
  • The control plane can install peers fast enough relative to client-side retry cadence that the first-successful-handshake latency is acceptable. (A further latency optimisation — patterns/initiator-responder-role-inversion — exists when the protocol allows.)

Shape

+-------------+        handshake pkt        +---------------+
|    Peer     | --------------------------> |   Gateway     |
|             |                             |               |
|             | <------ (delay) ----------  |  1. sniff pkt |
|             |                             |  2. identify  |
|             |                             |  3. pull cfg  |
|             |                             |  4. install   |
|             | <- server-initiated hshk -  |     (init.    |
|             |                             |      role)    |
+-------------+                             +---------------+

           +---------------------+
           |   Control plane     |
           |  (pubkey → config)  |
           +---------------------+
  1. Catch first packet. BPF filter, native event API, daemon hook — whichever fits the substrate.
  2. Identify the peer. For cleartext-identifiable protocols this is trivial; for identity-hiding handshakes it requires running the handshake crypto's first leg.
  3. Rate-limit-cache the identification → config lookup. Handshake retries are common; caching with a sliding window prevents retry storms from becoming API storms. See concepts/rate-limited-cache.
  4. Install in the kernel via the primitive's native config API (Netlink, sockopts, whatever).
  5. Evict on a cron. Cheap because step 1 re-provisions. See patterns/state-eviction-cron.
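Steps 2–4 (with the rate-limit cache from step 3) can be sketched as a single first-packet handler. Everything here is illustrative — the `JITProvisioner` name, the callback shapes, and the TTL cache stand in for the real sniffer hook, control-plane client, and Netlink install path:

```python
import time

class JITProvisioner:
    """Minimal sketch of steps 2-4; names and shapes are assumptions."""

    def __init__(self, identify, fetch_config, install, cache_ttl=5.0):
        self.identify = identify          # step 2: packet -> peer id (may run crypto)
        self.fetch_config = fetch_config  # step 3: peer id -> config (control plane)
        self.install = install            # step 4: config -> kernel (Netlink etc.)
        self.cache_ttl = cache_ttl
        self._cache = {}                  # peer id -> (expires_at, config)

    def on_first_packet(self, packet, now=None):
        """Step 1 (catching the packet) is the caller's job; this does 2-4."""
        now = now if now is not None else time.monotonic()
        peer = self.identify(packet)
        if peer is None:
            return None                   # not a handshake we can attribute
        hit = self._cache.get(peer)
        if hit and hit[0] > now:
            config = hit[1]               # retry within window: no API call
        else:
            config = self.fetch_config(peer)
            self._cache[peer] = (now + self.cache_ttl, config)
        self.install(peer, config)        # idempotent re-install is fine
        return peer
```

Note that the cache sits between identification and the control-plane fetch, so handshake retries re-install from cache instead of becoming API storms.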

Worked example — Fly.io WireGuard gateways (2024-03-12)

The canonical instance for this pattern. Fly.io's gateways:

  • Sniff handshake-initiation packets with the BPF filter "udp and dst port 51820 and udp[8] = 1", or hook inside the WireSockets daemon (WebSocket transport).
  • Identify the initiator by running the first leg of Noise with the interface private key (~200 lines; private key read via Netlink from a privileged process). WireGuard's Noise handshake is identity-hiding, so this step is mandatory.
  • Lookup + rate-limit-cache in SQLite.
  • Install the peer via Netlink, in the initiator role (so the kernel sends the next handshake packet back to the peer immediately — see patterns/initiator-responder-role-inversion).
  • Evict stale peers on cron.
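The cleartext part of the sniff step can be illustrated. The field layout below follows the published WireGuard message format (a fixed 148-byte handshake initiation whose first byte is type 1 — the same check the BPF `udp[8] = 1` performs); the function name and return values are illustrative, and the Noise first-leg identification itself is not shown:

```python
import struct

WG_HANDSHAKE_INITIATION = 1
WG_INITIATION_LEN = 148  # fixed size of a WireGuard initiation message

def parse_initiation(payload: bytes):
    """Return (sender_index, ephemeral_pubkey) if this UDP payload looks
    like a handshake initiation, else None. The initiator's static key is
    encrypted; recovering it requires running the Noise first leg with the
    interface private key, which is out of scope here."""
    if len(payload) != WG_INITIATION_LEN:
        return None
    # Type byte plus three reserved zero bytes, read as a little-endian u32.
    msg_type, = struct.unpack_from("<I", payload, 0)
    if msg_type != WG_HANDSHAKE_INITIATION:
        return None
    sender_index, = struct.unpack_from("<I", payload, 4)
    ephemeral = payload[8:40]  # the unencrypted ephemeral public key
    return sender_index, ephemeral
```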

Production result: stale peer count per gateway went from low hundreds of thousands (topline ~550k on the Grafana chart) to "rounds to none" post-switchover. Reboots stopped being slow; kernel panics stopped.

(Source: sources/2024-03-12-flyio-jit-wireguard-peers)

Variants

  • Native first-packet event. If the primitive exposes one, use it. Drops the BPF-sniff step.
  • Plaintext-ID. Non-identity-hiding protocols skip the handshake-crypto step.
  • Pre-warm on adjacency. If the client's region is known cheaply (anycast DNS, client-hinted location), the gateway can prefetch configs for likely upcoming peers. Not the Fly.io design — they pull strictly on first packet — but a plausible extension.
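The pre-warm variant could look like the following sketch. It is an extension beyond the Fly.io design, and `fetch_region_configs` plus the `(expires_at, config)` cache shape are assumptions, not anything from the source:

```python
def prewarm(cache, fetch_region_configs, region, now, ttl=30.0):
    """Prefetch configs for peers likely to appear from `region` into the
    same cache the first-packet path consults, so their first install can
    skip the control-plane round trip. `fetch_region_configs` is a
    hypothetical control-plane batch call."""
    added = 0
    for peer, config in fetch_region_configs(region):
        entry = cache.get(peer)
        if entry is None or entry[0] <= now:  # don't clobber fresh entries
            cache[peer] = (now + ttl, config)
            added += 1
    return added
```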

Failure modes

  • No first-packet observability. If the event is invisible (fully-encrypted early handshake with no type bit, or no gateway-observable transport), the pattern doesn't apply.
  • Control-plane latency dominates retry cadence. If the install path is too slow, every new peer takes multiple retries to come up. Solved partially by role inversion (the gateway sends handshake back at install-time); fully by making the control-plane fetch fast.
  • Eviction thrash on oscillating active sets. A cron that evicts too aggressively relative to real-world idle-reconnect cadence re-installs peers needlessly. Tune eviction window vs. expected reconnect cadence.
  • Identity-step crypto cost. Running the ~200-line handshake-crypto step on every initiation isn't free at very high handshake rates. Scale by rate-limit-caching and by sharding handshake-identify across cores.
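The eviction side of the thrash trade-off — a cron whose window must be tuned against real reconnect cadence — might look like this sketch; the function and data shapes are hypothetical:

```python
import time

def evict_stale(peers, last_seen, window, now=None):
    """Remove peers idle longer than `window` seconds. `peers` maps
    peer id -> installed config. Evicting too eagerly is cheap but not
    free: the first-packet path must re-provision each evicted peer, so
    a window shorter than the typical idle-reconnect gap causes thrash."""
    now = now if now is not None else time.monotonic()
    stale = [p for p, t in last_seen.items() if now - t > window]
    for p in stale:
        peers.pop(p, None)      # in production: a Netlink remove-peer call
        last_seen.pop(p, None)
    return stale
```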

Broader applicability

Beyond WireGuard:

  • On-demand IPsec SA install. Instead of pushing every SA, install on first IKE initiation.
  • On-demand iptables / nftables entries. Install rule for a client only when its first packet arrives.
  • On-demand eBPF map entries for per-flow state.
  • On-demand TLS session ticket acceptance. Install session keys in memory only when a resumption attempt is seen.

In all cases the pattern is the same: kernel / data-plane state is a working set, not a universe of configs. Use the first packet as the event that defines what's currently in the working set.
