
SYSTEM Cited by 2 sources

WireGuard

WireGuard is a modern VPN protocol and Linux kernel subsystem designed by Jason Donenfeld. The properties that matter for systems design:

  • Point-to-point, not client-server. "It's a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the initiator, and the peer it connects to is the responder." (Source: sources/2024-03-12-flyio-jit-wireguard-peers) The protocol is symmetric — either end can be initiator. Fly.io exploits this in their role-inversion install trick.
  • Simple peer model. A peer config is a public key plus an address. There are no user accounts, no certificates, and no DH groups to negotiate, so the configuration primitive is minimal.
  • Built on the Noise Protocol Framework (Trevor Perrin). Handshakes are identity-hiding: the initiator's static public key is encrypted inside the handshake rather than sent in the clear, so wire capture alone does not reveal who is connecting. As a consequence, Fly.io's JIT gateway must run ~200 lines of Noise crypto to identify an incoming connection. (Source: sources/2024-03-12-flyio-jit-wireguard-peers)
  • Linux kernel module. Merged into Linux 5.6 (2020). User-space configures it via Netlink (the reference Go library is wgctrl-go).
  • UDP transport, with 51820 as the protocol's default port. Fly.io also runs WireGuard-over-WebSockets for customers on networks that can't talk end-to-end UDP. (Source: sources/2024-03-12-flyio-jit-wireguard-peers)
  • Handshake-initiation packets are trivially identifiable. "The packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: udp and dst port 51820 and udp[8] = 1." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)
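The quoted BPF filter can be mirrored in a few lines of user-space code. The sketch below is illustrative only: it assumes the caller already has the raw UDP segment (8-byte header plus payload), and the function names are hypothetical. The 148-byte initiation length and the little-endian sender index at payload offset 4 follow the message layout in the WireGuard whitepaper; they are not stated in the source quoted above.

```python
import struct

WG_PORT = 51820                 # protocol default port, per the text above
MSG_HANDSHAKE_INITIATION = 1    # message type carried in the first payload byte
UDP_HEADER_LEN = 8              # fixed UDP header size, so udp[8] is the first payload byte
INITIATION_LEN = 148            # initiation message size per the WireGuard whitepaper


def is_handshake_initiation(udp_segment: bytes, port: int = WG_PORT) -> bool:
    """Mirror of `udp and dst port 51820 and udp[8] = 1` on a raw UDP segment."""
    if len(udp_segment) <= UDP_HEADER_LEN:
        return False
    _src, dst, _length, _checksum = struct.unpack("!HHHH", udp_segment[:UDP_HEADER_LEN])
    return dst == port and udp_segment[UDP_HEADER_LEN] == MSG_HANDSHAKE_INITIATION


def sender_index(udp_segment: bytes) -> int:
    """Read the initiator's 32-bit little-endian sender index (payload bytes 4..7)."""
    payload = udp_segment[UDP_HEADER_LEN:]
    if len(payload) != INITIATION_LEN or payload[0] != MSG_HANDSHAKE_INITIATION:
        raise ValueError("not a handshake initiation")
    return struct.unpack_from("<I", payload, 4)[0]


def make_segment(dst_port: int, payload: bytes, src_port: int = 40000) -> bytes:
    """Build a UDP segment (checksum left as 0) for experimenting with the functions above."""
    header = struct.pack("!HHHH", src_port, dst_port, UDP_HEADER_LEN + len(payload), 0)
    return header + payload
```

Note that this check only classifies the packet. Because the handshake is identity-hiding, learning which peer sent it still requires the Noise decryption step described above.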

Capacity wall at scale: the kernel

"Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn't 'big data'. The problem we have at Fly.io is that our gateways don't have serious n-tier RDBMSs. They're small. Scrappy. They live off the land. Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can't do is store them all in the Linux kernel." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)

The Linux kernel, not storage, is the capacity wall on peer count. On Fly.io's gateways the count reached the hundreds of thousands per host (chart topline ~550k), at which point:

  • Kernel WireGuard operations (especially reload-on-reboot) became pathologically slow.
  • The kernel panicked in production.

The fix is JIT peer provisioning: keep all peers in SQLite, materialise a peer into the kernel only when a handshake for it actually arrives, and evict idle peers aggressively.

The kernel's WireGuard config interface is Netlink (per the JIT peer provisioning discussion). It exposes operations for installing and removing peers and for reading interface state (a privileged process can even read back the private key), but it has no subscription API for "handshake-initiation-received" events. Tools that need to react to connection attempts must manufacture the event themselves, e.g. via a BPF filter on the data plane.

Uses at Fly.io

Seen in
