Fly gateway¶
Fly gateways are a fleet of dozens of servers around the
world whose sole job is to accept incoming WireGuard connections
from flyctl (and other external clients)
and connect them to the appropriate private networks inside
Fly.io.
"We operate dozens of 'gateway' servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)
Gateways are regional (ord for Chicago, and so on). The stack on
each gateway:
- Linux kernel with WireGuard enabled + Netlink config surface.
- wggwd — the Fly-authored gateway-side WireGuard manager daemon.
- SQLite — local peer-config store and, in the JIT design, the rate-limit cache for API lookups.
- WireSockets daemon — terminates the WireGuard-over-WebSockets transport Fly.io defaults customers to, so packets arrive at the gateway regardless of the customer's end-to-end UDP path.
Role under JIT WireGuard (2024-03-12)¶
Post-JIT-rewrite, the gateway's role inverts from receiver of peer pushes to pull-on-first-packet owner:
- Sniff handshake-initiation packets on the data plane via a BPF filter (`udp and dst port 51820 and udp[8] = 1`), or an equivalent hook inside the WireSockets daemon for WebSocket-delivered traffic.
- Decrypt the initiator's static public key by running the first leg of the Noise handshake (requires the interface private key, fetched from Netlink — privileged process only, ~200 lines of code).
- Consult a rate-limited SQLite cache keyed on the pubkey; on a miss, make an internal HTTP API request to the Fly control plane for the peer config.
- Install the peer via Netlink, in the initiator role, so the kernel originates a WireGuard handshake back to flyctl immediately (canonical role inversion).
- A cron job aggressively removes stale peers from the kernel — cheap because the next connection will re-pull the peer anyway.
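The BPF filter in the first step keys on the WireGuard wire format: `udp[8]` is the first byte past the 8-byte UDP header, i.e. the first byte of the WireGuard message, whose type field is 1 for a handshake initiation (followed by three reserved zero bytes). A minimal Python sketch of the same predicate — the port constant matches the filter, but the function name and the WebSocket-path reuse are illustrative:

```python
# WireGuard message type 1 = handshake initiation (first byte of the
# 4-byte little-endian type field; the other three bytes are reserved zeros).
MSG_HANDSHAKE_INITIATION = 1

WG_PORT = 51820  # default WireGuard port, matching "dst port 51820" in the filter


def is_handshake_initiation(udp_dst_port: int, udp_payload: bytes) -> bool:
    """Mirror of `udp and dst port 51820 and udp[8] = 1`: udp[8] in
    pcap-filter syntax indexes the first byte of the UDP payload."""
    return (
        udp_dst_port == WG_PORT
        and len(udp_payload) >= 4
        and udp_payload[0] == MSG_HANDSHAKE_INITIATION
        and udp_payload[1:4] == b"\x00\x00\x00"  # reserved bytes must be zero
    )
```

The same byte test works for WebSocket-delivered traffic, where there is no kernel BPF hook: the WireSockets daemon already holds the decapsulated UDP payload and can apply the check in userland.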
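The cache-then-pull step can be sketched as follows. This is a minimal illustration under stated assumptions: the table schema, TTL value, and `fetch_from_api` callback are all hypothetical stand-ins for the internal HTTP call to the control plane — the key property is that repeat lookups for the same pubkey (including unknown ones) hit SQLite instead of the API:

```python
import sqlite3
import time

CACHE_TTL = 30  # seconds; illustrative rate-limit window for repeat API lookups


class PeerCache:
    """Gateway-side lookup sketch: consult a local SQLite cache keyed on
    the initiator's public key; on a miss, ask the control plane once and
    cache the answer (including "unknown") to rate-limit repeats."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS peers "
            "(pubkey TEXT PRIMARY KEY, config TEXT, fetched_at REAL)"
        )

    def lookup(self, pubkey, fetch_from_api):
        row = self.db.execute(
            "SELECT config, fetched_at FROM peers WHERE pubkey = ?", (pubkey,)
        ).fetchone()
        if row and time.monotonic() - row[1] < CACHE_TTL:
            return row[0]  # cached config, or None if the API said "unknown"
        config = fetch_from_api(pubkey)  # internal HTTP request in the real system
        self.db.execute(
            "INSERT OR REPLACE INTO peers VALUES (?, ?, ?)",
            (pubkey, config, time.monotonic()),
        )
        self.db.commit()
        return config
```

Caching negative answers matters here: an unknown pubkey hammering the gateway would otherwise translate into one control-plane request per handshake attempt.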
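The eviction cron in the last step can afford to be aggressive precisely because removal is recoverable: the next handshake re-pulls the peer. A hedged sketch of the selection logic, assuming a `wg show <iface> latest-handshakes`-style input (one `<pubkey>\t<unix-timestamp>` line per peer, `0` meaning never); the threshold is illustrative, and actual removal would go through `wg set <iface> peer <pubkey> remove` or the Netlink equivalent:

```python
STALE_AFTER = 3600  # seconds; illustrative staleness threshold


def stale_peers(latest_handshakes_output: str, now: float) -> list:
    """Parse `wg show <iface> latest-handshakes` output and return the
    public keys of peers whose last handshake is older than STALE_AFTER.
    A timestamp of 0 (never handshaked) also counts as stale, which is
    safe: evicting a live peer only costs one re-pull on its next packet."""
    evict = []
    for line in latest_handshakes_output.strip().splitlines():
        pubkey, ts = line.split("\t")
        if now - int(ts) > STALE_AFTER:
            evict.append(pubkey)
    return evict
```

Run from cron, this keeps the kernel's peer table near the working set instead of letting it grow monotonically, which is what caused the pre-JIT slowdowns.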
Historical role (pre-JIT)¶
"Until a few weeks ago, our gateways ran on a pretty simple
system." — push-based: the Fly GraphQL API forwarded every new
flyctl-generated peer config to the appropriate gateway over
NATS; wggwd installed it and never cleaned it
up. Two failure modes drove the rewrite: NATS delivery is best-effort, so a dropped message meant the peer install raced (or never followed) the GraphQL reply telling flyctl it could connect; and stale peers accumulated into the low hundreds of thousands per host, slowing the kernel and occasionally panicking it. (Source: sources/2024-03-12-flyio-jit-wireguard-peers)
Seen in¶
- sources/2024-03-12-flyio-jit-wireguard-peers — canonical wiki instance; full before/after architecture + JIT rewrite.
Related¶
- systems/wireguard — the protocol it fronts.
- systems/wggwd — the daemon that runs on it.
- systems/fly-flyctl — the external client it accepts connections from.
- systems/fly-graphql-api — the control plane it pulls peer configs from.
- systems/linux-netlink — the kernel config surface it drives.
- systems/sqlite — the local store + rate-limit cache.
- systems/nats — deprecated pre-JIT push transport.
- concepts/jit-peer-provisioning — the architectural move.
- patterns/jit-provisioning-on-first-packet — the reusable pattern.
- patterns/bpf-filter-for-api-event-source — the event source.
- patterns/state-eviction-cron — the cleanup primitive.
- companies/flyio.