Skip to content

SYSTEM Cited by 1 source

GitHub Pages

GitHub Pages is GitHub's static-site hosting product (*.github.io + custom domains). One of GitHub's earliest platform offerings, born simple — up to early 2015 the entire service ran on a single active/standby pair of machines with user data on 8 DRBD-backed partitions and a cron-regenerated nginx map file that translated hostnames to on-disk paths. Canonical wiki anchor for the simple components, don't prematurely generalise design philosophy ("avoiding prematurely solving problems that aren't yet problems").

2015 rearchitecture (canonical source)

Scale forcing function: "thousands of requests per second to over half a million sites" + storage ceiling of a single pair + 30-minute publish latency + cold-restart cost of loading a giant pre-generated nginx map. The rewrite splits the stack into two tiers:

Frontend routing tier — Dell C5220s running nginx with an ngx_lua script in access_by_lua_file. Per request, the Lua script queries a MySQL read replica for which backend fileserver pair hosts the requested site, sets $gh_pages_host + $gh_pages_path, and proxy_passes to the backend. Canonical wiki instance of patterns/db-routed-request-proxy.

Fileserver tier — pairs of Dell R720s running active/standby with DRBD-synchronous replication across 8 partitions; nginx document root = X-GitHub-Pages-Root header set by the router. Each pair is shaped identically to the pre-2015 single pair, so the rewrite reused "large parts of our configuration and tooling". Scaling is horizontal by adding pairs — patterns/horizontally-scale-stateful-tier-via-pairs.

Availability-dependency trade-off

The rewrite introduces a first-order availability dependency on MySQL — per-request routing lookups run against the DB. Four mitigations disclosed in the post:

  1. Retry-on-error to a different read replica. The Lua router reconnects to an alternate replica on query failure.
  2. 30-second shared-memory cache of routing lookups on the pages-fe node. Reduces MySQL load and absorbs short blips. Canonical wiki instance of patterns/cached-lookup-with-short-ttl.
  3. Reads go to replicas, not the master. MySQL master failovers
  4. master-tier maintenance windows don't take Pages down — "existing Pages will remain online even during database maintenance windows where we have to take the rest of the site down".
  5. Fastly in front caches every 200 response. Even a total router outage leaves cached Pages sites reachable. Canonical wiki instance of patterns/cdn-in-front-for-availability-fallback.

Performance

Disclosed: < 3 ms of each request in Lua at the p98, "including time spent in external network calls", across millions of HTTP requests per hour. The p98 bound covers the MySQL lookup path — the load-bearing datum for DB-routed proxy viability under production traffic.

Operational wins of the rewrite

  • Instant publishing: new Pages sites go live as soon as the MySQL row is written; no 30-minute cron wait.
  • No cold-restart penalty: nginx doesn't load a giant map at startup.
  • Horizontal storage scaling: storage capacity grows by adding fileserver pairs, not by fattening a single machine.

Historical note

The source post is an adaptation of GitHub's 2015 writeup, republished on github.blog in 2025. The architecture described is 10 years old; GitHub Pages has almost certainly evolved. Ingested here as a canonical historical-architecture reference for DB-routed static hosting at scale.

Seen in

Last updated · 319 distilled / 1,201 read