SYSTEM Cited by 1 source
Cloudflare Gen12 server¶
Definition¶
The Gen12 generation of Cloudflare core servers — bare-metal hardware running Cloudflare's centralised infrastructure (control plane, billing, analytics), distinct from the globally distributed edge POPs that handle user traffic. Approximately 2,000 units as of the 2026-06-01 boot-time post.
Originally introduced via the "Gen 12 servers" hardware- refresh post (linked from the boot-time post: blog.cloudflare.com/gen-12-servers/).
Role¶
"Cloudflare's core is the centralized data centers that run our control plane, billing, and analytics — distinct from the globally distributed edge that handles user traffic. Core servers are bare metal, and when issues happen during reboot, the consequences can cascade fast."
(Source: Cloudflare 2026-06-01)
The fleet's bare-metal nature plus the centralised role (control-plane / billing / analytics) means boot-time regressions cascade fast: maintenance windows balloon, firmware-upgrade rollouts compound per-boot waste into hours, and new capacity sits idle waiting for the "timeout gauntlet" to clear.
Boot stack¶
- UEFI firmware initialisation + POST + pre-boot environment.
- PXE pre-boot stage where boot interfaces are probed.
- iPXE as the open-source bootloader doing
the actual OS image fetch over HTTP/HTTPS, with
CfHIIConfig_Apptool integration for fleet automation. - Some hardware also supports UEFI HTTPS boot natively (motherboard firmware downloads OS files securely without iPXE chainloading).
Seen in¶
2026-06-01 — Boot-time regression: 4 hours → 3 minutes¶
Source: sources/2026-06-01-cloudflare-how-we-reduced-core-unit-boot-time-from-hours-to-minutes.
After a routine firmware update, Gen12 core servers began taking "four hours to come back online, rather than just minutes as they did before" — "new nodes faced the full timeout gauntlet on their very first boot. Maintenance windows ballooned. Engineering teams had to babysit upgrades that should have run unattended." Affected the entire Gen12 fleet — nearly 2,000 units.
Root cause: linear-search timeout amplification across an undeclared boot-interface list. The server was attempting (in order) IPv4 HTTPS → IPv4 iPXE → IPv6 HTTPS → IPv6 iPXE; each failed attempt burned ~5 minutes. With four attempts before the actually- working IPv6 HTTPS interface, every boot wasted ~20 minutes; firmware upgrades requiring multiple sequential reboots compounded that into ~4 hours.
Fix: declare
the boot interface order upfront in the pre-boot PXE stage
per hardware/use-case, with
state validation auto-reapply for firmware-upgrade
persistence loss, and a
wildcard match in CfHIIConfig_App for NIC vendor string
drift. Result: firmware-upgrade automation 4 hours → 3
minutes; subsequent single-boot ~20 minutes → < 1 minute.
Operational architecture¶
| Property | Gen12 core fleet |
|---|---|
| Tier | Core (centralised data centres) |
| Distinct from | Globally distributed edge POPs |
| Workload | Control plane / billing / analytics |
| Form factor | Bare metal |
| Scale | ~2,000 units |
| Firmware standard | systems/uefi |
| Network boot | systems/ipxe + UEFI HTTPS |
| Fleet automation tool | CfHIIConfig_App |
| Owning team | Cloudflare OpenBMC team |
| Hardware introduction | Gen 12 servers post |
Why it sits at the centre of the post¶
Three properties of the Gen12 fleet make the boot-time regression high-stakes:
- Bare-metal: each unit is a single physical asset; a 4-hour reboot is unrecoverable wall-clock cost.
- Centralised role: this isn't the edge — it's billing and the control plane. Capacity sitting idle has direct business impact.
- Fleet size compounds the per-unit cost: ~2,000 units times ~4 hours = ~8,000 server-hours per fleet-wide upgrade wave; plus "every unexpected failure mid-upgrade meant restarting the entire cycle, and new capacity sat idle waiting for the timeout gauntlet to clear."
Related¶
- companies/cloudflare
- systems/uefi — firmware substrate
- systems/ipxe — bootloader substrate
- concepts/network-boot-interface — the abstraction the bug surfaced inside
- concepts/linear-search-timeout-amplification — the failure mode
- concepts/firmware-config-persistence-loss — the persistence-across-firmware-upgrade hazard
- patterns/declare-boot-interface-order-upfront — the fix
- patterns/state-validation-with-auto-reapply-and-reboot
- patterns/wildcard-config-match-for-vendor-string-drift
- patterns/hex-comparison-flag-for-ipxe-config-check
- Original hardware post: Gen 12 servers