SYSTEM Cited by 1 source

Cloudflare Gen12 server¶

Definition¶

The Gen12 generation of Cloudflare core servers — bare-metal hardware running Cloudflare's centralised infrastructure (control plane, billing, analytics), distinct from the globally distributed edge POPs that handle user traffic. Approximately 2,000 units as of the 2026-06-01 boot-time post.

Originally introduced via the "Gen 12 servers" hardware- refresh post (linked from the boot-time post: blog.cloudflare.com/gen-12-servers/).

Role¶

"Cloudflare's core is the centralized data centers that run our control plane, billing, and analytics — distinct from the globally distributed edge that handles user traffic. Core servers are bare metal, and when issues happen during reboot, the consequences can cascade fast."

(Source: Cloudflare 2026-06-01)

The fleet's bare-metal nature plus the centralised role (control-plane / billing / analytics) means boot-time regressions cascade fast: maintenance windows balloon, firmware-upgrade rollouts compound per-boot waste into hours, and new capacity sits idle waiting for the "timeout gauntlet" to clear.

Boot stack¶

UEFI firmware initialisation + POST + pre-boot environment.
PXE pre-boot stage where boot interfaces are probed.
iPXE as the open-source bootloader doing the actual OS image fetch over HTTP/HTTPS, with CfHIIConfig_App tool integration for fleet automation.
Some hardware also supports UEFI HTTPS boot natively (motherboard firmware downloads OS files securely without iPXE chainloading).

Seen in¶

2026-06-01 — Boot-time regression: 4 hours → 3 minutes¶

Source: sources/2026-06-01-cloudflare-how-we-reduced-core-unit-boot-time-from-hours-to-minutes.

After a routine firmware update, Gen12 core servers began taking "four hours to come back online, rather than just minutes as they did before" — "new nodes faced the full timeout gauntlet on their very first boot. Maintenance windows ballooned. Engineering teams had to babysit upgrades that should have run unattended." Affected the entire Gen12 fleet — nearly 2,000 units.

Root cause: linear-search timeout amplification across an undeclared boot-interface list. The server was attempting (in order) IPv4 HTTPS → IPv4 iPXE → IPv6 HTTPS → IPv6 iPXE; each failed attempt burned ~5 minutes. With four attempts before the actually- working IPv6 HTTPS interface, every boot wasted ~20 minutes; firmware upgrades requiring multiple sequential reboots compounded that into ~4 hours.

Fix: declare the boot interface order upfront in the pre-boot PXE stage per hardware/use-case, with state validation auto-reapply for firmware-upgrade persistence loss, and a wildcard match in CfHIIConfig_App for NIC vendor string drift. Result: firmware-upgrade automation 4 hours → 3 minutes; subsequent single-boot ~20 minutes → < 1 minute.

Operational architecture¶

Property	Gen12 core fleet
Tier	Core (centralised data centres)
Distinct from	Globally distributed edge POPs
Workload	Control plane / billing / analytics
Form factor	Bare metal
Scale	~2,000 units
Firmware standard	systems/uefi
Network boot	systems/ipxe + UEFI HTTPS
Fleet automation tool	`CfHIIConfig_App`
Owning team	Cloudflare OpenBMC team
Hardware introduction	Gen 12 servers post

Why it sits at the centre of the post¶

Three properties of the Gen12 fleet make the boot-time regression high-stakes:

Bare-metal: each unit is a single physical asset; a 4-hour reboot is unrecoverable wall-clock cost.
Centralised role: this isn't the edge — it's billing and the control plane. Capacity sitting idle has direct business impact.
Fleet size compounds the per-unit cost: ~2,000 units times ~4 hours = ~8,000 server-hours per fleet-wide upgrade wave; plus "every unexpected failure mid-upgrade meant restarting the entire cycle, and new capacity sat idle waiting for the timeout gauntlet to clear."

companies/cloudflare
systems/uefi — firmware substrate
systems/ipxe — bootloader substrate
concepts/network-boot-interface — the abstraction the bug surfaced inside
concepts/linear-search-timeout-amplification — the failure mode
concepts/firmware-config-persistence-loss — the persistence-across-firmware-upgrade hazard
patterns/declare-boot-interface-order-upfront — the fix
patterns/state-validation-with-auto-reapply-and-reboot
patterns/wildcard-config-match-for-vendor-string-drift
patterns/hex-comparison-flag-for-ipxe-config-check
Original hardware post: Gen 12 servers