Skip to content

SYSTEM Cited by 1 source

Cloudflare Gen12 server

Definition

The Gen12 generation of Cloudflare core servers — bare-metal hardware running Cloudflare's centralised infrastructure (control plane, billing, analytics), distinct from the globally distributed edge POPs that handle user traffic. Approximately 2,000 units as of the 2026-06-01 boot-time post.

Originally introduced via the "Gen 12 servers" hardware- refresh post (linked from the boot-time post: blog.cloudflare.com/gen-12-servers/).

Role

"Cloudflare's core is the centralized data centers that run our control plane, billing, and analytics — distinct from the globally distributed edge that handles user traffic. Core servers are bare metal, and when issues happen during reboot, the consequences can cascade fast."

(Source: Cloudflare 2026-06-01)

The fleet's bare-metal nature plus the centralised role (control-plane / billing / analytics) means boot-time regressions cascade fast: maintenance windows balloon, firmware-upgrade rollouts compound per-boot waste into hours, and new capacity sits idle waiting for the "timeout gauntlet" to clear.

Boot stack

  • UEFI firmware initialisation + POST + pre-boot environment.
  • PXE pre-boot stage where boot interfaces are probed.
  • iPXE as the open-source bootloader doing the actual OS image fetch over HTTP/HTTPS, with CfHIIConfig_App tool integration for fleet automation.
  • Some hardware also supports UEFI HTTPS boot natively (motherboard firmware downloads OS files securely without iPXE chainloading).

Seen in

2026-06-01 — Boot-time regression: 4 hours → 3 minutes

Source: sources/2026-06-01-cloudflare-how-we-reduced-core-unit-boot-time-from-hours-to-minutes.

After a routine firmware update, Gen12 core servers began taking "four hours to come back online, rather than just minutes as they did before""new nodes faced the full timeout gauntlet on their very first boot. Maintenance windows ballooned. Engineering teams had to babysit upgrades that should have run unattended." Affected the entire Gen12 fleet — nearly 2,000 units.

Root cause: linear-search timeout amplification across an undeclared boot-interface list. The server was attempting (in order) IPv4 HTTPS → IPv4 iPXE → IPv6 HTTPS → IPv6 iPXE; each failed attempt burned ~5 minutes. With four attempts before the actually- working IPv6 HTTPS interface, every boot wasted ~20 minutes; firmware upgrades requiring multiple sequential reboots compounded that into ~4 hours.

Fix: declare the boot interface order upfront in the pre-boot PXE stage per hardware/use-case, with state validation auto-reapply for firmware-upgrade persistence loss, and a wildcard match in CfHIIConfig_App for NIC vendor string drift. Result: firmware-upgrade automation 4 hours → 3 minutes; subsequent single-boot ~20 minutes → < 1 minute.

Operational architecture

Property Gen12 core fleet
Tier Core (centralised data centres)
Distinct from Globally distributed edge POPs
Workload Control plane / billing / analytics
Form factor Bare metal
Scale ~2,000 units
Firmware standard systems/uefi
Network boot systems/ipxe + UEFI HTTPS
Fleet automation tool CfHIIConfig_App
Owning team Cloudflare OpenBMC team
Hardware introduction Gen 12 servers post

Why it sits at the centre of the post

Three properties of the Gen12 fleet make the boot-time regression high-stakes:

  1. Bare-metal: each unit is a single physical asset; a 4-hour reboot is unrecoverable wall-clock cost.
  2. Centralised role: this isn't the edge — it's billing and the control plane. Capacity sitting idle has direct business impact.
  3. Fleet size compounds the per-unit cost: ~2,000 units times ~4 hours = ~8,000 server-hours per fleet-wide upgrade wave; plus "every unexpected failure mid-upgrade meant restarting the entire cycle, and new capacity sat idle waiting for the timeout gauntlet to clear."
Last updated · 542 distilled / 1,571 read