Cloudflare — How we reduced core unit boot time from hours to minutes¶
Summary¶
Cloudflare engineering post (2026-06-01) by the OpenBMC team
narrating a fleet-wide regression in core-server boot time —
following a routine firmware update,
Gen12 core servers (the Gen12 generation, "nearly 2,000 units",
running Cloudflare's centralised control plane / billing /
analytics on bare metal) began taking four hours to come back
online instead of minutes, stretching one-day rollouts into
multi-day slogs and forcing engineers to babysit upgrades that
should have run unattended. Root cause: "an over-eager linear
search through every available network boot interface." During
the network boot stage of the
UEFI / iPXE sequence, the server
attempts each available boot interface in a fixed order — IPv4
HTTPS → IPv4 iPXE → IPv6 HTTPS — and waits for each to time out
before falling through to the next; each failed attempt burns
"roughly five minutes" and the IPv6 HTTPS interface that
actually works in this fleet sits at the end of the list, so
every boot wastes ~20 minutes of timeout and a firmware
upgrade (which requires multiple sequential reboots, one per
component) compounds those penalties into nearly four hours of
idle waiting per server. The fix is the
declare-boot-
interface-order-upfront pattern: rather than letting the server
discover the right interface, the boot automation declares it
upfront in the pre-boot PXE
stage for each hardware/use-case combination, eliminating the
linear-search
timeout amplification entirely. Three ancillary obstacles
required real engineering. (1) Legacy + persistence: older
UEFI versions don't support boot ordering at all, and "configuration
settings are often reset following a UEFI firmware upgrade" — so
the automation now runs a [[patterns/state-validation-with-auto-
reapply-and-reboot|state-validation step]] that detects drift
post-change, re-applies the config, and triggers a reboot. (2)
Vendor-locked immutable setting: an OEM Force Priority Httpv4
Httpv6 Pxev4 Pxev6 token blocked boot-order changes entirely,
plus the EFI_IFR_REF3 "Network Boot Interface" data structure
was lazy-loaded — "not
instantiated until it is explicitly accessed via a GUI callback"
so it was invisible to programmatic scans. Required a new BIOS
version from the vendor and a debug session, with vendors
enabling specific tokens within the fixed "Boot Order Module"
that forces discovery during the boot sequence without
requiring manual GUI interaction. (3) NIC vendor string
drift: different NIC vendors emit different boot-interface
strings (e.g. UEFI: HTTPS IPv4 Ethernet Network Adapter
XXX-XXX-Y for OCP 3.0 P1 vs UEFI: HTTPS IPv4 Network Adapter -
50:00:E6:8F:4F:32 P1), causing exact-match config to mismatch —
canonicalised at [[concepts/vendor-string-mismatch-on-fleet-
config|vendor-string mismatch]] and resolved with the
wildcard
match .*HTTP.*IPv4.*P1 feature added to Cloudflare's
CfHIIConfig_App tool. Cloudflare is "currently working with
[their] UEFI vendors to standardize the network interface
strings" (drop MAC + product details, keep protocol + transfer
type + port + slot index — read product details from the NIC's
embedded vital product data when needed) so wildcards can
eventually be retired. (4) iPXE reads variables as HEX, which
broke direct equality checks against expected string values, so
the team added a uefi-same-hex boolean flag that lets the
automation run a single set command instead of show followed
by set (
hex-comparison flag). Headline operational outcomes:
firmware-upgrade automation 4 hours → 3 minutes; subsequent
single boot ~20 minutes → less than a minute. The endpoint is
"a more dynamic system" where "a single BIOS firmware image
serves all SKUs, configuration updates deploy at scale through
our existing release pipeline, and the entire workflow operates
from iPXE."
Key takeaways¶
- The bug was not a firmware regression — it was the absence of a declared boot order. When the post-update reports came in (servers stuck pre-OS), the team's first hypothesis was that the firmware update introduced a hang. Serial console showed the opposite: POST completed normally, hardware initialisation was healthy, but "instead of quickly reaching the network boot stage and pulling down an OS image, the server sat waiting. And waiting." The symptom was a hang; the cause was a sequence of timeouts from probing interfaces that would never respond.
- Linear search through boot interfaces is the canonical timeout-amplification failure mode. Verbatim: "the system was attempting an IPv4 HTTPS network boot, timing out after several minutes, then trying IPv4 iPXE, timing out again, then repeating both — all before finally reaching the IPv6 HTTPS boot interface that would actually succeed. Every failed network boot attempt burned roughly five minutes waiting for a timeout response. With four attempts stacking up before the correct interface was reached, a single boot cycle wasted around twenty minutes."
- Firmware-upgrade automation compounds per-boot waste into per-upgrade hours. "For a routine reboot, that's painful. For firmware upgrade automation, which requires multiple sequential reboots, one per component, those twenty-minute penalties compounded into nearly four hours of idle waiting per server." The bug existed before the firmware update — the update merely shifted which interface succeeded, exposing the timeout-amplification pattern that was structurally present all along.
- Two structural constraints make boot-order declaration non-trivial: legacy UEFI versions and config persistence. "Boot ordering is not supported on older UEFI versions" — so any solution must coexist with un-orderable hardware. "Configuration settings are often reset following a UEFI firmware upgrade" — so the automation cannot assume its declared order survives the very upgrade it's trying to accelerate. Both lead to the same operational primitive: validate-then-reapply-then-reboot.
- State validation with auto-reapply is the durable solution to firmware config persistence. "To address these edge cases, we implemented a state validation step. The firmware automation now validates the configuration post-change: if it detects that settings have been modified, it re-applies the config and triggers a reboot." The first boot after an upgrade may take slightly longer (because of the validation
- reapply + reboot loop), "but this change drastically reduces the time required for all future start-ups from about 20 minutes to less than a minute per subsequent boot."
- Vendor coordination is on the critical path for fleet
automation. Two vendor obstacles blocked the fix until
coordination resolved them. First, the EFI_IFR_REF3
structure was lazy-loaded by design ("standard industry
practice to accelerate BIOS boot times") which made the
network-boot-interface invisible to programmatic scans
"because the structure hadn't been 'loaded' yet." Second,
an immutable OEM setting (
Force Priority Httpv4 Httpv6 Pxev4 Pxev6) prevented changing the boot order at all. Both required "a new BIOS version from our vendor and a debug session," with the OEM enabling tokens within the "Boot Order Module" that force discovery without GUI interaction. - NIC vendor string drift is a fleet-wide configuration
hazard, mitigated by wildcards as a stop-gap and string
standardisation as the durable fix. Verbatim examples
show the magnitude:
UEFI: HTTPS IPv4 Ethernet Network Adapter XXX-XXX-Y for OCP 3.0 P1vsUEFI: HTTPS IPv4 Network Adapter - 50:00:E6:8F:4F:32 P1— same boot interface, irreconcilably different strings. The wildcard workaround is.*HTTP.*IPv4.*P1(added to Cloudflare'sCfHIIConfig_Apptool); the durable fix is a standardisation effort with UEFI vendors to "only make use of the relevant information (e.g. protocol, transfer type, port number, and physical slot index) and drop the product details like the MAC address. The product details, if needed, can be read from the embedded vital product detail information of the network interface card." - A small primitive — a boolean flag — replaces a two-step
show-then-set with a single set, when the read path is
broken. "Since iPXE reads this variable as HEX, it was
reading the string output as hex. To check if the network
boot setting was modified and to reduce boot time (so we
don't have to print the variables before setting them), we
implemented a boolean flag,
uefi-same-hex, to indicate whether a configuration changed. This enabled us to run a single set command instead of first running show to compare, and then set if the configuration was not in the desired state." Reduces both wall-clock and round-trip count. - The endpoint is "a more dynamic system" with one BIOS image for all SKUs. Verbatim: "a single BIOS firmware image serves all SKUs, configuration updates deploy at scale through our existing release pipeline, and the entire workflow operates from iPXE." Configuration becomes part of the release pipeline, not an out-of-band per-server knob — a generalisation of configuration-as-code applied to firmware/BIOS at fleet scale.
Architecture and numbers¶
| Datum | Value |
|---|---|
| Affected fleet | Gen12 core servers |
| Fleet size | nearly 2,000 units |
| Firmware-upgrade automation time, before | nearly 4 hours |
| Firmware-upgrade automation time, after | 3 minutes |
| Subsequent single-boot time, before | about 20 minutes |
| Subsequent single-boot time, after | less than 1 minute |
| Per-failed-interface timeout | ~5 minutes |
| Failed interfaces probed before success | 4 (IPv4 HTTPS / IPv4 iPXE / IPv6 HTTPS / IPv6 iPXE — the actually-succeeding interface, IPv6 HTTPS, is at the end of the search) |
| Per-boot wasted time on linear search | ~20 minutes |
| Wildcard config match string | .*HTTP.*IPv4.*P1 |
| Vendor lockout setting | Force Priority Httpv4 Httpv6 Pxev4 Pxev6 |
| Lazy-loaded data structure | EFI_IFR_REF3 |
| iPXE flag for HEX-read short-circuit | uefi-same-hex |
| Cloudflare-side tool | CfHIIConfig_App |
| Iterations needed for fix | new BIOS from vendor + debug session |
Boot sequence (verbatim)¶
"Our boot automation flow is in three broad stages: firmware initialization, pre-boot, and kernel startup. After power on, the UEFI firmware does some hardware and peripheral initialization followed by the PXE pre-boot environment. The pre-boot sets up the network card and executes a small program called bootloader, which kickstarts the kernel. It's in this PXE stage that various network interfaces are probed for the right one. On first boot, firmware upgrades are included in our boot automation workflow."
The three-stage decomposition is what makes the fix possible: boot order declaration happens early in the pre-boot PXE stage so the linear search through interfaces never starts.
Systems extracted¶
- systems/uefi — Unified Extensible Firmware Interface, the modern firmware standard that initialises hardware and hands off to the OS. The substrate the post operates on.
- systems/ipxe — Open-source network boot firmware supporting modern protocols (HTTP/HTTPS) — the "programmable workflow" layer that makes boot automation feasible (Cloudflare's "automation strategies that ultimately solved the problem").
- systems/cloudflare-gen12-server — Cloudflare Gen12 fleet of core (centralised data centre) bare-metal servers, ~2,000 units; canonical wiki face for the post.
Concepts extracted¶
- concepts/network-boot-interface — the abstraction that lets a server boot its OS over the network rather than local storage; primary instances are PXE and UEFI HTTPS boot.
- concepts/linear-search-timeout-amplification — the failure mode where probing a sequence of options with per-option timeout produces wall-clock cost proportional to the position of the first-succeeding option × the timeout. The post's canonical instance: 4 × 5 min = 20 min per boot.
- concepts/firmware-config-persistence-loss — UEFI config settings are "often reset following a UEFI firmware upgrade"; any automation that depends on persisted UEFI config must validate and re-apply post-upgrade.
- concepts/vendor-string-mismatch-on-fleet-config — same logical interface emits different identifying strings depending on NIC vendor; exact-match config breaks; need wildcards (stop-gap) or string standardisation (durable).
- concepts/lazy-loaded-bios-data-structure — BIOS data structures (e.g. EFI_IFR_REF3) deferred until accessed via a GUI callback to accelerate cold-path BIOS boot; renders the data invisible to programmatic scans.
Patterns extracted¶
- patterns/declare-boot-interface-order-upfront — eliminate guesswork in the network boot stage by declaring the correct interface upfront in the pre-boot PXE stage per hardware/use- case, so the firmware never falls through the timeout gauntlet. Cloudflare's headline pattern.
- patterns/state-validation-with-auto-reapply-and-reboot — after any config change (and after firmware upgrades that may reset the config), validate the new state programmatically; if it doesn't match expectations, re-apply and trigger a reboot.
- patterns/wildcard-config-match-for-vendor-string-drift — use a regex-style wildcard against vendor-specific configuration strings when exact match is unreliable; pair with a string-standardisation roadmap so the wildcard can retire.
- patterns/hex-comparison-flag-for-ipxe-config-check —
small primitive: a boolean flag exposed to iPXE indicating
whether a configuration matches the desired hex
representation, used to short-circuit a
show+ compare + conditionalsetsequence into a singleset.
Caveats¶
- Tier-1 Cloudflare post on infrastructure / firmware internals — passes scope decisively (production incident with operational numbers, fleet-scale firmware automation, multi-vendor coordination, named subsystems).
- No latency p-values — the only quantitative numbers are the headline before/after totals (4h → 3min, ~20min → <1min) and the per-timeout 5-min figure. No per-step latency histograms, no distribution of which servers experienced which interface ordering.
- Specific OEM not named — the post identifies the
immutable setting verbatim (
Force Priority Httpv4 Httpv6 Pxev4 Pxev6) and the data structure (EFI_IFR_REF3) but does not name the BIOS vendor or NIC vendors. The wildcard example pair is sufficient to identify the failure mode but not the vendors. - No DRAM/memory training time — POST is described as completing normally; the 4-hour figure is dominated by network-boot-interface probing, not by hardware initialisation.
CfHIIConfig_Appis referenced but not open-sourced or deeply documented — the post says "we had to implement an additional feature to the CfHIIConfig_App tool, allowing it to set the config without having the full string" but the tool itself is not described beyond the wildcard-matching feature it gained.- The string standardisation effort is in-flight — "We are currently working with our UEFI vendors to standardize the network interface strings" — outcome and timeline not disclosed; wildcards remain the operational primitive in the meantime.
- No regression-test or chaos discipline disclosed — the post does not say whether boot-time regressions of this shape now have alarm coverage, or whether boot-order declarations are validated continuously against the fleet's actual interfaces.
- The fix's interaction with
secure boot/measured bootis not addressed — declaring a specific boot interface upfront could in principle interact with attestation chains (any change to BIOS config produces a different PCR); Cloudflare doesn't discuss this. - First-boot for new nodes still pays the cost — the post notes new nodes face "the full timeout gauntlet on their very first boot" before the declared order takes effect; the validate-then-reapply-then-reboot pattern means the first boot may take slightly longer than baseline, traded for sub-minute subsequent boots.
- Edge fleet not affected — explicitly limited to Cloudflare's core (centralised control-plane / billing / analytics data centres), not the globally-distributed edge POPs. The blog distinguishes the two upfront.
- No comparison vs alternative approaches — declarative boot order vs in-band BIOS update vs just-disabling- unused-interfaces vs always-PXE-then-iPXE-only — only the chosen path is described. The choice of "declare order per hardware/use-case" over "disable all but the right one" is not explicitly justified.
- No OEM/BIOS firmware version numbers — "new BIOS version from our vendor" is the only altitude; specific versions or vendor identities not disclosed.
Source¶
- Original: https://blog.cloudflare.com/optimizing-core-unit-boot-time/
- Raw markdown:
raw/cloudflare/2026-06-01-how-we-reduced-core-unit-boot-time-from-hours-to-minutes-caa2ec1d.md
Related¶
- companies/cloudflare — corpus index
- systems/uefi — firmware-standard substrate
- systems/ipxe — open-source network-boot firmware
- systems/cloudflare-gen12-server — affected fleet
- concepts/network-boot-interface
- concepts/linear-search-timeout-amplification
- concepts/firmware-config-persistence-loss
- concepts/vendor-string-mismatch-on-fleet-config
- concepts/lazy-loaded-bios-data-structure
- patterns/declare-boot-interface-order-upfront
- patterns/state-validation-with-auto-reapply-and-reboot
- patterns/wildcard-config-match-for-vendor-string-drift
- patterns/hex-comparison-flag-for-ipxe-config-check
- patterns/hot-swap-retrofit — sibling fleet-upgrade pattern (AWS EBS)
- concepts/fleet-drain-operation — sibling fleet-ops primitive (Fly.io)