PATTERN Cited by 1 source
State validation with auto-reapply and reboot¶
Intent¶
When applied configuration may be wiped by an out-of-band event (e.g. firmware upgrade), keep the live state converging on the declared state by (1) validating after every change, (2) re-applying if drift is detected, and (3) triggering a reboot to make the re-applied state effective.
A reconciliation loop adapted to substrates where the effective state lives behind a reboot boundary.
Context¶
Firmware configuration is not guaranteed to persist across firmware upgrades — "Configuration settings are often reset following a UEFI firmware upgrade" (Cloudflare 2026-06-01 core boot-time post). Any fleet automation that has declaratively set something at the firmware layer (boot order, secure boot config, hardware tunables) cannot trust that the setting will survive subsequent firmware-update operations. The upgrade itself is often what the automation is trying to do, so a one-shot apply isn't enough.
Mechanism¶
"To address these edge cases, we implemented a state validation step. The firmware automation now validates the configuration post-change: if it detects that settings have been modified, it re-applies the config and triggers a reboot."
Three structural pieces:
- Post-change validation step. After any firmware operation (upgrade, write, reset), the automation reads the current state of the relevant variables and compares against the declared state. The comparison is value-level (or hex-level — see patterns/hex-comparison-flag-for-ipxe-config-check for the iPXE-specific variant).
- Conditional re-apply. If validation reveals drift, the automation re-applies the declared configuration. This is idempotent: re-applying when no drift exists is a no-op.
- Reboot trigger. Firmware-level configuration usually only takes effect on the next boot. The automation explicitly triggers a reboot rather than waiting for a future natural one — otherwise the visible state diverges from the applied state until something else cycles the server.
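The three pieces above can be sketched in Python. This is an illustrative model only, not Cloudflare's implementation: `find_drift`, `validate_and_reapply`, the callback parameters, and the setting names in the usage lines are all hypothetical, and firmware state is stood in for by a plain dictionary.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def find_drift(declared: Config, live: Config) -> Config:
    """Return the declared settings whose live value is missing or different."""
    return {key: value for key, value in declared.items() if live.get(key) != value}


def validate_and_reapply(
    declared: Config,
    read_live: Callable[[], Config],
    apply: Callable[[Config], None],
    reboot: Callable[[], None],
) -> bool:
    """Validate after a change; on drift, re-apply and trigger a reboot.

    Returns True when drift was found (and a reboot was requested).
    """
    drift = find_drift(declared, read_live())
    if not drift:
        return False  # no drift: nothing is written, no reboot is triggered
    apply(drift)      # re-apply only the settings that drifted
    reboot()          # firmware settings take effect on the next boot
    return True


# Usage with stand-in values: a simulated upgrade has reset the boot order.
firmware = {"boot_order": "disk,net", "secure_boot": "enabled"}
declared = {"boot_order": "net,disk", "secure_boot": "enabled"}
reboots = []

first_pass = validate_and_reapply(
    declared, lambda: dict(firmware), firmware.update, lambda: reboots.append("reboot")
)
second_pass = validate_and_reapply(
    declared, lambda: dict(firmware), firmware.update, lambda: reboots.append("reboot")
)
```

The second pass finds no drift and returns without writing or rebooting, which is the idempotence property the conditional re-apply step relies on.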
Cloudflare's iPXE-script form of the validation step (verbatim):
# construct path to read the update variable
set buffer-var-guid 91468514-75bc-4bb5-8f33-91efff9e9b1f
set var-upd-path efivar/CfHIIVarUpd-${buffer-var-guid}
# Run the config change command
imgexec <signed CF UEFI configuration App> set ${uefi-setting}=${uefi-value}
# Compare the update variable with the expected value if it has changed.
# If it has changed, set the local variable to reboot the system
iseq ${uefi-same-hex} ${${var-upd-path}} || set has-changed ${uefi-diff-hex}
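For readers unfamiliar with iPXE's `||` operator (run the right-hand command only if the left-hand one fails), the control flow of the final line can be modelled in Python. This is a reading aid under stated assumptions: the function name is hypothetical, the hex values `"00"` and `"01"` are placeholders because the source does not give the real contents of `uefi-same-hex` or `uefi-diff-hex`, and treating an unset update variable as "changed" is an assumption of this sketch.

```python
from typing import Dict


def model_validation_line(
    settings: Dict[str, str], var_upd_path: str, same_hex: str, diff_hex: str
) -> Dict[str, str]:
    """Model `iseq <same> <current> || set has-changed <diff>`.

    Returns a copy of the settings, with `has-changed` set only when the
    update variable does not equal the expected "same" value.
    """
    result = dict(settings)
    current = result.get(var_upd_path)  # None when unset (assumed to mean "changed")
    if current != same_hex:             # iseq fails, so the right-hand side runs
        result["has-changed"] = diff_hex
    return result


# Path built the same way as var-upd-path in the script above.
PATH = "efivar/CfHIIVarUpd-91468514-75bc-4bb5-8f33-91efff9e9b1f"

# Placeholder hex values, not taken from the source.
unchanged = model_validation_line({PATH: "00"}, PATH, same_hex="00", diff_hex="01")
changed = model_validation_line({PATH: "01"}, PATH, same_hex="00", diff_hex="01")
```

In the model, `unchanged` carries no `has-changed` key, while `changed` does; a later step can branch on that flag to decide whether to reboot.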
Operational trade-off (Cloudflare 2026-06-01)¶
"Although the first boot may take slightly longer, this change drastically reduces the time required for all future start-ups from about 20 minutes to less than a minute per subsequent boot."
The validation+reapply+reboot loop adds wall-clock time to the first post-upgrade boot. Cloudflare deems this acceptable because the amortised gain (subsequent boots of under a minute, versus about 20 minutes before the declared boot order took effect) pays back the validation cost across the fleet.
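The amortisation argument can be made concrete with a small calculation. The 20-minute and 1-minute figures come from the quote above (the latter as an upper bound); the first-boot overhead values are hypothetical, since the source only says the first boot "may take slightly longer".

```python
import math


def break_even_boots(
    first_boot_overhead_min: float,
    old_boot_min: float = 20.0,  # "about 20 minutes" per the quoted post
    new_boot_min: float = 1.0,   # "less than a minute", taken as an upper bound
) -> int:
    """Number of subsequent boots needed to recoup the one-off overhead."""
    saved_per_boot = old_boot_min - new_boot_min
    return math.ceil(first_boot_overhead_min / saved_per_boot)


# Hypothetical overheads: 5 minutes and a deliberately pessimistic 40 minutes.
modest = break_even_boots(5.0)
pessimistic = break_even_boots(40.0)
```

With roughly 19 minutes saved on each later boot, a 5-minute overhead is recovered on the first subsequent boot, and even the pessimistic 40-minute case is recovered within three.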
Why "reboot" is part of the pattern (and not just "apply")¶
Most application-layer reconciliation loops apply config and the change is live immediately. At the firmware layer:
- The variable change lands in NVRAM, but the effective boot path is determined at the next boot.
- A long-running server can have its declared boot order written correctly and still boot incorrectly the next time it cycles for an unrelated reason, because nothing has exercised the setting since it was applied. Pairing the apply step with an explicit reboot validates it immediately.
The reboot trigger turns the loop into apply → validate → reboot → next-boot-uses-correct-config, closing the loop on the firmware substrate.
Where this composes¶
- Pair with patterns/declare-boot-interface-order-upfront — the declaration is what gets re-applied; the validation loop is what makes the declaration durable across firmware upgrades.
- Pair with patterns/hex-comparison-flag-for-ipxe-config-check — the validation comparison may need a hex-encoded variant if the substrate (iPXE) reads variables as hex rather than strings.
- Generalisation: configuration-as-code applied at firmware altitude. The desired state lives in source / a release pipeline; the live state is driven there by validation + reapply.
When to use¶
- Configuration substrate where settings can be wiped by an out-of-band event (firmware upgrade, factory reset, NVRAM realloc).
- Effective state lives behind a reboot or restart boundary (firmware, kernel, system services with cold-load config).
- Operating at fleet scale where manual re-apply per machine is impractical.
When not to use¶
- Configuration substrate that is read continuously and enforced in real time (containerised config, sidecar-based policies) — the change can be detected and re-applied without a reboot.
- One-off administrative work where a human will be present to notice drift.
Risks¶
- Repeated reboot loop if the apply step keeps failing (e.g. the OEM's immutable setting blocks the write); the validation succeeds in detecting drift, but the reapply doesn't fix it. Add a bounded retry count + alarm.
- Stale expected value: if the declared state is out-of-date for a platform variant, every server will be forced to reboot unnecessarily. Tie the declared state to source-controlled configuration.
- Reboot-storm during a fleet-wide firmware upgrade if every server triggers its reapply+reboot at once. Roll out via a controlled cadence.
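Two of the mitigations above, a bounded retry count with an alarm and a controlled rollout cadence, can be sketched as follows. Everything here is a hypothetical illustration: the names, the limit of three attempts, and the JSON file standing in for whatever store survives a reboot (an in-memory counter would be lost on each cycle).

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, List, Sequence

MAX_REAPPLY_ATTEMPTS = 3  # hypothetical limit; tune per fleet


def next_action(
    drift_detected: bool, state_file: Path, max_attempts: int = MAX_REAPPLY_ATTEMPTS
) -> str:
    """Decide what to do after validation, counting attempts across reboots."""
    attempts = json.loads(state_file.read_text())["attempts"] if state_file.exists() else 0
    if not drift_detected:
        state_file.unlink(missing_ok=True)  # converged: reset the counter
        return "ok"
    if attempts >= max_attempts:
        return "alarm"  # stop rebooting and hand over to a human
    state_file.write_text(json.dumps({"attempts": attempts + 1}))
    return "reapply_and_reboot"


def rollout_batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield hosts in fixed-size batches so reboots are staggered."""
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


# Usage: a write that never sticks produces three attempts, then an alarm.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "reapply_attempts.json"
    actions = [next_action(True, state) for _ in range(4)]

batches = list(rollout_batches(["a", "b", "c", "d", "e"], batch_size=2))
```

Keeping the counter outside the machine's volatile state is the design point: the loop being bounded is itself something that has to persist across the reboot boundary.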
Related¶
- concepts/firmware-config-persistence-loss — the failure mode this pattern addresses
- patterns/declare-boot-interface-order-upfront — the declaration whose persistence this pattern guarantees
- patterns/hex-comparison-flag-for-ipxe-config-check — the read-side primitive that makes validation efficient on iPXE
- concepts/configuration-as-code
- systems/ipxe — substrate
- systems/uefi — substrate
- systems/cloudflare-gen12-server — canonical wiki instance