PATTERN Cited by 1 source

State validation with auto-reapply and reboot¶

Intent¶

When applied configuration may be wiped by an out-of-band event (e.g. a firmware upgrade), keep the live state converging on the declared state by (1) validating after every change, (2) re-applying if drift is detected, and (3) triggering a reboot so the re-applied state takes effect.

A reconciliation loop adapted to substrates where the effective state lives behind a reboot boundary.

Context¶

Firmware configuration is not guaranteed to persist across firmware upgrades — "Configuration settings are often reset following a UEFI firmware upgrade" (Cloudflare 2026-06-01 core boot-time post). Any fleet automation that has declaratively set something at the firmware layer (boot order, secure boot config, hardware tunables) cannot trust that the setting will survive subsequent firmware-update operations. The upgrade itself is often what the automation is trying to do, so a one-shot apply isn't enough.

Mechanism¶

"To address these edge cases, we implemented a state validation step. The firmware automation now validates the configuration post-change: if it detects that settings have been modified, it re-applies the config and triggers a reboot."

Three structural pieces:

Post-change validation step. After any firmware operation (upgrade, write, reset), the automation reads the current state of the relevant variables and compares against the declared state. The comparison is value-level (or hex-level — see patterns/hex-comparison-flag-for-ipxe-config-check for the iPXE-specific variant).
Conditional re-apply. If validation reveals drift, the automation re-applies the declared configuration. This is idempotent: re-applying when no drift exists is a no-op.
Reboot trigger. Firmware-level configuration usually only takes effect on the next boot. The automation explicitly triggers a reboot rather than waiting for a future natural one — otherwise the visible state diverges from the applied state until something else cycles the server.
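
The three pieces can be sketched as a small reconciliation routine. This is a minimal Python sketch, not Cloudflare's implementation: `FirmwareStore`, `find_drift`, `reconcile`, and the setting names are hypothetical, and an in-memory dict stands in for real NVRAM access.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FirmwareStore:
    """Hypothetical stand-in for firmware NVRAM; a real one would call vendor tooling."""
    settings: dict[str, str] = field(default_factory=dict)
    reboot_requested: bool = False

    def read(self, name: str) -> str | None:
        return self.settings.get(name)

    def write(self, name: str, value: str) -> None:
        self.settings[name] = value

    def request_reboot(self) -> None:
        self.reboot_requested = True


def find_drift(store: FirmwareStore, declared: dict[str, str]) -> dict[str, str]:
    """Post-change validation: return declared settings whose live value differs."""
    return {k: v for k, v in declared.items() if store.read(k) != v}


def reconcile(store: FirmwareStore, declared: dict[str, str]) -> bool:
    """Validate, re-apply drifted settings, and request a reboot if anything changed.

    Returns True when a re-apply (and therefore a reboot request) happened.
    """
    drift = find_drift(store, declared)
    if not drift:
        return False  # idempotent: no drift means no writes and no reboot
    for name, value in drift.items():
        store.write(name, value)
    store.request_reboot()  # firmware settings take effect on the next boot
    return True


# Hypothetical declared state, and a store whose boot order was reset by an upgrade.
declared = {"BootOrder": "nic0,disk0", "SecureBoot": "enabled"}
store = FirmwareStore({"BootOrder": "disk0,nic0", "SecureBoot": "enabled"})

first = reconcile(store, declared)   # drift found: re-apply and request a reboot
reboot_after_first = store.reboot_requested
store.reboot_requested = False       # simulate the reboot having happened
second = reconcile(store, declared)  # already converged: no-op
print(first, second)
```

Only the drifted settings are rewritten, and the reboot is requested only when something was actually re-applied, so running the routine on a converged host costs a read and nothing else.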

Cloudflare's iPXE-script form of the validation step (verbatim):

# construct path to read the update variable
set buffer-var-guid 91468514-75bc-4bb5-8f33-91efff9e9b1f
set var-upd-path efivar/CfHIIVarUpd-${buffer-var-guid}

# Run the config change command
imgexec <signed CF UEFI configuration App> set ${uefi-setting}=${uefi-value}

# Compare the update variable with the expected value if it has changed.
# If it has changed, set the local variable to reboot the system
iseq ${uefi-same-hex} ${${var-upd-path}} || set has-changed ${uefi-diff-hex}
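
The last line relies on iPXE's `||` operator, which runs the right-hand command only when the left-hand command fails, and on `iseq`, which succeeds when its two arguments are equal. The Python sketch below mirrors that control flow for readers less familiar with iPXE scripting. The function name and the placeholder hex strings are hypothetical; the post does not show the actual values of `uefi-same-hex` or `uefi-diff-hex`.

```python
from typing import Optional


def check_update_flag(update_var_hex: str, same_hex: str, diff_hex: str) -> Optional[str]:
    """Mirror `iseq ${same} ${var} || set has-changed ${diff}`.

    Returns None when the update variable equals same_hex (has-changed stays
    unset), otherwise returns diff_hex, the value has-changed would be set to.
    """
    if update_var_hex == same_hex:  # iseq succeeds, so the `||` branch is skipped
        return None
    return diff_hex                 # iseq fails, so `set has-changed` runs


# Placeholder values, chosen only for illustration.
SAME, DIFF = "00", "01"
unchanged = check_update_flag("00", SAME, DIFF)
changed = check_update_flag("01", SAME, DIFF)
print(unchanged, changed)
```

A later step in the script can then test whether `has-changed` is set to decide if a reboot is needed.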

Operational trade-off (Cloudflare 2026-06-01)¶

"Although the first boot may take slightly longer, this change drastically reduces the time required for all future start-ups from about 20 minutes to less than a minute per subsequent boot."

The validation+reapply+reboot loop adds wall-clock time to the first post-upgrade boot. Cloudflare deems this acceptable because the gain amortises across the fleet: once the declared boot order takes effect, every subsequent boot drops from about 20 minutes to under a minute, which repays the one-time validation cost.
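
The amortisation argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the 20-minute and 1-minute figures come from the quote above (with "less than a minute" rounded up to 1), while the 5-minute first-boot overhead is an assumed number, since the post only says the first boot "may take slightly longer". The function name is hypothetical.

```python
def minutes_saved(boots: int, slow_boot: float = 20.0, fast_boot: float = 1.0,
                  first_boot_overhead: float = 5.0) -> float:
    """Total minutes saved over `boots` boots after one validated first boot.

    Conservative assumption: the first boot costs slow_boot + first_boot_overhead;
    every later boot costs fast_boot instead of slow_boot.
    """
    if boots < 1:
        return 0.0
    before = boots * slow_boot
    after = (slow_boot + first_boot_overhead) + (boots - 1) * fast_boot
    return before - after


for n in (1, 2, 10):
    print(n, minutes_saved(n))
```

Under these assumed numbers the first boot is a net loss, and the overhead is recovered by the second boot.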

Why "reboot" is part of the pattern (and not just "apply")¶

Most application-layer reconciliation loops apply config and the change is live immediately. At the firmware layer:

The variable change lands in NVRAM, but the effective boot path is determined at the next boot.
A long-running server can appear to have its declared boot order set correctly yet still boot incorrectly the next time it cycles for an unrelated reason, unless the apply step is paired with an explicit reboot that exercises the new configuration.

The reboot trigger turns the loop into apply → validate → re-apply → reboot → next-boot-uses-correct-config, closing the loop on the firmware substrate.

Where this composes¶

Pair with patterns/declare-boot-interface-order-upfront — the declaration is what gets re-applied; the validation loop is what makes the declaration durable across firmware upgrades.
Pair with patterns/hex-comparison-flag-for-ipxe-config-check — the validation comparison may need a hex-encoded variant if the substrate (iPXE) reads variables as hex rather than strings.
Generalisation: configuration-as-code applied at firmware altitude. The desired state lives in source / a release pipeline; the live state is driven there by validation + reapply.

When to use¶

Configuration substrate where settings can be wiped by an out-of-band event (firmware upgrade, factory reset, NVRAM reallocation).
Effective state lives behind a reboot or restart boundary (firmware, kernel, system services with cold-load config).
Operating at fleet scale where manual re-apply per machine is impractical.

When not to use¶

Configuration substrate that is read continuously and enforced in real time (containerised config, sidecar-based policies) — the change can be detected and re-applied without a reboot.
One-off administrative work where a human will be present to notice drift.

Risks¶

Unbounded reboot loop if the re-apply step keeps failing (e.g. an OEM-immutable setting blocks the write): validation keeps detecting drift, but the re-apply never fixes it. Add a bounded retry count and an alarm.
Stale expected value: if the declared state is out of date for a platform variant, every server of that variant will be forced to reboot unnecessarily. Tie the declared state to source-controlled configuration.
Reboot storm during a fleet-wide firmware upgrade if every server triggers its re-apply and reboot at once. Roll out at a controlled cadence.
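
Two of these mitigations, a bounded retry count and a controlled rollout cadence, can be sketched as follows. This is a minimal sketch under stated assumptions: `MAX_REAPPLY_ATTEMPTS`, `next_action`, and `rollout_batches` are hypothetical names, the attempt counter is assumed to be persisted somewhere that survives reboots, and the limit and batch size are arbitrary.

```python
from typing import Iterator, List, Sequence

MAX_REAPPLY_ATTEMPTS = 3  # assumed limit; tune per fleet


def next_action(drift_detected: bool, attempts_so_far: int) -> str:
    """Decide what to do after validation, given a reboot-surviving attempt counter.

    Returns "done" when there is no drift, "reapply_and_reboot" while attempts
    remain, and "alarm" once the limit is reached so the host stops rebooting.
    """
    if not drift_detected:
        return "done"
    if attempts_so_far >= MAX_REAPPLY_ATTEMPTS:
        return "alarm"
    return "reapply_and_reboot"


def rollout_batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield hosts in fixed-size batches so only one batch reboots at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


# A host whose write is blocked keeps drifting; the loop stops at the limit.
actions = [next_action(True, n) for n in range(MAX_REAPPLY_ATTEMPTS + 1)]
batches = list(rollout_batches(["h1", "h2", "h3", "h4", "h5"], batch_size=2))
print(actions)
print(batches)
```

Keeping the decision in a pure function makes the stop condition easy to test without touching hardware; the caller is responsible for incrementing and persisting the counter before each reboot.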

Related¶

concepts/firmware-config-persistence-loss: the failure mode this pattern addresses
patterns/declare-boot-interface-order-upfront: the declaration whose persistence this pattern guarantees
patterns/hex-comparison-flag-for-ipxe-config-check: the read-side primitive that makes validation efficient on iPXE
concepts/configuration-as-code
systems/ipxe: substrate
systems/uefi: substrate
systems/cloudflare-gen12-server: canonical wiki instance