Cloudflare Edge Reboot Release (ERR)¶
Edge Reboot Release (ERR) is Cloudflare's scheduled pipeline for rolling updated Linux kernels through the global edge fleet via coordinated server reboots on a four-week cycle. The pipeline sits on top of Cloudflare's custom Linux kernel build, which is based on community Long-Term Support (LTS) releases, and is the canonical substrate for both routine kernel patching and CVE-driven kernel-patch rollout. It was first disclosed publicly as a named pipeline in the 2026-05-07 Copy Fail response post.
Release cadence¶
Three distinct tempos stacked on top of each other:
- Upstream LTS community tempo. The Linux kernel community regularly merges and releases security + stability updates against the LTS branches Cloudflare tracks (e.g. 6.12, 6.18 as of 2026-04-29). Cloudflare consumes these as the source of truth for kernel patches.
- Cloudflare internal weekly build. An automated job generates a new internal kernel build approximately every week from upstream LTS pulls. These builds are tested in staging datacenters before global release.
- Edge Reboot Release (four-week cycle). Following a successful staging validation, the ERR pipeline manages a systematic update and reboot of the edge infrastructure on a four-week cycle. Control-plane infrastructure typically adopts the most recent kernel, with reboots scheduled according to workload requirements (separate from the 4-week edge cadence).
Net: Cloudflare's self-described "practice of deploying Linux patch updates every two weeks" combines the weekly build tempo with the ERR cadence — individual servers see a kernel update roughly biweekly once the build flows through the pipeline.
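One possible reading of how the three tempos compose can be sketched as date arithmetic. This is a simplified model, not Cloudflare's scheduler: the fixed anchor dates, the `staging_days` value, and the idea that ERR cycles start on a regular 28-day grid are all assumptions made for illustration. Only the weekly build interval and the four-week cycle length come from the text above.

```python
from datetime import date, timedelta

BUILD_INTERVAL = timedelta(days=7)   # "approximately every week" per the text
ERR_CYCLE = timedelta(days=28)       # four-week edge cycle per the text


def next_tick(anchor: date, interval: timedelta, not_before: date) -> date:
    """First date on the grid anchor + k * interval (k >= 0) that is >= not_before."""
    if not_before <= anchor:
        return anchor
    elapsed_days = (not_before - anchor).days
    steps = -(-elapsed_days // interval.days)  # ceiling division
    return anchor + steps * interval


def edge_rollout_window(lts_release: date, build_anchor: date,
                        err_anchor: date, staging_days: int = 3) -> dict:
    """Hypothetical timeline for one upstream LTS release reaching the edge.

    staging_days and both anchors are illustrative assumptions.
    """
    # The weekly job picks up the release at the next build tick.
    build = next_tick(build_anchor, BUILD_INTERVAL, lts_release)
    # Staging validation must finish before global release.
    validated = build + timedelta(days=staging_days)
    # The validated build rides the next ERR cycle that starts after validation.
    cycle_start = next_tick(err_anchor, ERR_CYCLE, validated)
    return {
        "build": build,
        "validated": validated,
        "cycle_start": cycle_start,
        "cycle_end": cycle_start + ERR_CYCLE,  # last servers reboot by here
    }


if __name__ == "__main__":
    window = edge_rollout_window(
        lts_release=date(2026, 1, 7),   # hypothetical dates
        build_anchor=date(2026, 1, 5),
        err_anchor=date(2026, 1, 5),
    )
    for stage, day in window.items():
        print(f"{stage}: {day.isoformat()}")
```

Under these assumptions the date on which a given server picks up a release depends on where the release lands relative to both grids, which is why the per-server figure is a rough average rather than a fixed interval.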
Role in CVE response¶
The structural gap the 2026-05-07 Copy Fail post exposes: "despite our practice of deploying Linux patch updates every two weeks, we remained vulnerable because a month-old mainline fix had yet to be backported to our primary kernel line." The ERR pipeline is downstream of LTS backport. If the upstream fix hasn't reached the LTS series Cloudflare runs, ERR can't ship it — no matter how fast the internal build tempo. Canonical instance of concepts/lts-kernel-backport-latency-gap.
The Copy Fail decision on 2026-04-30 ~17:00 UTC was to "ship a patched build of the previous LTS line through reboot automation; do not accelerate the new LTS; lean on bpf-lsm in the meantime." This captures the canonical CVE playbook:
- If the fix is backported to the majority LTS line → build + validate + roll via ERR at normal pace.
- If not → lean on runtime mitigation (bpf-lsm) to cover the window, don't accelerate LTS migration under pressure.
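The two-branch playbook above can be written as a small decision function. This is a sketch of the rule as this page states it, not Cloudflare tooling: `FixStatus`, `Action`, `choose_action`, and the example CVE label are hypothetical names introduced here, and the series strings are only illustrative inputs.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ROLL_VIA_ERR = "build, validate, and roll via ERR at normal pace"
    RUNTIME_MITIGATION = "cover the window with runtime mitigation (bpf-lsm)"


@dataclass(frozen=True)
class FixStatus:
    """Hypothetical record of which LTS series have received a given fix."""
    cve_label: str
    backported_series: frozenset


def choose_action(fix: FixStatus, majority_series: str) -> Action:
    """Apply the playbook: ERR can only ship what the majority LTS line has."""
    if majority_series in fix.backported_series:
        return Action.ROLL_VIA_ERR
    # No backport yet: mitigate at runtime rather than accelerating
    # an LTS migration under pressure.
    return Action.RUNTIME_MITIGATION


if __name__ == "__main__":
    # Illustrative input: fix present in a newer series only.
    pending = FixStatus("example-cve", frozenset({"6.18"}))
    print(choose_action(pending, majority_series="6.12").value)
```

The function deliberately has no "accelerate the new LTS" branch, mirroring the decision not to migrate kernel lines under incident pressure.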
Architectural properties¶
- LTS-based custom build. Cloudflare runs a custom kernel built from community LTS releases — not stock LTS, not mainline. Multiple LTS series may be in flight simultaneously (6.12 majority + 6.18 subset at Copy Fail disclosure).
- Staged rollout via staging datacenters. Builds are tested in staging before global release. Canonical instance of patterns/staged-rollout at kernel-release altitude.
- Reboot-coordinated, not blue-green. The model doesn't spin up a parallel fleet running the new kernel; it reboots existing servers through the pipeline. Workload continuity during reboots depends on fleet-wide redundancy + anycast drain.
- Scheduled, not event-driven by default. The 4-week cadence is independent of CVE disclosures. CVE-driven acceleration is a manual decision (2026-04-30 case study).
- Control-plane distinct from edge. Control-plane runs the newest kernel available, with reboots scheduled per-workload — a faster loop than the 4-week edge cadence.
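To make the "reboot-coordinated, not blue-green" property concrete, the following is a minimal sketch of planning in-place reboots in waves so that only a bounded share of each site is down at once. The source does not disclose the pipeline's actual mechanics, so everything here is an assumption: the `plan_reboot_waves` name, the per-site grouping, and the `max_down_fraction` limit are hypothetical, and traffic draining is not modelled.

```python
from collections import defaultdict


def plan_reboot_waves(servers, max_down_fraction=0.25):
    """Group (server_id, site) pairs into reboot waves.

    In each wave, at most max_down_fraction of a site's servers (but at
    least one) are rebooted, so the rest of the site keeps serving.
    Hypothetical policy for illustration only.
    """
    by_site = defaultdict(list)
    for server_id, site in servers:
        by_site[site].append(server_id)

    waves = []
    for ids in by_site.values():
        batch = max(1, int(len(ids) * max_down_fraction))
        for wave_index, start in enumerate(range(0, len(ids), batch)):
            # Grow the wave list lazily; larger sites need more waves.
            while len(waves) <= wave_index:
                waves.append([])
            waves[wave_index].extend(ids[start:start + batch])
    return waves


if __name__ == "__main__":
    fleet = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("a4", "A"),
             ("b1", "B"), ("b2", "B")]
    for number, wave in enumerate(plan_reboot_waves(fleet, 0.5), start=1):
        print(f"wave {number}: {wave}")
```

Every server is rebooted exactly once and no parallel fleet is created; the cost is that capacity at each site dips during a wave, which is where the redundancy and anycast drain mentioned above would come in.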
Known limitations (self-disclosed 2026-05-07)¶
- LTS-backport-latency gap. By construction, ERR cannot ship a fix the LTS series hasn't received yet. The 2026-05-07 post names this as the reason Cloudflare remained vulnerable to Copy Fail despite its biweekly cadence.
- 4-week cycle is slow for urgent CVEs. Out-of-cycle reboots are possible (the post mentions servers manually rebooted after 2026-05-04 to pick up the patched kernel) but aren't the default. Runtime mitigation via bpf-lsm is the primary answer for within-cycle urgency.
- Reboot discipline required. Coordinated reboots at edge scale (330 cities per Cloudflare's public footprint) require careful traffic draining + anycast shuffling; the pipeline's mechanics are not deeply disclosed in this post.
Seen in¶
- 2026-05-07: Copy Fail Linux vulnerability response. First public disclosure of the ERR pipeline name + cadence + three-tempo structure. The 2026-04-30 ~17:00 UTC decision "ship a patched build of the previous LTS line through reboot automation" canonicalises ERR as the scheduled kernel-patch rollout substrate; the 2026-05-04 morning entry "Reboot automation resumed at normal pace with the patched kernel" canonicalises the return-to-normal-cadence signal. (Source: sources/2026-05-07-cloudflare-copy-fail-linux-vulnerability-response)