Cloudflare bpf-lsm
Cloudflare bpf-lsm is Cloudflare's production framework for live-patching kernel security vulnerabilities without rebooting, built on the Linux kernel's BPF LSM (Linux Security Module) interface. Rather than wait for a patched kernel to roll through the Edge Reboot Release pipeline (4-week cycle) or take the coarse action of unloading a vulnerable kernel module fleet-wide, Cloudflare ships a targeted eBPF program that attaches to an LSM hook and denies the specific operation an attacker needs. The framework was introduced publicly in Cloudflare's "Live-patch security vulnerabilities with eBPF LSM" post and canonicalised on this wiki via the 2026-05-07 Copy Fail mitigation.
Role for this wiki
- Runtime kernel mitigation substrate. Cloudflare's canonical answer when a CVE is disclosed and the patched kernel has not yet reached the LTS line that most of the fleet runs. It closes the exposure window that biweekly patching and the LTS backport latency gap produce together.
- No-reboot mechanism. Deployment restarts no services, leaves the kernel module load state untouched, and requires no fleet drain. The program attaches to an LSM hook, the user-space loader updates BPF maps, and traffic continues.
- Pairs with systems/prometheus-ebpf-exporter. The same eBPF toolchain provides the measurement side: ebpf_exporter hooks the relevant syscall to quantify legitimate usage (which binaries, at what rates) before enforcement lands. This is the patterns/visibility-before-enforcement-rollout pattern.
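Behind the no-reboot mechanism is a split between code and data: the verified program stays attached while user space rewrites the data it consults. The Python model below is a minimal sketch of that split, assuming the allow-list lives in a BPF hash map keyed by a fixed-width, NUL-padded binary path (BPF map keys have a size fixed at map creation). KEY_SIZE, path_key, AllowListMap, and the example path are hypothetical; the source does not describe Cloudflare's actual map layout.

```python
KEY_SIZE = 256  # hypothetical key width; BPF map keys are fixed-size byte arrays


def path_key(path: str) -> bytes:
    """Encode a binary path as a NUL-padded, fixed-width map key."""
    raw = path.encode()
    if len(raw) >= KEY_SIZE:  # keep room for at least one terminating NUL
        raise ValueError(f"path longer than {KEY_SIZE - 1} bytes: {path!r}")
    return raw.ljust(KEY_SIZE, b"\0")


class AllowListMap:
    """Stand-in for a BPF hash map: the attached program only looks entries up,
    while the user-space loader inserts and deletes them at runtime."""

    def __init__(self) -> None:
        self._entries: dict[bytes, int] = {}

    def update(self, path: str) -> None:  # loader side
        self._entries[path_key(path)] = 1

    def delete(self, path: str) -> None:  # loader side
        self._entries.pop(path_key(path), None)

    def lookup(self, path: str) -> bool:  # program side
        try:
            return path_key(path) in self._entries
        except ValueError:
            return False  # an over-long path can never match, so it is not allowed
```

In this model a policy change is a map write rather than a program reload, which is why no service restart or fleet drain is needed.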
Canonical usage shape (Copy Fail, 2026-04-30)
The Copy Fail CVE (an out-of-bounds write in the authencesn wrapper of the kernel crypto API, reachable via AF_ALG sockets + algif_aead) is gated by a bpf-lsm program on the socket_bind LSM hook:
- If socket_family != AF_ALG → allow (cheap; almost all traffic takes this path).
- If socket_family == AF_ALG → check the calling binary's path against an allow-list of legitimate AF_ALG users.
- If the binary is on the allow-list → allow the bind.
- Otherwise → deny with EPERM.
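The decision sequence can be modelled in user space as a pure function. This is a minimal Python sketch, not the kernel program: it follows the LSM convention that a hook returns 0 to allow and a negative errno to deny, and it assumes the socket family and the caller's executable path have already been extracted (inside the kernel, obtaining that path is the hard part, and this sketch does not attempt it). The allow-list entry is hypothetical, since the real Copy Fail allow-list is not public. AF_ALG is 38 in the Linux headers.

```python
import errno

AF_ALG = 38  # value from Linux <linux/socket.h>; socket.AF_ALG exists only on Linux builds

# Hypothetical allow-list entry; the real Copy Fail allow-list is not published.
ALLOWED_AF_ALG_BINARIES = frozenset({"/usr/sbin/example-crypto-service"})


def socket_bind_decision(
    family: int,
    exe_path: str,
    allow_list: frozenset[str] = ALLOWED_AF_ALG_BINARIES,
) -> int:
    """Return 0 to allow the bind or -EPERM to deny it, like an LSM hook."""
    if family != AF_ALG:
        return 0  # fast path: almost all binds are not AF_ALG
    if exe_path in allow_list:
        return 0  # known, legitimate AF_ALG user
    return -errno.EPERM  # user space sees errno EPERM ("Operation not permitted")
```

Putting the cheap family comparison first keeps the common case to a single check, which matters for a hook that runs on every bind().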
Exploit attempts see PermissionError: [Errno 1] Operation not permitted, while legitimate services continue. No kernel module is unloaded, and the kernel crypto API remains available to allow-listed callers. This is the canonical instance of patterns/bpf-lsm-allowlist-hook-denial.
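A denied socket_bind surfaces in user space as errno EPERM, which Python raises as PermissionError because the OSError constructor maps errno values to subclasses. That makes a simple post-deployment probe possible. The sketch below is an assumption about how such a check could look, not Cloudflare's verification procedure: it binds an AF_ALG socket to the benign aead algorithm gcm(aes), which does not touch the vulnerable authencesn path, and reports whether the bind was allowed, denied, or could not be attempted.

```python
import socket


def probe_af_alg_bind(alg_type: str = "aead", alg_name: str = "gcm(aes)") -> str:
    """Try to bind an AF_ALG socket; return 'allowed', 'denied', or 'unavailable'."""
    af_alg = getattr(socket, "AF_ALG", None)
    if af_alg is None:
        return "unavailable"  # not Linux, or Python built without AF_ALG support
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as sock:
            sock.bind((alg_type, alg_name))
    except PermissionError:
        return "denied"  # EPERM, e.g. an LSM hook rejected the operation
    except OSError:
        return "unavailable"  # e.g. ENOENT if the algorithm or algif module is missing
    return "allowed"
```

Run from a binary that is not on the allow-list, "denied" after deployment (and "allowed" before it) indicates that the hook is active.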
The researchers' recommended fix, unconditionally removing algif_aead via a modprobe blacklist, was Cloudflare's first attempt on the evening of 2026-04-29. It failed in staging because other software legitimately uses the kernel crypto API. bpf-lsm's hook-level selectivity is what made a no-reboot mitigation viable (patterns/staging-caught-mitigation-failure).
Architectural properties
- Hook-level selectivity. bpf-lsm targets one specific LSM hook (here socket_bind); it neither removes the kernel module nor blocks all access to the subsystem. It is surgical where a modprobe blacklist is coarse.
- Allow-list posture. The gated operation (here, AF_ALG binds) is denied by default, with an allow-list of legitimate callers. The allow-list is validated empirically against systems/prometheus-ebpf-exporter's per-binary usage metrics before enforcement.
- Two-gate deployment. The visibility gate (ebpf_exporter) and the enforcement gate (bpf-lsm) are pushed separately, and each rolls back independently.
- Verifier-gated safety. Every program passes the eBPF verifier before loading; termination, memory safety, and bounded complexity are checked statically.
- Open-source tooling. The companion ebpf_exporter is Cloudflare-owned OSS at github.com/cloudflare/ebpf_exporter; the bpf-lsm program for any given CVE is a Cloudflare-specific artefact (Copy Fail's program source has not been published).
- Distinct from a whole-fleet kernel reboot. bpf-lsm runs in parallel with the scheduled patched-kernel rollout via Edge Reboot Release (ERR): bpf-lsm covers the window, and ERR closes it.
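The two-gate deployment and the empirical allow-list validation combine into a simple readiness rule: enforcement should go live only once every binary the visibility gate has seen using the gated operation is on the allow-list. The Python model below is a hypothetical sketch of that rule. The per-binary counts stand in for what ebpf_exporter would report, and uncovered_callers, TwoGateRollout, and its methods are illustrative names, not part of any Cloudflare tool.

```python
from collections import Counter
from dataclasses import dataclass


def uncovered_callers(observed: Counter[str], allow_list: frozenset[str]) -> dict[str, int]:
    """Binaries seen using the gated operation that the allow-list would deny."""
    return {path: n for path, n in observed.items() if path not in allow_list}


@dataclass
class TwoGateRollout:
    visibility_enabled: bool = False
    enforcement_enabled: bool = False

    def enable_enforcement(self, observed: Counter[str], allow_list: frozenset[str]) -> None:
        # Enforcement requires visibility data showing no legitimate caller would be denied.
        if not self.visibility_enabled:
            raise RuntimeError("enable the visibility gate first")
        missing = uncovered_callers(observed, allow_list)
        if missing:
            raise RuntimeError(f"allow-list would deny observed callers: {missing}")
        self.enforcement_enabled = True

    def roll_back_enforcement(self) -> None:
        # Independent rollback: the visibility gate keeps reporting while enforcement is off.
        self.enforcement_enabled = False
```

In this sketch, rolling back enforcement leaves the visibility gate running, so usage data keeps accumulating while the allow-list is corrected.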
Known limitations (self-disclosed 2026-05-07)
From the Copy Fail post's "Key areas we identified for improvement":
- "Better runtime mitigation. bpf-lsm is a valuable tool for mitigations, but we want to make this tool even better. This will include looking into faster deployments, better playbooks, and better logging and visibility of the tool."
- Visibility gap into which services depend on which kernel subsystems — the reason the initial Copy Fail mitigation plan (whole-module removal) failed in staging.
- Each bpf-lsm mitigation requires per-CVE engineering effort, and not every kernel CVE can be gated at an LSM hook.
Seen in
- 2026-05-07: Copy Fail Linux vulnerability response. Canonical first-class wiki page for bpf-lsm. socket_bind-hook allow-list denial for AF_ALG, gated by the calling binary's path. End-to-end verification on a previously vulnerable test node confirmed the exploit no longer works once the program is loaded. Follow-up work named explicitly: faster deployments, better playbooks, and better logging/visibility of the tool. (Source: sources/2026-05-07-cloudflare-copy-fail-linux-vulnerability-response)
Related
- systems/ebpf
- systems/cloudflare-edge-reboot-release
- systems/prometheus-ebpf-exporter
- concepts/copy-fail-cve-2026-31431
- concepts/af-alg-kernel-crypto-socket-family
- concepts/kernel-attack-surface
- patterns/bpf-lsm-allowlist-hook-denial
- patterns/visibility-before-enforcement-rollout
- patterns/autonomous-distributed-mitigation
- companies/cloudflare