SYSTEM Cited by 1 source

Cloudflare bpf-lsm

Cloudflare bpf-lsm is Cloudflare's production framework for live-patching kernel security vulnerabilities without rebooting, built on the Linux kernel's BPF LSM (Linux Security Module) interface. Rather than wait for a patched kernel to roll through the Edge Reboot Release pipeline (4-week cycle) or take the coarse action of unloading a vulnerable kernel module fleet-wide, Cloudflare ships a targeted eBPF program that denies the specific LSM hook an attacker needs. The framework was introduced publicly in Cloudflare's Live-patch security vulnerabilities with eBPF LSM post and canonicalised on this wiki via the 2026-05-07 Copy Fail mitigation.

Role for this wiki

  • Runtime kernel mitigation substrate. Cloudflare's canonical response when a CVE is disclosed and the patched kernel is not yet running on the majority LTS line. It closes the exposure window that biweekly patching and LTS backport latency together produce.
  • No-reboot mechanism. Deployment does not restart any service, change kernel module load state, or require a fleet drain. The program attaches to an LSM hook, the user-space loader updates BPF maps, and traffic continues.
  • Pairs with systems/prometheus-ebpf-exporter. The same eBPF toolchain provides the measurement side: ebpf_exporter hooks the relevant syscall to quantify legitimate usage (which binaries, at what rates) before enforcement lands. This is the patterns/visibility-before-enforcement-rollout pattern.
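
The visibility-first step can be illustrated with a small user-space sketch in Python. It is not ebpf_exporter code: the event shape (one calling-binary path per observed call), the function names, and the sample paths are all hypothetical. It tallies observed calls per binary and reports which callers a proposed allow-list would deny, which is the question the measurement side has to answer before enforcement lands.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def usage_by_binary(observed_paths: Iterable[str]) -> Counter:
    """Count observed calls per calling-binary path."""
    return Counter(observed_paths)


def would_be_denied(usage: Counter, allow_list: Set[str]) -> Dict[str, int]:
    """Return binaries that were observed but are missing from the allow-list."""
    return {path: n for path, n in usage.items() if path not in allow_list}


# Hypothetical sample data: one entry per observed call.
observed = [
    "/usr/bin/service-a",
    "/usr/bin/service-a",
    "/usr/bin/service-b",
]
usage = usage_by_binary(observed)
gaps = would_be_denied(usage, allow_list={"/usr/bin/service-a"})
print(gaps)  # {'/usr/bin/service-b': 1}
```

A non-empty result means the proposed allow-list would break a caller that is active in production, so enforcement should wait until the list is corrected.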

Canonical usage shape (Copy Fail, 2026-04-30)

The Copy Fail CVE (out-of-bounds write in the authencesn wrapper of the kernel crypto API, reachable via AF_ALG sockets + algif_aead) is gated by a bpf-lsm program on the socket_bind LSM hook:

  1. If socket_family != AF_ALG → allow (cheap, almost all traffic).
  2. If socket_family == AF_ALG → check calling binary path against an allow-list of legitimate AF_ALG users.
  3. If binary on allow-list → allow the bind.
  4. Otherwise → deny with EPERM.
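
The four steps above can be modelled in a few lines of Python. This is a user-space sketch of the decision order only, not the BPF program (whose source is not published): the function name, the allow-list entry, and the test paths are hypothetical. It follows the LSM convention of returning 0 to allow and a negative errno to deny.

```python
import errno
import socket
from typing import FrozenSet

# AF_ALG is 38 on Linux; fall back to that value if Python lacks the constant.
AF_ALG = getattr(socket, "AF_ALG", 38)

# Hypothetical allow-list entry, for illustration only.
ALLOW_LIST: FrozenSet[str] = frozenset({"/usr/bin/example-crypto-service"})


def socket_bind_decision(family: int, binary_path: str,
                         allow_list: FrozenSet[str] = ALLOW_LIST) -> int:
    """Return 0 to allow the bind, or -EPERM to deny it."""
    if family != AF_ALG:
        return 0  # step 1: cheap fast path for almost all traffic
    if binary_path in allow_list:
        return 0  # steps 2-3: AF_ALG caller is on the allow-list
    return -errno.EPERM  # step 4: everything else is denied


print(socket_bind_decision(socket.AF_INET, "/usr/bin/anything"))         # 0
print(socket_bind_decision(AF_ALG, "/usr/bin/example-crypto-service"))  # 0
print(socket_bind_decision(AF_ALG, "/tmp/unknown-binary"))              # -1
```

Putting the family check first keeps the common case to a single comparison; the allow-list lookup only runs for the rare AF_ALG bind.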

Exploit attempts see PermissionError: [Errno 1] Operation not permitted; legitimate services continue. No kernel module is unloaded; the kernel crypto API remains available to the allow-listed service. Canonical instance of patterns/bpf-lsm-allowlist-hook-denial.
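
From user space, the denial is observable with an ordinary AF_ALG bind. The probe below is a minimal sketch, assuming Linux and Python's standard socket module; the function name, the result labels, and the specific algorithm string are illustrative choices and do not come from the Copy Fail post. The socket_factory parameter exists only so the probe can be exercised without a real kernel socket.

```python
import errno
import socket

# AF_ALG is 38 on Linux; fall back to that value if Python lacks the constant.
AF_ALG = getattr(socket, "AF_ALG", 38)


def probe_af_alg_bind(alg_type: str = "aead",
                      alg_name: str = "authencesn(hmac(sha256),cbc(aes))",
                      socket_factory=socket.socket) -> str:
    """Attempt one AF_ALG bind and classify the outcome."""
    try:
        sock = socket_factory(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return "unsupported"  # AF_ALG sockets cannot be created here
    try:
        sock.bind((alg_type, alg_name))
    except PermissionError:
        return "denied"  # EPERM or EACCES, as a non-allow-listed caller would see
    except OSError:
        return "unavailable"  # some other failure, e.g. algorithm not present
    finally:
        sock.close()
    return "allowed"


if __name__ == "__main__":
    print(probe_af_alg_bind())
```

On a node with the mitigation loaded, a binary that is not on the allow-list would be expected to report "denied", while an allow-listed service would still bind successfully.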

The researchers' recommended fix, unconditional removal of algif_aead via a modprobe blacklist, was Cloudflare's first attempt on the evening of 2026-04-29. It failed in staging because other software legitimately uses the kernel crypto API. bpf-lsm's hook-level selectivity is what made the no-reboot mitigation viable (patterns/staging-caught-mitigation-failure).

Architectural properties

  • Hook-level selectivity. bpf-lsm targets a specific LSM hook (here socket_bind). It neither removes the kernel module nor blocks all access to the subsystem, so it is surgical where a modprobe blacklist is coarse.
  • Allow-list posture. Default-deny for the targeted use of the hook (here AF_ALG binds; other socket families pass through), with an allow-list of legitimate callers. The allow-list is validated empirically via systems/prometheus-ebpf-exporter's per-binary usage metrics before enforcement.
  • Two-gate deployment. The visibility gate (ebpf_exporter) and the enforcement gate (bpf-lsm) are pushed separately, and each can be rolled back independently.
  • Verifier-gated safety. Programs go through the eBPF verifier before loading; termination, memory-safety, and bounded complexity are proven statically.
  • Open-source tooling. The underlying ebpf_exporter is Cloudflare-owned OSS at github.com/cloudflare/ebpf_exporter; the bpf-lsm program source for any given CVE is a Cloudflare-specific artefact (the source of the Copy Fail program has not been published).
  • Distinct from whole-fleet kernel reboot. Runs in parallel with the scheduled patched-kernel rollout via the Edge Reboot Release (ERR) pipeline. bpf-lsm covers the exposure window; ERR closes it.

Known limitations (self-disclosed 2026-05-07)

From the Copy Fail post's "Key areas we identified for improvement":

  • "Better runtime mitigation. bpf-lsm is a valuable tool for mitigations, but we want to make this tool even better. This will include looking into faster deployments, better playbooks, and better logging and visibility of the tool."
  • Visibility gap into which services depend on which kernel subsystems — the reason the initial Copy Fail mitigation plan (whole-module removal) failed in staging.
  • bpf-lsm mitigations are per-CVE engineering effort; not every kernel CVE is gatable at an LSM hook.

Seen in

  • 2026-05-07 — Copy Fail Linux vulnerability response. Canonical wiki first-class page for bpf-lsm. socket_bind-hook allow-list denial for AF_ALG gated by calling-binary path. End-to-end verification on a previously-vulnerable test node confirmed the exploit no longer works after the program is loaded. Follow-up work named explicitly: faster deployments, better playbooks, better logging/visibility of the tool. (Source: sources/2026-05-07-cloudflare-copy-fail-linux-vulnerability-response)