
Kernel attack surface

Kernel attack surface is the set of kernel codepaths a compromised user-space process can reach, and therefore the set of kernel bugs it could weaponise. Shrinking this surface is the premise behind every sandbox that filters or interposes on syscalls (seccomp, gVisor, hypervisor-backed containers) and behind minimal-surface hypervisors such as Firecracker.

(Source: sources/2026-04-21-figma-server-side-sandboxing-containers-and-seccomp)

Why kernel surface is usually bigger than hypervisor surface

Figma's framing:

"The attack surface of a hypervisor is usually smaller than for an OS kernel."

The Linux kernel includes filesystems, network stacks, driver subsystems, the VFS, IPC mechanisms, scheduling, memory management and security modules: millions of lines of C, most of which a given workload never touches. Each of those subsystems is a potential source of weaponisable bugs.

A minimal hypervisor (Firecracker, Cloud Hypervisor, or KVM paired with a minimal device model) deliberately drops features to shrink its surface: fewer device emulators, no legacy I/O, a tight paravirtualised interface. A general-purpose kernel can't make the same move without losing its role, because those subsystems are exactly what its workloads rely on.

How sandboxes shrink kernel surface

Three mechanisms, in order of increasing strength:

  1. seccomp syscall allowlist — the workload can still call the kernel, but only through a named subset. Kernel code reachable only via refused syscalls is out of reach. Cheapest; narrowest effect.
  2. namespaces + cgroups — restrict what the process sees of kernel-managed resources (filesystems, PIDs, network, IPC). Orthogonal to syscall filtering; composable with it.
  3. Kernel interposition (gVisor) — put a user-space reimplemented kernel between the workload and the host kernel; only a narrow vetted interface reaches the host. Highest strength; highest compatibility / performance cost.
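To make mechanism 1 concrete, here is a minimal sketch of a seccomp-style allowlist written as a classic BPF program, plus a tiny user-space interpreter that evaluates it against a syscall number. Assumptions, none of them from the source: x86_64 syscall numbers, a hypothetical three-syscall allowlist (read, write, exit_group), and EPERM as the deny action. The opcode and action constants follow linux/filter.h and linux/seccomp.h, but the code never installs anything; a real sandbox would hand the packed program to prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) or seccomp(2), usually via libseccomp rather than hand-written BPF.

```python
"""Sketch: a seccomp-style allowlist as a classic BPF program, simulated in user space.

Assumptions: x86_64 syscall numbers and a hypothetical three-syscall allowlist.
Nothing is installed in the kernel; this only builds and evaluates the filter.
"""
import errno
import struct

# Classic BPF opcodes (linux/filter.h) used by this filter.
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word of seccomp_data
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: compare accumulator with k
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return the action k

# seccomp return actions (linux/seccomp.h) and the x86_64 audit arch value.
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E

# struct seccomp_data begins with: int nr; __u32 arch; ...
OFF_NR, OFF_ARCH = 0, 4

# Hypothetical allowlist (x86_64 syscall numbers).
ALLOWLIST = {"read": 0, "write": 1, "exit_group": 231}


def build_allowlist(allowed_nrs):
    """Return (code, jt, jf, k) tuples; jump offsets are relative to the next insn."""
    prog = [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),         # right arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),  # wrong arch: numbers mean other calls
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
    ]
    for nr in sorted(allowed_nrs):
        prog.append((BPF_JEQ_K, 0, 1, nr))            # match: fall into the ALLOW below
        prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    # Anything not matched above is refused with EPERM.
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM))
    return prog


def pack(prog):
    """Serialise to struct sock_filter {u16 code; u8 jt; u8 jf; u32 k;}, 8 bytes each."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def run_filter(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Tiny interpreter covering only the three opcodes used above."""
    data = struct.pack("=iI", nr, arch)  # first 8 bytes of seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == BPF_LD_W_ABS:
            (acc,) = struct.unpack_from("=I", data, k)
            pc += 1
        elif code == BPF_JEQ_K:
            pc += 1 + (jt if acc == k else jf)
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")


if __name__ == "__main__":
    prog = build_allowlist(ALLOWLIST.values())
    print(len(pack(prog)), "bytes of BPF")
    for name, nr in [("write", 1), ("socket", 41)]:
        print(name, hex(run_filter(prog, nr)))
```

Here write (1) evaluates to SECCOMP_RET_ALLOW, while socket (41) falls through to the EPERM action, so the networking code behind socket(2) stays out of reach. The architecture check comes first because the same syscall number names different calls on different ABIs; a filter that skipped it could be sidestepped by switching ABI.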

The seccomp-specific framing

For seccomp-only sandboxes, Figma frames the attack surface as having exactly two elements:

"The attack surface consists of two elements: the kernel's seccomp implementation and system call (syscall) interface, and the allowed list of syscalls."

  • The seccomp implementation itself — bugs in the kernel code that evaluates BPF filters. Rare, but historical examples exist.
  • The allowlist — every syscall on it is a kernel codepath still reachable. "Every incremental increase in allowed system calls results in extra kernel attack surface to consider."

This is why Figma's allowlist is so narrow — each addition is an explicit surface expansion the team consciously accepts.
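The allowlist-as-surface framing can be modelled as set arithmetic: each allowed syscall opens a set of kernel subsystems, the reachable surface is their union, and the cost of allowing one more syscall is the set difference. This is a toy sketch under loud assumptions: the SUBSYSTEMS mapping and every name in it are hypothetical and far coarser than real kernel reachability.

```python
"""Toy model: allowlist -> reachable kernel subsystems, and what one more syscall adds.

The syscall-to-subsystem mapping is hypothetical and deliberately coarse.
"""

# Hypothetical coarse mapping; real reachability depends on arguments,
# file types, socket families, ioctl command numbers and much more.
SUBSYSTEMS = {
    "read": {"vfs"},
    "write": {"vfs"},
    "exit_group": {"sched"},
    "mmap": {"mm", "vfs"},
    "socket": {"net"},
    "ioctl": {"vfs", "drivers"},
}


def reachable(allowlist):
    """Union of subsystems reachable through any allowed syscall."""
    return set().union(*(SUBSYSTEMS[name] for name in allowlist))


def surface_delta(allowlist, candidate):
    """Subsystems that become reachable only if `candidate` is also allowed."""
    return reachable(set(allowlist) | {candidate}) - reachable(allowlist)


if __name__ == "__main__":
    base = {"read", "write", "exit_group"}
    for candidate in ("mmap", "socket", "ioctl"):
        print(candidate, sorted(surface_delta(base, candidate)))
```

In this model, allowing ioctl adds only "drivers" because "vfs" is already reachable. That understates the real cost: at subsystem granularity the model hides the per-driver command handlers ioctl exposes, which is why each addition deserves individual review rather than a coarse tally.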

Quantifying is hard

Figma's honest caveat:

"Overall, it's not straightforward to compare the level of security isolation provided by VMs versus containerization. For example, you could argue that the attack surface of a hypervisor is usually smaller than for an OS kernel, or discuss the number of kernel exploits in recent years that would have allowed a container escape. On the other hand, there are bugs that allow a VM breakout without attacking the hypervisor itself."

So "kernel surface is bigger" is a useful framing, not a proof. CVE counts, exploit complexity and weaponisation latency all vary between kernel and hypervisor bugs, so absolute statements don't hold.
