CONCEPT Cited by 1 source

seccomp

seccomp (short for secure computing mode) is a Linux kernel feature that restricts the set of system calls a process is allowed to make. Once a seccomp filter is installed, the kernel checks every syscall the process issues against it. Depending on the action the filter returns, a denied syscall fails with a configurable errno, kills the thread or process, or is handed to a SIGSYS handler, a tracer, or a user-space supervisor. Filters are inherited across fork() and execve() and cannot be removed once installed. It is the Linux kernel's narrowest isolation primitive: instead of walling off what a process can touch (filesystem, network, other processes; namespaces and cgroups do that), seccomp restricts which kernel code paths the process can reach.

"Seccomp, short for secure computing mode, can restrict the system calls a program is allowed to make." (Source: sources/2026-04-21-figma-server-side-sandboxing-containers-and-seccomp)

History: strict → seccomp-bpf

Original seccomp (Linux 2.6.12, 2005), now called strict mode, allowed only read, write, exit, and sigreturn: the "pure compute" subset. Figma's summary of the rationale: "Under the theory that truly 'pure' compute needed only these to function and that this reduced kernel attack surface enough to be defensible." In practice this was too strict for most real programs.
seccomp-bpf (Linux 3.5, 2012) added a filter mode alongside strict mode. Instead of a fixed allowlist, the caller supplies a classic Berkeley Packet Filter (BPF) program, and the kernel runs it on every syscall, giving it the syscall number, the architecture, and the raw argument values. Filter mode is what Chrome, Firefox, Android, Docker, nsjail, firejail, etc. use today.
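To make the filter-mode mechanism concrete, here is a minimal sketch in C against the raw kernel interface (libseccomp generates equivalent BPF programs for you). It assumes an x86-64 or AArch64 Linux host and uses getpid() purely as a stand-in syscall; `install_one_rule`, `run_getpid_under`, and `SKETCH_AUDIT_ARCH` are illustrative names, not part of any library. The filter first checks `seccomp_data.arch`, because syscall numbers only mean something for one ABI, then applies a caller-chosen action to one syscall number and allows everything else. Each demo runs in a forked child, since a filter cannot be removed once installed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* AUDIT_ARCH_* value the filter expects; this sketch only handles two ABIs. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Classic-BPF program run by the kernel on every syscall: kill on a foreign
 * ABI, apply `action` to syscall `nr`, allow everything else. */
static int install_one_rule(long nr, unsigned int action) {
    struct sock_filter filter[] = {
        /* A = seccomp_data.arch; skip the kill only if it matches. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* A = seccomp_data.nr; if it equals nr, return `action`. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, action),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    /* Without CAP_SYS_ADMIN, no_new_privs must be set before installing. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Fork, filter the child, call getpid() there; return the child's wait status.
 * The child exits 0 only if the call failed with EPERM. */
static int run_getpid_under(unsigned int action) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_one_rule(SYS_getpid, action) != 0) _exit(2);
        long r = syscall(SYS_getpid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

With `SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)` the child sees an ordinary failed call and exits normally; with `SECCOMP_RET_KILL_PROCESS` the same call terminates it as if by SIGSYS.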

Why "pure compute" is not trivial

Figma's honest framing:

"Unfortunately, many things that engineers consider to be pure compute require more system calls. In particular, writing code without the ability to allocate heap memory is a pretty drastic change in many developers' working models — and language runtimes, core libraries, or tracing helpers often want to know the current system time. These operations seem innocuous, but every incremental increase in allowed system calls results in extra kernel attack surface to consider."

So even "pure compute" programs typically need at minimum mmap / brk to allocate memory, clock_gettime for the time, write for output, and exit / exit_group to terminate. Figma's allowlist for seccomp-restricted programs covers exactly this: writing output to already-open file descriptors, exiting, allocating memory, and fetching the current time, while avoiding the filesystem, network/socket, and keychain surfaces entirely.
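A default-deny filter along those lines might look like the sketch below. The exact syscall list (write, exit, exit_group, brk, mmap, munmap, clock_gettime) is an assumption about how those four capabilities map to syscalls on x86-64/AArch64 Linux, not Figma's published filter, and `install_pure_compute_allowlist`, `run_pure_compute_child`, and `ALLOW` are hypothetical names. The child allocates with a raw mmap() rather than malloc() so the demo does not depend on whatever extra syscalls a particular allocator issues.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Two BPF instructions: if the syscall number equals nr, allow it. */
#define ALLOW(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Assumed "pure compute" allowlist: output, exit, memory, time. */
static int install_pure_compute_allowlist(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW(SYS_write),                                    /* output to open fds */
        ALLOW(SYS_exit), ALLOW(SYS_exit_group),              /* exiting */
        ALLOW(SYS_brk), ALLOW(SYS_mmap), ALLOW(SYS_munmap),  /* allocating memory */
        ALLOW(SYS_clock_gettime),        /* time; usually served by the vDSO anyway */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS), /* everything else */
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns the child's wait status; *wrote is 1 if the child's byte arrived
 * through the pipe it inherited (an already-open file descriptor). */
static int run_pure_compute_child(int try_open, int *wrote) {
    int fds[2];
    *wrote = 0;
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);  /* before the filter: close() is not on the list */
        if (install_pure_compute_allowlist() != 0) _exit(2);
        void *mem = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct timespec now;
        if (mem == MAP_FAILED || clock_gettime(CLOCK_MONOTONIC, &now) != 0) _exit(3);
        if (try_open) open("/dev/null", O_RDONLY);  /* openat: not allowed -> SIGSYS */
        if (write(fds[1], "x", 1) != 1) _exit(4);
        _exit(0);
    }
    close(fds[1]);
    char byte;
    *wrote = read(fds[0], &byte, 1) == 1;  /* EOF if the child died first */
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

Anything outside the list, here the openat() behind open(), kills the child before it writes its byte. Because this is an allowlist, x32-ABI syscalls on x86-64 (same arch value, but numbers with bit 30 set) also fall through to the kill.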

Adoption in production

Seccomp is not esoteric: "seccomp is a powerful isolation primitive used in many commonly used applications like Android, Chrome, and Firefox." Browsers lean on it to isolate renderer / GPU / network processes; Android installs a filter in the zygote, so every app process inherits it. The sandbox-composition tools systems/nsjail and systems/firejail wrap it together with namespaces, cgroups, and capabilities as a layered server-side isolation primitive.

Key limitations

Top-level argument filtering only

seccomp-bpf "can only filter syscall arguments at the top level and can't dereference pointer arguments or do other more complicated argument processing." Consequences:

Cannot filter openat() by path: the path is a char pointer.
Cannot filter execve() by program name: same reason.
Cannot distinguish connect() calls to different destination addresses: the sockaddr is behind a pointer.

Implication: programs that need dynamic file or network access end up with coarse allowlists. Either allow the syscall and rely on other layers to restrict its targets (namespaces, cgroup egress filters, LSMs), or restructure the program so it opens everything it needs first and installs the filter before it processes any user input (patterns/refactor-for-seccomp-filter). Figma's RenderServer took the second path.
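A minimal sketch of what "top level" does permit: the filter can load the raw 64-bit values in `seccomp_data.args`, so a scalar argument such as write()'s file descriptor can be compared, while the same load for openat() would yield only the address of the path string. Assumptions: little-endian x86-64 or AArch64, a hypothetical policy of "write() only to one pipe", every other syscall allowed for brevity, and illustrative function names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "sketch loads the low 32 bits of args[0] at its base offset (little-endian only)"
#endif

/* write() may target only allowed_fd; other fds get EPERM. The kernel treats
 * write()'s fd as an unsigned int, so comparing the low 32 bits is enough. */
static int install_write_fd_rule(int allowed_fd) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 1, 0),  /* write: check fd */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),           /* others: allowed */
        /* A = args[0], a plain integer. For openat(), args[1] would be an address. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)allowed_fd, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Child writes to an allowed pipe and a disallowed one; exit 0 means the
 * first write succeeded and the second failed with EPERM. */
static int run_write_fd_demo(void) {
    int ok[2], other[2];
    if (pipe(ok) != 0) return -1;
    if (pipe(other) != 0) { close(ok[0]); close(ok[1]); return -1; }
    pid_t pid = fork();
    if (pid == 0) {
        if (install_write_fd_rule(ok[1]) != 0) _exit(2);
        int allowed = write(ok[1], "x", 1) == 1;
        int denied = write(other[1], "y", 1) == -1 && errno == EPERM;
        _exit(allowed && denied ? 0 : 1);
    }
    int status = -1;
    if (pid > 0 && waitpid(pid, &status, 0) < 0) status = -1;
    close(ok[0]); close(ok[1]); close(other[0]); close(other[1]);
    return status;
}
```

Even if a filter could follow the pointer, another thread could rewrite the path between the check and the kernel's own copy of it; avoiding that time-of-check/time-of-use gap is one reason seccomp-bpf stops at the top level.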

Allowlists are brittle to program evolution

"Seccomp allowlists can be brittle. New program behavior may require adding more syscalls to the allowlist if they don't significantly increase attack surface or significant rewriting if the new syscalls do add unacceptable attack surface." Every library or kernel upgrade, every new feature codepath, and every new dependency is a potential allowlist break.

Debugging is painful

"Kernel logs will indicate when a process is killed by seccomp and which syscall caused the problem, without providing much more context. It can be laborious and frustrating to debug and reproduce the issue." Contributing failure modes enumerated by Figma:

Overlooked syscall in a seldom-used codepath.
Recently introduced behaviour requires a new syscall.
System change (upgraded library, kernel) triggers a new syscall in an existing codepath.
Architectural difference between test environment and production (ARM vs x86 → different glibc implementation → different syscall sequence).
A very rare instance in which the program was actually maliciously exploited.

Good CI testing catches some but not all of these.
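When a kill log is not enough, one development-time option (not something the Figma write-up describes) is to swap the kill action for `SECCOMP_RET_TRAP`. The kernel then skips the syscall and delivers SIGSYS, with the rejected syscall number in siginfo's `si_syscall` and its ABI in `si_arch`, so a handler can report exactly what the program tried. `SECCOMP_RET_LOG` (Linux 4.14+) is a gentler alternative that logs the call and then allows it. The sketch assumes x86-64 or AArch64, uses getpid() as the stand-in "overlooked" syscall, and its function names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Syscall number from the most recent seccomp SIGSYS, or -1 if none yet. */
static volatile sig_atomic_t trapped_nr = -1;

/* A real debugging handler might format si_syscall and si_arch and write(2)
 * them to stderr; this one just records the number. */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    trapped_nr = info->si_syscall;
}

/* Trap (instead of kill) on syscall `nr`; allow everything else. */
static int install_trap_rule(long nr) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Exit status 0 means the handler saw exactly the trapped syscall number. */
static int run_trap_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSYS, &sa, NULL) != 0) _exit(2);
        if (install_trap_rule(SYS_getpid) != 0) _exit(2);
        syscall(SYS_getpid);  /* not executed; SIGSYS is delivered instead */
        _exit(trapped_nr == SYS_getpid ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

The handler returns through rt_sigreturn, which this filter still allows; a stricter allowlist would need to include it for the trap approach to work.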

The seccomp posture at Figma

Figma's disclosed allowlist (for pure-compute-style workloads):

✅ Write to already-open file descriptors.
✅ Exit.
✅ Allocate memory.
✅ Fetch current time.
❌ Filesystem (no openat).
❌ Network / socket management.
❌ Keychain.

Applied via libseccomp to the seccomp-only variant of systems/figma-renderserver, after a source-code refactor that moved all file opens before the dangerous user-input processing step (patterns/refactor-for-seccomp-filter).
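The open-first, filter-second ordering can be sketched as follows. This is not RenderServer's code: /dev/zero stands in for the program's inputs, the function names are hypothetical, and the filter is a short denylist of the open-family syscalls returning EPERM (rather than a libseccomp-built allowlist) so the example stays focused on ordering. It assumes x86-64 or AArch64 Linux.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Two BPF instructions: if the syscall number equals nr, fail it with EPERM. */
#define DENY_EPERM(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))

/* Denylist for brevity only; a denylist on x86-64 should also reject x32
 * syscall numbers, which an allowlist rejects automatically. */
static int install_no_open_filter(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        DENY_EPERM(SYS_openat),
#ifdef SYS_open
        DENY_EPERM(SYS_open),     /* legacy open(), still present on x86-64 */
#endif
#ifdef SYS_openat2
        DENY_EPERM(SYS_openat2),  /* Linux 5.6+ */
#endif
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Exit status 0 means the pre-opened fd was readable and a new open failed. */
static int run_open_then_filter_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* 1. Trusted setup: open every file the job will need. */
        int input = open("/dev/zero", O_RDONLY);
        if (input < 0) _exit(2);
        /* 2. Install the filter before touching untrusted data. */
        if (install_no_open_filter() != 0) _exit(2);
        /* 3. Untrusted-input phase: pre-opened fds work, new opens do not. */
        char buf[8];
        int read_ok = read(input, buf, sizeof buf) == (ssize_t)sizeof buf;
        int open_denied = open("/dev/null", O_RDONLY) == -1 && errno == EPERM;
        _exit(read_ok && open_denied ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

Because the filter sees only syscall numbers here, the read() on the pre-opened descriptor passes while any new open() fails, whatever path it names.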

When to use seccomp alone vs composed

Seccomp alone — when the workload is close to pure compute, source-modifiable, and you can accept / engineer around allowlist limitations. Lowest overhead; lowest compatibility.
Seccomp composed with containerisation — when you need isolation against filesystem / network / process-table leakage too. systems/nsjail / systems/firejail / Docker-with-seccomp-profile all give you this. Defence in depth at the kernel-primitive layer.

Seen in

sources/2026-04-21-figma-server-side-sandboxing-containers-and-seccomp — canonical distillation of seccomp's value, limits, and failure modes, plus Figma's concrete allowlist posture for RenderServer.