Figma: Server-side sandboxing: Containers and seccomp
Summary
Part 3 of Figma's three-part security-engineering series on server-side
sandboxing (aka workload isolation): the practice of accepting
that vulnerabilities will exist in code that processes user-supplied
input (images, documents, SVGs) and minimising the blast radius instead of
trying to prevent them outright. Where Part 2
covered the VM row of the primitive table, Part 3 covers the remaining
two: containers (kernel-namespace + cgroup isolation) and
seccomp-only (syscall-allowlist isolation). Figma's production
exemplar is RenderServer, a C++ headless editor that converts
Figma files to images / SVGs. It runs inside
nsjail (a composition of namespaces + cgroups + seccomp-bpf)
for the full-featured GPU-accelerated path, and inside a
seccomp-only sandbox for certain non-GPU paths after a source-code
refactor that moved all open() calls ahead of image processing.
Key takeaways
- Containers are not automatically secure sandboxes. The level of isolation depends on three factors: runtime implementation (bugs in runC / containerd), the OS primitives exposed to the runtime (kernel namespaces, cgroups, seccomp, SELinux, AppArmor, plus any kernel vulnerabilities in them), and runtime configuration (user-chosen). "Unlike commodity VM solutions, containers place a much greater responsibility on the user to correctly configure the desired level of isolation." More control means more room for mistakes. (Source: this article)
- Container escape has three attack-surface components. A kernel vulnerability, a runtime-implementation bug, or a runtime misconfiguration can each allow a workload to break out and modify files or execute code on the host. The post cites Dirty COW, Dirty Pipe, and further CVE references as recent kernel-level examples. See concepts/container-escape.
- Seccomp-only sandboxes are the lightest primitive but require workload discipline. The premise: "many programs do pure computation, and thus do not need dynamic access to the filesystem or to make network calls at all." For those, a seccomp allowlist restricting syscalls to (ideally) write-to-open-fd, exit, memory allocation, and time-fetching gives extremely strong isolation with negligible overhead. Seccomp is used in Android, Chrome, and Firefox, and composes with containerisation in systems/nsjail and systems/firejail. See concepts/seccomp / concepts/syscall-allowlist.
- The seccomp allowlist is brittle by construction. "Every incremental increase in allowed system calls results in extra kernel attack surface to consider." Figma restricts programs to writing output to already-open file descriptors, exiting, allocating memory, and fetching current time, avoiding the filesystem, network/socket, and keychain surfaces entirely. The original 2005 seccomp shipped with only read, write, exit, sigreturn; real programs need more, and each addition is a conscious attack-surface expansion.
- Seccomp's pointer-dereference limitation forces program refactors. Seccomp-bpf "can only filter syscall arguments at the top level and can't dereference pointer arguments." It cannot filter openat by path, because the path is passed as a pointer. So a program that needs to open files dynamically through user-input-driven codepaths cannot be seccomp-sandboxed without being rewritten to open all files before the dangerous processing step. Figma refactored RenderServer's file I/O to do exactly this. See patterns/refactor-for-seccomp-filter.
- VM ↔ container ↔ seccomp is not a linear scale. Direct comparison is "more complicated and nuanced than with VMs." VMs have a small hypervisor attack surface but less fine-grained control + higher performance cost; containers have large kernel attack surface but fine-grained cgroup / namespace / seccomp / MAC controls; gVisor interposes its own hardened kernel between the host kernel and the container process as a middle option; seccomp-only can achieve "extremely strong isolation if only minimal syscalls are allowed" with the lowest overhead. Orchestration + correct configuration are the recurring tax.
- RenderServer at Figma uses nsjail as a drop-in solution. Explicit rejection of Docker: "we would need to create a new service that sandboxes the RenderServer binary inside a secure Docker configuration, create an orchestration system to manage the service, and re-architect various services to make a network call to the RenderServer service instead of invoking the binary directly." For each user request, nsjail starts RenderServer in new user / pid / mount / network namespaces with no network access, specific mount points only (input file, libraries, output folder), and seccomp-bpf. See systems/nsjail.
- Rollout surfaced real configuration foot-guns.
  - Default rlimit_fsize = 1 MB silently truncated output files for large-image inputs, so job errors correlated with exactly-1-MB outputs. Fix: a one-line config change after reading the docs carefully.
  - The seccomp allowlist needed several iterations: rare codepaths in RenderServer's complex C++ codebase triggered syscalls not hit during testing or internal use. "Kernel logs will indicate when a process is killed by seccomp and which syscall caused the problem, without providing much more context."
- Seccomp-only RenderServer bought operational simplicity at the cost of engineering constraints. For non-GPU paths Figma refactored RenderServer so that all file opens occur before any image processing of potentially dangerous user input, then applied a restrictive seccomp filter via libseccomp. Result:
  - ✅ Easier to test and debug than nsjail.
  - ✅ Significantly faster at runtime.
  - ❌ Locks RenderServer into a single-threaded model.
  - ❌ Cannot dynamically load fonts or images later in runtime.
- Startup cost of container sandboxes is real but bounded. "The startup time of nsjail is typically on the order of small fractions of a second, tens to low hundreds of milliseconds. There is, however, still a long tail of startup times, and initializing a language runtime within the container can take substantially longer." Lower overhead than VMs, higher than seccomp-only.
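The pointer-dereference limitation in the takeaways above can be made concrete with a small model. The sketch below is not a real BPF program: it mirrors the kernel's struct seccomp_data in ctypes and expresses a policy as a plain Python function over it. The syscall number assumes x86_64, and `allow_readonly_openat` and `describe_openat` are hypothetical names introduced only for illustration.

```python
import ctypes


class SeccompData(ctypes.Structure):
    """Mirror of the kernel's struct seccomp_data: all a seccomp-bpf filter sees."""
    _fields_ = [
        ("nr", ctypes.c_int),                     # syscall number
        ("arch", ctypes.c_uint32),                # AUDIT_ARCH_* value
        ("instruction_pointer", ctypes.c_uint64),
        ("args", ctypes.c_uint64 * 6),            # raw argument values, never dereferenced
    ]


SYS_OPENAT = 257   # assumption: x86_64 syscall number
O_ACCMODE = 0o3    # mask for the access-mode bits of the flags argument
O_RDONLY = 0


def allow_readonly_openat(data: SeccompData) -> bool:
    """Hypothetical policy: may inspect only the integers stored in `data`."""
    if data.nr != SYS_OPENAT:
        return False
    flags = data.args[2]  # openat(dirfd, pathname, flags, mode)
    return (flags & O_ACCMODE) == O_RDONLY


def describe_openat(path: bytes, flags: int) -> SeccompData:
    """Build what a filter would be handed for an openat() call on `path`."""
    buf = ctypes.create_string_buffer(path)  # the path bytes live in process memory
    data = SeccompData(nr=SYS_OPENAT)
    data.args[1] = ctypes.addressof(buf)     # the filter receives only this address
    data.args[2] = flags
    return data
```

A policy written this way can distinguish read-only opens from writable ones, because the flags are a scalar argument, but it returns the same answer for /etc/shadow and for a harmless input file, since the path never appears in the data it inspects.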
Operational numbers disclosed
- nsjail startup latency: tens to low hundreds of milliseconds; long tail; language-runtime init can extend significantly.
- Seccomp allowlist adjustments during rollout: "several times" — driven by rare codepaths hit in production that weren't exercised in testing.
- rlimit_fsize default: 1 MB (nsjail default, tripped by large-image outputs).
- Seccomp allowlist target at Figma: write to already-open fds, exit, memory allocation, clock_gettime; every additional syscall is a conscious attack-surface expansion.
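The exactly-at-the-limit truncation behind the rlimit_fsize foot-gun can be reproduced outside nsjail. This is a minimal sketch, assuming a Linux host: a forked child lowers its own RLIMIT_FSIZE and writes a payload, and the parent reports the resulting file size. The function name is hypothetical, and the tiny limits are chosen only to keep the demo fast.

```python
import os
import resource
import signal
import tempfile


def size_after_capped_write(payload: bytes, limit_bytes: int) -> int:
    """Write `payload` in a child process capped at `limit_bytes`; return the file size."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the sandboxed job.
        try:
            # With SIGXFSZ ignored, an over-limit write fails with EFBIG instead of killing us.
            signal.signal(signal.SIGXFSZ, signal.SIG_IGN)
            _, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
            resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, hard))
            out = os.open(path, os.O_WRONLY)
            view = memoryview(payload)
            while view:
                written = os.write(out, view)  # may be a short write at the limit
                view = view[written:]
        except OSError:
            pass  # a careless writer swallows the error, leaving a truncated file
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    try:
        return os.path.getsize(path)
    finally:
        os.unlink(path)
```

Calling `size_after_capped_write(b"x" * 4096, 1024)` leaves a file of exactly 1024 bytes, the small-scale analogue of the exactly-1-MB outputs that pointed Figma at the default limit.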
Systems / concepts / patterns extracted
Systems:
- systems/nsjail — Google's cmdline tool stacking Linux namespaces + capabilities + filesystem restrictions + cgroups + resource limits + seccomp. Figma's production sandbox for RenderServer.
- systems/firejail — SUID-based sandbox, referenced as the composition sibling.
- systems/docker — rejected for RenderServer due to orchestration overhead; named as the container platform whose runC runtime exposes namespaces/cgroups/seccomp/SELinux/AppArmor.
- systems/runc — the Docker default runtime whose bugs / misconfigurations contribute to the container-escape attack surface.
- systems/gvisor — hardened-kernel interposition between host kernel and container process; reduces container attack surface at the cost of interpretation overhead + compatibility gaps.
- systems/figma-renderserver — the subject of the sandboxing decision; C++ headless Figma editor used for thumbnailing / file-format conversion.
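To make the per-request nsjail invocation more tangible, here is a hedged sketch of how a caller might assemble the command line. It is not Figma's configuration: every path, the policy file, and the argument order passed to the binary are hypothetical. Only the nsjail flags themselves (-Mo, -R, -B, -P, --rlimit_fsize) are taken from nsjail's documented command-line options.

```python
from typing import List


def nsjail_argv(binary: str, input_file: str, lib_dir: str, output_dir: str,
                policy_file: str, fsize_mb: int) -> List[str]:
    """Build an nsjail command line for one conversion job (illustrative only)."""
    return [
        "nsjail",
        "-Mo",                             # run the command once, then exit
        "-R", input_file,                  # read-only bind mount: the input file
        "-R", lib_dir,                     # read-only bind mount: shared libraries
        "-B", output_dir,                  # read-write bind mount: the output folder
        "-P", policy_file,                 # file containing the seccomp-bpf policy
        "--rlimit_fsize", str(fsize_mb),   # in MB; raised from the 1 MB default
        "--",
        binary, input_file, output_dir,    # hypothetical RenderServer arguments
    ]
```

The resulting list can be handed to `subprocess.run` in place of a direct invocation of the binary, which is what makes this approach a drop-in change for the calling service.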
Concepts:
- concepts/container-escape — the three-axis attack surface: kernel vuln × runtime impl × configuration.
- concepts/seccomp — Linux syscall-allowlist primitive (secure-computing mode); Android/Chrome/Firefox scale adopters.
- concepts/syscall-allowlist — the policy artefact seccomp enforces; brittle by construction, expanded by trial-and-error.
- concepts/linux-namespaces — kernel isolation of user / pid / mount / network / IPC / UTS; the primitive containers are built on.
- concepts/kernel-attack-surface — why container escape is plausible and why hypervisor surface is smaller.
- Reuses concepts/server-side-sandboxing, concepts/defense-in-depth, concepts/threat-modeling, concepts/linux-cgroup.
Patterns:
- patterns/refactor-for-seccomp-filter — reorder all dangerous syscalls (file opens, socket creation, etc.) to happen before user-input processing, then apply a restrictive seccomp filter that denies those syscalls for the rest of the process lifetime. Canonical example: RenderServer's SVG-export path.
- patterns/seccomp-bpf-container-composition — combine namespaces + cgroups + seccomp-bpf in one sandbox (nsjail / firejail) so that multiple independent isolation mechanisms compose as concepts/defense-in-depth.
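The ordering invariant behind patterns/refactor-for-seccomp-filter can be sketched in a few lines. This is an illustration only: a Python audit hook stands in for the seccomp filter (it is not a security boundary and cannot stop native code), and `convert`, `lockdown`, and the "processing" step are hypothetical. In RenderServer the equivalent step is a libseccomp filter that stays in force for the rest of the process lifetime.

```python
import contextlib
import sys

_locked = False  # True while the stand-in "filter" is active


def _deny_open_after_lockdown(event, args):
    # Stand-in for a seccomp rule that rejects open/openat after lockdown.
    if _locked and event == "open":
        raise PermissionError(f"open after lockdown: {args[0]!r}")


sys.addaudithook(_deny_open_after_lockdown)


@contextlib.contextmanager
def lockdown():
    """Forbid new file opens inside the block (demo only; seccomp is irreversible)."""
    global _locked
    _locked = True
    try:
        yield
    finally:
        _locked = False


def convert(input_path: str, font_path: str, output_path: str) -> None:
    """Open everything first, lock down, then touch untrusted data."""
    # Phase 1: open every file the job could need.
    with open(input_path, "rb") as src, open(font_path, "rb") as font, \
            open(output_path, "wb") as dst:
        # Phase 2: no further opens are permitted from here on.
        with lockdown():
            # Phase 3: process untrusted input using only the open handles.
            glyphs = font.read()
            dst.write(glyphs + src.read().upper())
```

The cost noted in the takeaways is visible here: any font or image that is not opened in phase 1 is unavailable in phase 3, so the set of inputs has to be known before processing starts.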
Caveats
- No QPS / throughput / fleet-size numbers for RenderServer in production — only qualitative "small fractions of a second" nsjail startup and "significantly faster" for the seccomp-only variant.
- No disclosure of the exact seccomp allowlist (only the four named families: write-to-open-fd, exit, memory allocation, time).
- No cost comparison (nsjail vs seccomp-only vs VM) in $/workload or CPU-seconds.
- No incident retrospective: the post frames rollout as smooth modulo the rlimit_fsize foot-gun and the expected seccomp-allowlist iteration.
- No multi-region / multi-AZ deployment detail: RenderServer's actual scheduling + failure-recovery architecture is not disclosed (nsjail runs as a per-request process; the pool of workers that invokes it is not described).
- Compared-to-VMs section is qualitative ("the attack surface of a hypervisor is usually smaller than for an OS kernel") with CVE references but no quantitative surface measurement.
Source
- Original: https://www.figma.com/blog/server-side-sandboxing-containers-and-seccomp/
- Raw markdown: raw/figma/2026-04-21-server-side-sandboxing-containers-and-seccomp-b4d1b836.md
Related
- sources/2026-04-21-figma-server-side-sandboxing-virtual-machines — Part 2 of the series (VMs row of the primitive table).
- concepts/server-side-sandboxing — umbrella concept (the three-primitive table).
- concepts/defense-in-depth — why composing namespaces + cgroups + seccomp matters.
- companies/figma — operator.