Figma
Figma Engineering blog — Figma is a browser-based collaborative design tool. Its client is a C++ application compiled to WebAssembly + a TypeScript UI layer, so engineering content skews toward client-perf and build-tooling topics (unusual for a web product) plus the usual backend / infra / live-collaboration posts.
Tier: Figma is not in AGENTS.md's formal Tier 1/2/3 lists; treat as Tier-3-equivalent — apply the Tier-3 selectivity filter (skip pure product-PR / hiring / design-trend posts). Distributed-systems-internals, build-systems, and client-perf engineering posts with numbers are on-topic.
Architectural framing: Figma's 2026-04-21 game-engine-inspiration post (sources/2026-04-21-figma-how-figma-draws-inspiration-from-the-gaming-world) is the umbrella article for how the rest of Figma's engineering content composes. The client architecture is a game-engine stack adapted for the browser (patterns/game-engine-stack-for-web-canvas) — C++ → WebAssembly for the canvas, React + TypeScript for the UI shell, Rust for the multiplayer server. Every product feature lands as a named system (concepts/game-engine-architecture) — multiplayer, collision, animation, chat/audio, auto-layout, component/variant, widget/plugin, plus Figma-specific systems (permissions DSL, parameter runtime, materializer, etc.). Composition brings cross-subsystem bug propagation — the connector/autosave cascade is the canonical disclosed instance. The [[sources/2026-04-21-figma-rendering-powered-by-webgpu|2026-04-21 WebGPU rendering deep-dive]] is the canvas-leg corroborator for the game-engine framing at the graphics-API layer: explicit per-draw state (concepts/explicit-graphics-state), compute shaders (concepts/compute-shader), and shared C++ code across browser Wasm and native via Dawn.
Key systems
- Canvas renderer (WebGPU-capable, C++/WASM) — Figma's browser canvas is a C++ codebase compiled to WebAssembly via Emscripten; the same C++ also compiles natively (x64/arm64) for server-side rendering. Historically a WebGL renderer; as of 2026-04-21 ships a WebGPU backend as a peer via a graphics-API interface layer with encode/submit uniform batching, a custom shader translator (naga-backed for WGSL emission), telemetry-driven device blocklist, and mid-session WebGPU→WebGL fallback. Shares WebGPU implementation with native targets via Dawn. Downstream: RenderServer is the server-side rendering consumer of this same code path.
- systems/figma-ai-search — Figma's AI-powered search feature
(shipped at Config 2024) combining visual search (query by
screenshot / selected frame / sketch) and semantic search (query
by natural text against component names/descriptions/files even when
terminology doesn't match). Originated in a June 2023 three-day AI
hackathon whose most ambitious prototype was design autocomplete;
user research on the prototype revealed 75% of Figma-canvas objects
come from other files — search became the higher-leverage ship.
Indexing policy stacks heuristics (patterns/selective-indexing-heuristics):
top-level frames at common UI dimensions + non-top-level exceptions,
near-duplicate collapsing, file-copy skipping, experimental
ready-for-dev quality signals; plus
patterns/edit-quiescence-indexing (index only after 4h of no
edits — keeps WIP out, sheds load). Eval tool built on Figma's
own infinite canvas + public plugin API + keyboard shortcuts
(patterns/visual-eval-grading-canvas); eval set seeded from
internal-designer interviews + file-browser usage analysis. Product
bar: deliver across similarity
tiers (exact / near-similar / broad) because users start from
close matches and expand outward — "if we couldn't prove we could
find the needle in the haystack, designers wouldn't trust the
feature for broader exploration." Surfaced in Actions panel with
peek previews +
`CMD+Enter` full-screen drill-down. Four shipping principles: AI-for-existing-workflows / rapid iteration / systematic quality checks / cross-disciplinary teamwork. Infrastructure (2026-04-21 companion post): embeddings via open-source CLIP (multimodal — text and image into the same space); storage = DynamoDB (metadata + embeddings) + S3 (thumbnails) + OpenSearch k-NN (vector index); inference on SageMaker with a batch-size sweet spot. Indexing decomposed into four discrete queued jobs (patterns/pipeline-stage-as-discrete-job) so each stage's batching + retry tunes independently. Enumerating indexable frames runs a headless server-side C++ build of the Figma editor — thumbnail rendering moved from GPU to CPU (`llvmpipe`) on newer instances for cost, and from Ruby+JSON to C++ for memory. Edit-quiescence (4h) quantified: cuts to 12% of data processed. Index size halved by excluding drafts, in-file duplicates, and unmodified copies. Query path is hybrid lexical + vector — two OpenSearch indexes queried simultaneously, scores min-max-normalized per index, exact-match boosted, interleaved (patterns/hybrid-lexical-vector-interleaving). concepts/vector-quantization compresses embeddings in-index. Two OpenSearch bugs reported candidly: segment-replication replica non-determinism (fixed upstream in k-NN PR #1808) and `_source` slimming wiping embeddings on updates (patterns/source-field-slimming-with-external-refetch — fix: re-fetch from DynamoDB).
- systems/diwydu — Don't Include What You Don't Use: Figma's libclang-based tool that flags `#include` directives whose symbols are never directly referenced in the including file. Deliberately laxer than Google's systems/include-what-you-use to make retrofitting a large C++ codebase tractable. Runs in CI on feature branches.
- systems/includes-py — pure-Python (no Clang) static transitive-byte counter over the C++ include DAG; runs in CI on every PR; warns on regressions in post-pre-processing byte count per source file. The CI gate that prevents 50-100 build-time regressions per day.
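The min-max-normalize / boost / interleave step of the hybrid search query path described above can be sketched as a toy model. The `Hit` shape, the boost constant, and the merge-by-sum strategy are illustrative assumptions, not Figma's disclosed implementation:

```typescript
type Hit = { id: string; score: number };

// Min-max normalize scores within one result list so lexical (BM25-style)
// and vector (cosine-style) scores become comparable on [0, 1].
function minMaxNormalize(hits: Hit[]): Hit[] {
  const scores = hits.map(h => h.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // avoid divide-by-zero
  return hits.map(h => ({ id: h.id, score: (h.score - min) / range }));
}

// Merge the two normalized lists: sum scores for ids present in both,
// add a boost for exact lexical matches, sort descending.
function interleave(
  lexical: Hit[],
  vector: Hit[],
  exactIds: Set<string>,
  exactBoost = 0.5,
): Hit[] {
  const merged = new Map<string, number>();
  for (const h of [...minMaxNormalize(lexical), ...minMaxNormalize(vector)]) {
    merged.set(h.id, (merged.get(h.id) ?? 0) + h.score);
  }
  for (const id of exactIds) {
    if (merged.has(id)) merged.set(id, merged.get(id)! + exactBoost);
  }
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Per-index normalization is the load-bearing move: raw BM25 and vector-similarity scores live on incompatible scales, so neither can dominate until both are mapped onto [0, 1].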
- systems/skew — compile-to-JS language Figma cultivated for its prototype-viewer / mobile codebase (~2014–2024); migrated off to systems/typescript over 2020–2024. Static types, optimizing compiler with devirtualization. Author Evan Wallace later wrote systems/esbuild.
- systems/figma-parameter-runtime — unified 2024–2026 substrate powering Figma's two parameter systems (component properties + variables) with a single typespace, a single binding store (invariant: at most one parameter per bound property), and a shared four-stage runtime (parameter-usage tracking / property-granular invalidation / transitive resolution / update). Unlocked component-property-to-variable binding and a speed-up on variable-mode + variable-value changes; hosts a prospective third parameter system (Figma Sites CMS) for free. Sibling in-memory bidirectional graph to QueryGraph over the same object-tree document model, indexing different edges (parameter-to-bound-property vs read+write deps between nodes).
- systems/figma-materializer — 2026 generic client-runtime framework for maintaining derived subtrees of the document tree via feature-owned blueprints. Replaces the 2016-era Instance Updater (accreted bespoke logic for auto layout / variants / component properties / variables over a decade). Reactivity model: push-based invalidation with automatic dependency tracking — deps recorded implicitly as nodes read data during materialization. Pull-based explicitly rejected because Figma's cross-tree references + deep nesting force reconstructing large dep chains on every read. Shipped clients: component instances (ported), rich text nodes (first net-new feature built on it), slots (open beta April 2026 — composes on top rather than reintroducing bespoke reactivity). Reported canonical impact: variable-mode changes in large files 40–50% faster, "representative of broader gains." Rolled out behind months of side-by-side runtime validation against hundreds of thousands of real files (gate: matched correctness AND matched performance). Sibling-third client reactive graph alongside QueryGraph + Parameter Runtime over the same object-tree document model; the parallel runtime-orchestration unification described in the same post surfaced + eliminated "back-dirties" (concepts/back-dirty), moving client runtime toward unidirectional flow (patterns/runtime-orchestration-unidirectional-flow).
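The Materializer's reactivity model (push-based invalidation, dependencies recorded implicitly as nodes are read) can be sketched minimally. Class and method names here are hypothetical; the real system tracks dependencies per document node while a feature-owned blueprint materializes its derived subtree:

```typescript
type NodeId = string;

class ReactiveDoc {
  private props = new Map<NodeId, number>();
  // nodeId -> names of blueprints that read it during materialization
  private dependents = new Map<NodeId, Set<string>>();
  private activeBlueprint: string | null = null;
  dirty = new Set<string>(); // blueprints needing re-materialization

  set(id: NodeId, value: number) {
    this.props.set(id, value);
    // Push-based invalidation: writing a node dirties every blueprint
    // that read it last time, with no explicit subscription call.
    for (const bp of this.dependents.get(id) ?? []) this.dirty.add(bp);
  }

  get(id: NodeId): number {
    if (this.activeBlueprint !== null) {
      // Implicit dependency recording: the read itself registers the edge.
      let set = this.dependents.get(id);
      if (!set) this.dependents.set(id, (set = new Set()));
      set.add(this.activeBlueprint);
    }
    return this.props.get(id) ?? 0;
  }

  materialize(name: string, compute: (doc: ReactiveDoc) => number): number {
    this.activeBlueprint = name;
    try {
      const result = compute(this);
      this.dirty.delete(name);
      return result;
    } finally {
      this.activeBlueprint = null;
    }
  }
}
```

This also illustrates why pull-based reactivity was rejected: with cross-tree references and deep nesting, a pull model would have to reconstruct these dependency chains on every read instead of recording them once per materialization.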
- systems/figma-multiplayer-querygraph — Figma's real-time-collab
server + in-memory bidirectional dependency graph (read + write
deps) over file nodes. Foundational 2019 architecture:
client-server over WebSocket, one server process per
multiplayer document, client-downloads-full-state-on-open with
offline-edit replay on reconnect, object-tree
data model (`Map<ObjectID, Map<Property, Value>>`), CRDT-inspired centralized reconciliation (OT explicitly rejected). 2024 QueryGraph extension adds per-session subscribed subsets by reachability; fans edits out server-side only to sessions whose subscription set reaches the edited node. Load-path optimizations: backend preload hint (300–500 ms p75 savings) + parallel decoding via persisted raw offsets (>40% decode-time cut).
- Compute platform — ran on ECS on EC2 through early 2023; migrated to EKS (three-active-cluster topology) in <12 months, with majority cutover by Jan 2024. Chose EKS for StatefulSets, Helm, CNCF auto-scaling (systems/karpenter / systems/keda), graceful node-drain, and the service-mesh roadmap. Single-step service definition via a per-service Bazel config; CI generates Kubernetes YAMLs applied in one step by Figma's in-house deploy system.
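The multiplayer object-tree model and server-authoritative reconciliation described above can be illustrated with a toy last-writer-wins sketch. The real protocol has richer merge rules; this only shows why central ordering lets the design borrow LWW-register semantics from CRDT literature while dropping the decentralization machinery:

```typescript
type ObjectID = string;
type Property = string;
type Value = string;

// Document = Map<ObjectID, Map<Property, Value>>, as in Figma's model.
type Doc = Map<ObjectID, Map<Property, Value>>;

type Edit = { object: ObjectID; property: Property; value: Value };

// The server is the single authority: it applies edits in arrival order,
// so per-property last-writer-wins falls out of ordering alone (no vector
// clocks or commutativity proofs, as a decentralized CRDT would need).
function applyEdits(doc: Doc, edits: Edit[]): Doc {
  for (const e of edits) {
    let props = doc.get(e.object);
    if (!props) doc.set(e.object, (props = new Map()));
    props.set(e.property, e.value); // last write in server order wins
  }
  return doc;
}
```

Conflicts are scoped to a single property of a single object; concurrent edits to different properties never conflict at all.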
- systems/figma-commit-signature-verification — Figma's supply-chain
security system: every Git commit pushed to the internal monorepo
is cryptographically verified to have been S/MIME-signed with a
current device-trust X.509 cert (rotates every 15 days; lives
in the MacBook's macOS Keychain). Built as a GitHub App
(scoped to read code + write commit status checks only —
canonical concepts/least-privileged-access) + an AWS Lambda
behind a Function URL webhook. Credentials held in
Secrets Manager. Posts the
`commit-integrity-verification` commit status that release-branch protection requires to merge. Bot commits pass through an author allowlist + optional diff heuristics (e.g. fail Dependabot if it touches non-dependency files).
- systems/smimesign-figma — Figma's minimally-modified fork of
GitHub's smimesign S/MIME Git signer. Adds
one flag,
`--get-figmate-key-id`, that walks the macOS Keychain and returns the current device-trust cert key id — the dynamic-lookup primitive that bridges Git's static `user.signingkey` contract with Figma's 15-day rotating certs.
- systems/figma-renderserver — Figma's C++ server-side headless
editor used for thumbnailing and image / SVG export over
user-supplied Figma files. Runs in two different
server-side sandboxes per
use case: full GPU path in nsjail (user +
pid + mount + network namespaces, no network, specific mount
points only, seccomp-bpf — chosen over
Docker as a drop-in to avoid
orchestrated-service rearchitecture), non-GPU path in
seccomp-only after a source-code
refactor that
reorders all
`openat` calls before any image processing so a restrictive libseccomp filter lands mid-program. Seccomp-only trade-offs disclosed honestly: easier to test / debug + significantly faster than nsjail, but locks RenderServer into single-threaded operation with no dynamic font / image loading. Rollout surfaced nsjail's default `rlimit_fsize=1MB` silently truncating outputs for large-image inputs, plus several seccomp-allowlist iterations as production hit rare codepaths.
- systems/figcache — Figma's stateless, horizontally-scalable
RESP-wire-protocol caching proxy sitting between applications
and a fleet of ElastiCache
Redis clusters. Unified Redis data plane:
multiplexes many client
connections onto a small pool of Redis connections (post-rollout:
order-of-magnitude reduction in Redis cluster connection
counts + dramatically less volatile during diurnal traffic);
centralized multi-cluster routing via a dynamically-assembled
engine tree of Router/Redis/Static primitives authored in
Starlark; fanout filter
engine transparently scatters read-only multi-shard pipelines as
parallel scatter-gather (sidesteps Redis Cluster's
`CROSSSLOT` error); Redis Cluster emulation shim makes migration a one-line endpoint config change (patterns/protocol-compatible-drop-in-proxy); uniform metrics/logs/traces per command with workload-ownership classification cut incident diagnosis from hours/days to minutes. Rolled out H2 2025 for Figma's main API service → six-nines uptime on the caching layer. Built by the Storage Products team.
- systems/respc — Go library inside FigCache's frontend providing an RPC framework over RESP. Four components (server layer, streaming RESP protocol parser, schema-driven structured command parser, implementation-agnostic command dispatch). Schema registry declaratively expresses supported command sequences with annotated arguments — the load-bearing piece that converts opaque bytes into semantically-rich typed commands, enabling every downstream guardrail / custom command / fanout resolution FigCache is built on.
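Schema-driven command parsing in the spirit of RESPC can be sketched as follows. The registry contents, function names, and single-frame (non-streaming) parser are toy assumptions; RESPC itself is a Go library handling streamed RESP:

```typescript
// A schema annotates each supported command's arguments by name, turning
// an opaque RESP array into a semantically typed command.
type Schema = { name: string; args: string[] };

const registry = new Map<string, Schema>([
  ["GET", { name: "GET", args: ["key"] }],
  ["SET", { name: "SET", args: ["key", "value"] }],
]);

// Parse one complete RESP array frame, e.g. "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n".
function parseResp(frame: string): string[] {
  const lines = frame.split("\r\n");
  const count = parseInt(lines[0].slice(1), 10); // "*2" -> 2 elements
  const parts: string[] = [];
  for (let i = 0; i < count; i++) parts.push(lines[2 + 2 * i]); // skip "$n" length lines
  return parts;
}

function toTypedCommand(frame: string): { name: string; args: Record<string, string> } {
  const [name, ...rest] = parseResp(frame);
  const schema = registry.get(name.toUpperCase());
  if (!schema) throw new Error(`unsupported command: ${name}`);
  if (rest.length !== schema.args.length) throw new Error(`arity mismatch for ${name}`);
  const args: Record<string, string> = {};
  schema.args.forEach((argName, i) => (args[argName] = rest[i]));
  return { name: schema.name, args };
}
```

Once commands are typed, downstream concerns (routing by key, guardrails, fanout of multi-shard pipelines) operate on named fields rather than raw byte positions.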
- systems/livegraph — Figma's real-time data-fetching service:
GraphQL-like web API over WebSocket, clients subscribe to queries
and receive a JSON tree that stays live. Backs everything
non-document (comments, file lists, team membership,
optimistic UI, FigJam voting) against RDS Postgres
via DBProxy. Learns of changes by tailing
the WAL logical replication
stream (CDC). 2024–2026
"LiveGraph 100x" rebuild: split monolithic one-server design
(mutation-based in-memory cache, single global replication stream)
into three independently-scaled Go services
(patterns/independent-scaling-tiers) — edge (session /
view-query expansion), cache (read-through, sharded by
`hash(easy-expr)`, cuckoo-filter fan-out to edges), invalidator (stateless, sharded like the DB, tails WAL per shard). Core unlocks: LiveGraph traffic is driven by initial reads not live updates (so invalidation-based caching with re-query is viable), and most queries are easy to invalidate from the schema alone (so the invalidator is stateless). Query shapes assign stable IDs to un-parameterized queries; mutations pop (shape, args) tuples via substitution without a live-query table. ~700 shapes total, only 11 "hard" (range/inequality); handled via patterns/nonce-bulk-eviction (cache co-locates by `hash(easy-expr)`, two-layer keys with a nonce; easy-expr invalidation deletes the nonce → all hard-query keys orphaned in one op). Concurrency contract: read-invalidation rendezvous — three rules (same-type coalescing, read-during-invalidation blocking, invalidation-during-read blocking) guarantee no invalidation is ever silently overwritten by a racing stale read. Validated by chaos test + online cache verification + old-vs-new convergence checker. Eliminates the old architecture's five structural failures: excessive fan-in / fan-out, read-update coupling, fragmented caches, deploy thundering herds, large blast radius from transient shard failures. Future: auto-reshard invalidators, non-Postgres sources, in-cache server-side permission evaluation (connects to systems/figma-permissions-dsl).
- systems/dbproxy-figma — Figma's Go service between the application layer and PGBouncer that makes horizontal sharding on RDS Postgres possible. Three-stage query engine (parser → logical planner → physical planner), topology library with <1s backwards-compatible updates, single-shard pushdown + scatter-gather for cross-shard queries, deliberately-restricted sharded-query subset (~90% coverage, no cross-colo joins / no joins off the shard key), feature-flagged per-table rollout gating. Load-shedding + request hedging + transaction support scoped to single shards (no atomic cross-shard transactions — product resilient to partial-commit failures). Shipped first horizontally-sharded table September 2023 with 10s partial primary availability, no replica impact.
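The single-shard-pushdown vs scatter-gather planning decision above can be sketched in a few lines. The hash function, shard count, and query shape are toy assumptions standing in for DBProxy's logical/physical planner:

```typescript
const NUM_SHARDS = 4;

// Hash-of-shard-key routing for uniform distribution (the trade-off noted
// in the shard-key concept: range scans on the shard key lose locality).
function shardFor(shardKeyValue: string): number {
  let h = 0;
  for (const ch of shardKeyValue) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

type Plan =
  | { kind: "single-shard"; shard: number }
  | { kind: "scatter-gather"; shards: number[] };

function plan(query: { table: string; shardKeyValue?: string }): Plan {
  if (query.shardKeyValue !== undefined) {
    // Shard-key predicate present: push the query down to one shard.
    return { kind: "single-shard", shard: shardFor(query.shardKeyValue) };
  }
  // No shard-key predicate: every shard does roughly the work the old
  // unsharded database did, which is why the sharded-query subset is
  // deliberately restricted.
  return { kind: "scatter-gather", shards: [...Array(NUM_SHARDS).keys()] };
}
```

The restriction to a ~90%-coverage query subset is what keeps the scatter-gather branch rare rather than load-bearing.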
- RDS Postgres substrate — 2020 baseline: single Postgres on AWS's largest instance. End-of-2022: a dozen vertically-partitioned RDS Postgres databases (table-groups like "Figma files" / "Organizations") with caching + read replicas. ~100× database-stack growth since 2020. 2022 onward: horizontal sharding built on top of the vertical-partitioning substrate, keeping RDS Postgres unmodified and extending it via Postgres views as logical shards + DBProxy as the router. Explicit build-vs-buy rejection of CockroachDB / TiDB / Spanner / Vitess / NoSQL migration on 18-month runway pressure + existing operational expertise; choice flagged for future re-evaluation once runway is bought.
- systems/figma-response-sampling — Figma's in-house
sensitive-data-exposure detection system, shipped as an
async middleware in the
Ruby application server's
`after` filter. Inspects a configurable uniform-random fraction of outbound API responses in both staging and production — the detection-in-depth layer atop PermissionsV2. Phase 1 (Permission Auditor): regex-matches file identifiers in JSON bodies (high-entropy capability tokens) → enqueues async PermissionsV2 re-checks of user × identifier → logs unexpected decisions. Phase 2 (Sensitive Data Analyzer): generalizes to any column tagged `banned_from_clients` by FigTag via an `ActiveRecord` callback that records loaded sensitive values into request-local storage on sampled requests; the `after` filter compares serialized JSON against the recorded set. Cross-service integration: LiveGraph posts sampled responses to an internal endpoint that funnels into the same analytics warehouse + triage dashboards. Non-blocking on the hot path, rate-limited pipeline, dynamic allowlist for intentionally-safe exposures (patterns/dynamic-allowlist-for-safe-exposure). Architectural choice of app-server middleware over an Envoy proxy — the app tier has the authenticated user, the response body, and in-process access to PermissionsV2 in one place. Production findings within days of rollout: over-returned file IDs, legacy paths that bypassed permission checks, long-unused leaking fields, list endpoints missing per-item access verification. Canonical wiki anchor for platform-security mindset applied to application surfaces (sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure).
- systems/figtag — Figma's internal data-categorization
tool: annotates every database column with a sensitivity
category, stored in a central schema and propagated to the
data warehouse — so column sensitivity is queryable at both
application runtime and offline analytics time. The specific
category
`banned_from_clients` is the signal used by Response Sampling's Phase 2 to flag fields that must not appear in API responses under normal circumstances (security identifiers, billing, PII). Canonical instance of field-level sensitivity tagging + the patterns/field-level-sensitivity-tagging pattern (central schema + runtime-queryable + warehouse-propagated + consumed by many enforcement systems without per-system allowlist maintenance). Integration substrate for both application-layer enforcement (ORM callback → request-local storage → `after` filter comparison) and warehouse-layer controls. Internal details not disclosed: authoring UX, full category set, propagation consistency, coverage / drift mechanisms (sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure).
- systems/figma-permissions-dsl — Figma's in-house authorization engine (early 2021 onward), replacing a Ruby-monolith `has_access?` function. Three decoupled components: a policy DSL authored in TypeScript + compiled to JSON-serializable `ExpressionDef` (triples composed by and/or/not, field references as "table.column" strings); an `ApplyEvaluator` implemented per language (Ruby / TypeScript / Go) against a shared test suite, returning `true`/`false`/`null` (indeterminate); a `DatabaseLoader` owning data fetching via a `context_path` resource-addressing map. Evaluation uses patterns/deny-overrides-allow + patterns/progressive-data-loading — load dependencies in batches, short-circuit on a determinable verdict — "more than halved" total evaluation time. Static-analysis linter in CI catches known-buggy policy shapes at PR time (patterns/policy-static-analysis-in-ci). React front-end debugger + CLI debugger built on the same evaluator. Design inspired by IAM; OPA / Zanzibar / Oso evaluated and rejected.
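A minimal three-valued evaluator over an ExpressionDef-like shape shows how the pieces fit. The exact schema is not fully disclosed; the triple fields and the single `eq` operator here are assumptions:

```typescript
// Hypothetical ExpressionDef shape: field/op/value triples composed by
// and/or/not. Field references are "table.column"-style strings.
type Expr =
  | { field: string; op: "eq"; value: unknown }
  | { and: Expr[] }
  | { or: Expr[] }
  | { not: Expr };

type Verdict = boolean | null; // null = indeterminate (data not loaded yet)

function evaluate(expr: Expr, data: Record<string, unknown>): Verdict {
  if ("field" in expr) {
    const v = data[expr.field];
    return v === undefined ? null : v === expr.value; // missing data -> indeterminate
  }
  if ("and" in expr) {
    const results = expr.and.map(e => evaluate(e, data));
    if (results.includes(false)) return false; // decidable from partial data
    return results.includes(null) ? null : true;
  }
  if ("or" in expr) {
    const results = expr.or.map(e => evaluate(e, data));
    if (results.includes(true)) return true;
    return results.includes(null) ? null : false;
  }
  const inner = evaluate(expr.not, data);
  return inner === null ? null : !inner;
}
```

The `null` verdict is what makes progressive data loading work: the loader fetches another batch only while the policy remains indeterminate, and a plain recursive walk like this is easy to reimplement per language against a shared test suite.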
Key patterns / concepts
- concepts/conflict-free-replicated-data-type — CRDT literature (grow-only-set, LWW-register) as the well-studied foundation Multiplayer draws on; Figma relaxes the decentralization overhead because its server is the single authority per document. Canonical "CRDT-inspired but not CRDT-compliant" design.
- concepts/operational-transform — the Google-Docs-era alternative Figma explicitly evaluated and rejected ("unnecessarily complex for our problem space"; quotes the combinatorial-state-explosion critique).
- concepts/object-tree-document-model — DOM-like tree of
objects reducible to
`Map<ObjectID, Map<Property, Value>>`; Figma's file-schema shape and the substrate that QueryGraph's dependency edges index.
- concepts/parameter-system — "set-once-apply-across" parametrization primitive with two axes (source-of-truth location: scoped vs global; typespace: unified vs parallel); Figma's component properties and variables are the two canonical instances, Figma Sites CMS is a prospective third.
- concepts/parameter-binding — layer-property ↔ parameter edge with the invariant "at most one parameter per bound property." Unified binding store replaces the prior parallel stores that admitted dual-binding bugs.
- concepts/transitive-parameter-resolution — multi-hop walk through the parameter reference graph including across parameter-system boundaries (variable aliases + component- property-to-variable chains).
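Transitive resolution reduces to an alias-chain walk with cycle detection; a sketch under assumed names (the `Parameter` shape and the `prop:`/`var:` naming are illustrative, not Figma's schema):

```typescript
// A bound property may reference a component property, which may alias a
// variable, which may alias another variable: resolution walks the chain,
// crossing parameter-system boundaries, with a visited set against cycles.
type Parameter =
  | { kind: "value"; value: string }
  | { kind: "alias"; target: string }; // name of another parameter

function resolve(name: string, params: Map<string, Parameter>): string {
  const visited = new Set<string>();
  let current = name;
  for (;;) {
    if (visited.has(current)) throw new Error(`alias cycle at ${current}`);
    visited.add(current);
    const p = params.get(current);
    if (!p) throw new Error(`unknown parameter: ${current}`);
    if (p.kind === "value") return p.value;
    current = p.target;
  }
}
```

A unified typespace is what makes the cross-boundary hop legal: every link in the chain resolves against the same type definitions regardless of which parameter system owns it.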
- patterns/unified-typespace-consolidation — collapse parallel
type definitions (e.g.
`VariableType.BOOLEAN` vs `ComponentPropType.BOOLEAN`) into a single canonical typespace both subsystems reference. Structural pre-condition for cross-subsystem bindings.
- patterns/prototype-before-production — Figma's three-client simulator playground as the research environment where the multiplayer architecture was sifted before any production-code change landed.
- concepts/c-plus-plus-compilation-model — transitive
`#include`s flatten into a single pre-processed mega-file per TU; compile cost proportional to that mega-file's byte count.
- concepts/forward-declaration — declare symbol names without full definitions to break include dependencies.
- concepts/source-map-composition — composing N single-stage source
maps through a multi-stage compilation pipeline so browser breakpoints
set in the first-stage source resolve correctly in the final bundle.
Figma's transpiler pipeline reconstructed this across
`.sk` → `.ts` → `.js`.
- patterns/centralized-forward-declarations — one `Fwd.h` per directory with all forward declarations needed by other files in the directory; included from every header (but never from source files). Pushes forward-declaration discipline from per-author to per-directory.
- patterns/ci-regression-budget-gate — measure a resource cost
(compiled bytes, in Figma's case) in CI, warn/block on PRs that
regress it. Canonically instantiated by
`includes.py`.
- patterns/gradual-transpiler-migration — migrate from source language A to target language B by building a transpiler, checking both in, shifting build output to B, and deleting A last. Figma's Skew → TypeScript migration is the canonical instantiation.
- concepts/content-addressed-caching — Bazel remote cache adopted for local builds (>2 min savings when hits); framed as complementary to bytes-reduction, not a substitute.
- concepts/write-dependency-graph — bidirectional read+write-dep graph over document nodes as the substrate for editor-capable dynamic loading. Explicit FK edges (e.g. instance → component) plus implicit edges (auto layout, frame constraints, cross-page recursive constraint/instance chains). Correctness bar: editing parity — a missing write-dep silently corrupts derived state. Figma's QueryGraph is the canonical instantiation.
- concepts/reachability-based-subscription — session subscription set = transitive closure of the loaded page over read+write deps; edits filtered server-side by reachability; dep-edge mutations implicitly grow collaborators' subscriptions and ship newly-reachable nodes.
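Reachability-based subscription is a transitive closure plus a membership test; a minimal sketch (the adjacency-list edge representation and node names are assumptions):

```typescript
// Subscription set = transitive closure of the loaded page over read+write
// dependency edges; the server fans an edit out to a session only if the
// session's closure reaches the edited node.
function reachableFrom(root: string, edges: Map<string, string[]>): Set<string> {
  const seen = new Set<string>([root]);
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    for (const next of edges.get(node) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}

function shouldFanOut(subscription: Set<string>, editedNode: string): boolean {
  return subscription.has(editedNode);
}
```

In the real system the closure is maintained incrementally: a dependency-edge mutation implicitly grows collaborators' subscription sets rather than triggering a full re-traversal.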
- patterns/shadow-validation-dependency-graph — run a derived data structure (like QueryGraph) alongside the live authoritative path for "an extended period," reporting errors whenever the authoritative path edits a node the derived structure didn't predict. Pre-condition for flipping dynamic loading live; surfaced a cross-page recursive write-dep at Figma before production impact.
- patterns/preload-on-request-hint — backend fires a hint to a stateful backend (Multiplayer) on the initial HTTP GET so decoding starts before the client's WebSocket connects; 300–500 ms p75 savings.
- concepts/tight-migration-scope — change only the substrate, keep the abstraction above unchanged; two exceptions (old-behavior-match expense, one-way doors). Figma's ECS→EKS migration principle.
- patterns/scoped-migration-with-fast-follows — tight-scope migration + explicit fast-follow list pipelined after the cutover (Keda pod-autoscaling, Vector log forwarding, Graviton, service mesh, ACK). Figma's explicit deferral-and-pipeline discipline.
- patterns/multi-cluster-active-active-redundancy — three active EKS clusters per environment receiving real traffic; a CoreDNS destruction incident cost 1/3 of requests instead of a full outage.
- patterns/single-source-service-definition — per-service Bazel config → CI-generated YAMLs → one-step deploy; replaced the ECS Terraform-template + separate-deploy two-step.
- patterns/weighted-dns-traffic-shifting — per-service DNS-weight cutovers from ECS to EKS during migration.
- patterns/load-test-at-scale — "Hello World" scaled to largest-service pod count before real workloads; surfaced Kyverno sizing as a new-pod-startup bottleneck.
- patterns/golden-path-with-escapes — opinionated defaults with explicit customization surfaces rather than raw-YAML authoring.
- concepts/device-trust — corporate-managed laptop holds a short-lived X.509 cert (15-day rotation at Figma) in the OS keychain; cryptographically attests hardware origin for any action it signs. The PKI posture that grounds Figma's commit-signing security model.
- concepts/commit-signing — Git's three pluggable signer families (GPG / SSH / S/MIME-X.509). Figma uses the S/MIME path because that's how its device-trust PKI plugs into Git.
- patterns/signed-commit-as-device-attestation — reuse device-trust X.509 certs to sign Git commits, then verify on push so only code originating from a trusted company MacBook can merge. Canonical Figma instance.
- patterns/wrapper-script-arg-injection — tiny shell wrapper
registered as Git's signer program that ignores the args
Git passes and invokes the real signer with dynamic-lookup args
computed at invocation time. Bridges Git's static
`user.signingkey` contract with 15-day rotating device-trust certs; `user.signingkey` is left deliberately blank.
- patterns/webhook-triggered-verifier-lambda — GitHub push webhook → AWS Lambda Function URL → stateless cryptographic verification → GitHub commit status. The verification half of Figma's commit-signature system.
- concepts/connection-multiplexing — decouple upstream Redis connection count from client-fleet elasticity by interposing a proxy tier. Absorbs asymptotic connection-ceiling pressure and thundering-herd new-connection storms during rapid client-fleet scale-ups. FigCache is the canonical Figma instance.
- patterns/caching-proxy-tier — stateless protocol-native proxy fleet in front of a cache fleet. Responsibilities absorbed: connection multiplexing, multi-cluster routing, topology-change absorption, command-semantics-aware guardrails, inline data transformations, uniform observability, cluster-mode emulation. FigCache is the canonical Figma instance.
- patterns/protocol-compatible-drop-in-proxy — proxy speaks the backend's native wire protocol (RESP) + cluster-mode emulation shim → migration is a one-line endpoint config change. The integration pattern that made FigCache's reversible, feature-flag-gated per-workload rollout possible.
- patterns/starlark-configuration-dsl — Starlark program evaluated at init-time in a VM rendering a typed Protobuf config the core server consumes. FigCache uses this to let operators compose engine trees (Router + Redis + Static primitives) in pure config — no server-binary redeploys for routing/guardrail changes. Canonical Figma instance.
- concepts/vertical-partitioning — split groups of related tables onto separate DB instances, each table still whole on its host. Figma's 2020–2022 scaling lever (≈12 vertical partitions by end of 2022). Stepping stone to horizontal sharding — the 1→1 failover machinery operated during vertical partitioning de-risks horizontal sharding's 1→N physical failover.
- concepts/horizontal-sharding — split a single table's rows across multiple physical DBs. Figma's 2022–present effort on top of vertically-partitioned RDS Postgres. Shipped first sharded table September 2023 with 10s partial primary availability.
- concepts/shard-key — Figma picks a small set of shard keys
(`UserID`, `FileID`, `OrgID`) rather than forcing a single universal key; hash-of-shard-key routing for uniform distribution (trades off range-scan efficiency on the shard key).
- concepts/logical-vs-physical-sharding — decouple serve-as-if-sharded application behavior from actual data movement; Figma's central de-risking move. Canonical instance: per-shard Postgres views + feature-flagged DBProxy rollout, seconds-rollback, before the 1→N physical failover.
- concepts/scatter-gather-query — queries without a shard-key predicate fan out to every shard → same load as unsharded → scale cap. Figma's DBProxy deliberately restricts the sharded-query language to avoid worst-case scatter-gather complexity.
- patterns/sharded-views-over-unsharded-db — Figma's logical-shard representation: per-shard Postgres views + per-shard connection poolers over the same unsharded physical instance. <10% worst-case view overhead validated against a production query corpus + shadow-reads framework.
- patterns/shadow-application-readiness — run the logical planner against live production traffic (logged to Snowflake), offline-analyze query plans, pick the supported sharded-query subset covering 90% of queries without worst-case engine complexity. Canonical "API scoping from real traffic."
- patterns/colocation-sharding — group related tables that share a shard key into a "colo" that shares physical layout and supports cross-table joins + full transactions when scoped to a single shard-key value. Narrows the router's scope while preserving relational semantics for the common case.
- concepts/permissions-dsl — Figma built an in-house DSL rather than using OPA / Zanzibar / Oso after evaluating all three; design inspired by IAM policies (effect + action + resource + condition).
- concepts/data-policy-separation —
`ApplyEvaluator` + `DatabaseLoader` split. Policy authors name fields ("team.permission"), the engine owns data fetching. Pre-DSL, entangled ActiveRecord-calls-in-policy made permissions checks ~20% of Figma's database load.
- concepts/json-serializable-dsl — `ExpressionDef` as plain JSON (triples + and/or) enables 2–3-day new-language evaluators, trivial cross-platform consistency, recursive-walk static analysis, and unlocks the CI linter + debuggers.
- concepts/three-valued-logic — `true`/`false`/`null` (indeterminate) evaluator return allows early exit when the policy verdict is decidable from partial data.
- patterns/expression-def-triples — the triples +
and/or/not boolean-logic DSL shape (also seen in Elasticsearch
/ MongoDB query languages). Figma's
`ExpressionDef` is the canonical production instance.
- patterns/deny-overrides-allow — effect-resolution rule (IAM / Cedar / Figma): any deny policy matching → deny; otherwise any allow matching → allow; default deny.
- patterns/progressive-data-loading — partition the declared dependency set by heuristic (most-commonly-determining first: file / folder / team roles), load a batch, evaluate, exit early if the three-valued verdict is non-null. "More than halved" total evaluation time.
- patterns/policy-static-analysis-in-ci — build-time linter rejects policy filters matching known-buggy patterns (e.g. `field = ref` without a sibling `field <> null`). Explicitly chosen over runtime engine enforcement to preserve evaluator simplicity as the cross-platform invariant.
- patterns/policy-proof-of-concept-branch — de-risk a policy-engine rewrite by porting every existing rule to the new model on a throwaway branch, running the entire legacy test suite green. Surfaces both the new model's inadequacies (Figma's first PoC did) and hidden product decisions accumulated over years.
- patterns/consistency-checkers — SQL-based invariant tests comparing expected system state against data recorded across multiple sources of truth on a pre-defined cadence, in both dev and prod; two flavours (data-quality checks validate stored-data-matches-product-state; code-logic checks validate application-behaviour-matches-business-rules); routes structured alerts (rows + metadata) to the owning team on violation. Figma Billing's framework, built for the 2025 billing-model refresh, generalised beyond Billing to product security, access/identity management, and growth teams (e.g. connected projects). Canonical wiki instance; Slack Engineering cited as cross-company prior art.
- patterns/data-application-productization — the bespoke-analysis-to-shared-tool arc: when a cross-system derivation repeats across multiple teams, wrap it in a small durable application that encodes the single-source-of-truth interpretation. Figma's Invoice Seat Report (now "one of the most-viewed data applications at Figma", reconstructs each seat-charge narrative for Support / Order Management / enterprise specialists / billing engineers) and consistency checkers themselves both trace this arc.
- concepts/small-map-as-sorted-vec — small, schema-bounded associative containers represented as flat sorted `Vec<(K, V)>` instead of balanced-tree maps. Rust `BTreeMap` → `Vec<(u16, u64)>` on Multiplayer's per-node property map (average ~60 entries, schema-bounded <200) cut large-file memory ~25% and sped up deserialization despite the asymptotic regression — cache locality dominates at small N.
- concepts/tagged-pointer — pack a 16-bit field ID into the unused top 16 bits of x86-64 `u64` pointers, collapsing `Vec<(u16, u64)>` to `Vec<u64>`. Benchmarked by Figma's Rust team as a follow-up to the flat-vector rewrite; delivered only ~5% additional RSS reduction vs the simple vector approach (allocated-vs-RSS divergence), and Rust refcount-through-masked-pointer correctness overhead didn't justify shipping — shelved as known-feasible optionality.
Recent articles¶
- 2026-04-21 — sources/2026-04-21-figma-server-side-sandboxing-an-introduction (Security-engineering Part 1 of 3 — the umbrella intro to Figma's sandboxing series. Frames the why before the how: memory-unsafe C/C++ libraries processing user-supplied images / documents / SVGs are hostile-input-on-the-inside and keep shipping memory-corruption CVEs — canonical motivating example is ImageTragick (2016), an ImageMagick RCE that hit every server running it on user-supplied images. Figma's posture: "Buggy software is a fact of life … it's nearly impossible to prevent all vulnerabilities," so sandbox as defence-in-depth, accept compromise will happen, bound the blast radius. Rewriting in memory-safe languages is considered and rejected as a primary strategy ("require pulling resources away from other critical security projects … no methods are foolproof") — additive, not exclusive, to sandboxing. Introduces the three-primitive table (VMs / containers / seccomp) and the four-axis decision questionnaire (environment / security + performance / development cost + friction / maintenance + operational overhead). Client-side sandboxing (WebAssembly) is named as orthogonal. Parts 2 + 3 deep-dive the specific rows. No new wiki pages — this is the intro; all concepts / systems / patterns already exist on-wiki from the two companion ingests.)
- 2026-04-21 — sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure (Figma's security-engineering team describes Response Sampling — a two-phase detection system that inspects a configurable fraction of outbound Ruby API responses for sensitive-data exposure, running in both staging and production as the detection-in-depth complement to Figma's preventive authorization stack (PermissionsV2 + negative unit tests + E2E + pentest + bug bounty). Phase 1 (Permission Auditor): a Ruby `after` filter samples at a configured rate, parses the JSON response body, regex-matches file identifiers ("high-entropy capability tokens with a known character set and consistent length"), enqueues async jobs that re-verify user × identifier against PermissionsV2. Phase 2 (Sensitive Data Analyzer, "fancy Response Sampling"): generalizes to any column tagged `banned_from_clients` by FigTag — Figma's internal column-level data classification tool. Precision trick: an `ActiveRecord` callback on sampled requests records loaded sensitive values into request-local storage; the `after` filter compares the serialized JSON against those exact values (avoids coincidental-match FPs; scopes overhead to sampled requests only). Cross-service: LiveGraph submits sampled responses to the same internal endpoint — shared schema + logging path + triage dashboards across services. Explicit architectural choice: middleware in the Ruby app server, not an Envoy proxy — the app tier has the authenticated user object + full response body + in-process access to PermissionsV2; doing it at a proxy would make user-aware permission evaluation "significantly harder". Non-blocking everywhere: sampling/verification failures never fail the user request; a rate-limited pipeline bounds overhead. Dynamic allowlisting (patterns/dynamic-allowlist-for-safe-exposure) handles intentional safe exposures (canonical example: an OAuth client secret legitimately returned by a dedicated credential-management endpoint but a critical finding anywhere else). Production findings within days: file identifiers returned unnecessarily (data-filtering fix), paths that bypassed permission checks entirely (gaps closed), long-unused fields leaking into responses (targeted fix), list endpoints that verified parent access but not per-item access (per-item checks added).
Stated meta-frame — platform-security mindset applied at the application layer (patterns/platform-security-at-application-layer): "treating our application surfaces like infrastructure and layering continuous monitoring and detection controls on top. By applying techniques typically reserved for lower-level systems to our application layer, we were able to gain continuous visibility into how data moves through our products, without slowing development." Operational lessons: tune sampling rates + run async for performance; manage FPs with dynamic allowlisting + rigorous triage (concepts/false-positive-management) or they manage you; context matters (not all exposures equally severe — dynamic config tunes without redeploy); layered defense across staging + prod. Introduces systems/figma-response-sampling, systems/figtag; concepts concepts/sensitive-data-exposure, concepts/response-body-sampling, concepts/detection-in-depth, concepts/data-classification-tagging, concepts/false-positive-management; patterns patterns/response-sampling-for-authz, patterns/platform-security-at-application-layer, patterns/async-middleware-inspection, patterns/dynamic-allowlist-for-safe-exposure, patterns/field-level-sensitivity-tagging; extends systems/figma-permissions-dsl (now cited as the target of detection-layer spot-checks, not just the preventive engine) and systems/livegraph (cross-service sampling contributor). Numbers not disclosed: sampling rate, QPS, p50/p99 overhead, TP/FP rate, async-job substrate details, FigTag tagging coverage / propagation latency. Sibling security-engineering posts from the same 2026-04-21 batch: VM sandboxing, containers + seccomp, Santa rollout, device-trust commit signing, PermissionsV2 DSL — all six form Figma's security-engineering narrative at production scale.)
- 2026-04-21 — sources/2026-04-21-figma-rendering-powered-by-webgpu (Figma's year-long C++/WASM canvas renderer migration from WebGL to WebGPU while keeping WebGL as a peer backend. Five substrate-level projects: (1) graphics-API abstraction-layer redesign (patterns/graphics-api-interface-layer) — reshape the interface around WebGPU's explicit-state model; fixed latent WebGL bugs by making draw state explicit (concepts/explicit-graphics-state). (2) shader translator pipeline (patterns/shader-source-translator-pipeline) — custom in-house preprocessor (normalizes older WebGL-1 GLSL to a newer dialect + `#include` modularity + input metadata extraction) feeding naga to emit both newer GLSL (for WebGL) and WGSL (for WebGPU). Single shader source, two backends, zero drift. (3) uniform buffer batching — encode/submit split in the graphics interface amortizes WebGPU's per-uniform GPU-memory allocation cost; naïve per-uniform mapping would have regressed performance. Uniform buffers are the WebGPU requirement that forced the redesign. (4) shared C++ across browser + native via Dawn — Emscripten compiles the C++ renderer to Wasm for browsers (migrating to Dawn's `emdawnwebgpu`), and the same code compiles natively (x64/arm64) for server-side rendering via direct Dawn linkage. One graphics-API surface, two targets. RenderServer / thumbnail-generation path is the named server-side consumer. (5) production rollout with dynamic fallback — telemetry-driven device blocklist (seeded from compatibility probes and expanded from fallback-rate telemetry) plus mid-session WebGPU→WebGL fallback (extends existing context-loss / device-lost handlers to swap backends rather than recreate the same one) closed the rollout loop after mid-session WebGPU failures appeared on Windows. Quirks: sync-vs-async readback — WebGL's sync `readPixels` powered Figma's load-time compatibility probes; WebGPU's async-only readback would have added "hundreds of milliseconds" to startup, forcing a shift to non-load-blocking post-session probing. Outcome: "performance improvement when using WebGPU on some classes of devices, and more neutral results on others, but no regressions." Compute shaders (concepts/compute-shader), MSAA, and RenderBundles are named as future-work wins that WebGL simply could not provide — the actual rationale for the migration. Sibling posts in the 2026-04-21 batch: game-engine framing (the C++/WASM canvas leg this deep-dives) and Rust multiplayer-server memory work (the Rust server leg, a different leg of the same three-language stack).)
- 2026-04-21 — sources/2026-04-21-figma-how-figma-draws-inspiration-from-the-gaming-world (Figma engineering framing piece positioning the client architecture as a game-engine stack adapted for the browser rather than a web stack — canonical umbrella post for the 2026-04-21 batch. Three-language split (patterns/game-engine-stack-for-web-canvas): C++ compiled to WebAssembly for the canvas (rendering, object graph, layout, physics — the 2017 migration cut load time 3× per the linked post), React + TypeScript for the UI shell (explicit rationale: "C++ does not come with great libraries for delightful and responsive user interface engineering"), Rust for the multiplayer server ("better developer ergonomics than C++"). Systems-as-building-blocks vocabulary (concepts/game-engine-architecture) borrowed from game engines — every feature lands as a named system (multiplayer, collision, animation, chat/audio, auto-layout, component/variant, widget/plugin) with a defined boundary. The architecturally load-bearing production story is a disclosed cross-subsystem cascade (concepts/interdependent-systems): a six-month-old PR in the layout subsystem corrupted FigJam connector attachment state → connectors oscillated across collaborators → produced "a huge number of multiplayer messages, which overloaded the multiplayer and autosave systems." Code audit of the connector subsystem revealed nothing; debug-message instrumentation three subsystems away found the root cause. Canonical instance of "one part of the code causes a bug in an entirely different part of the codebase" landing on a real-time collaboration server (systems/figma-multiplayer-querygraph extended with the Cross-system failure mode section).
Tables-in-FigJam multiplayer UX introduced patterns/observer-vs-actor-animation: render the same edit twice — live feedback tied to the initiating user's mouse (snappy, no animation) plus animation for observing users (smoothed canvas transitions so remote edits don't jump); plus rubber-band drag-limits explicitly analogized to games' invisible walls. Designer prototype (Jakub Świadek) + engineer framework build (Tim Babb). Collaborative-first product constraint explicitly rejects the lock-during-edit fallback. Keyboard-navigation accessibility layer (partnered with Figma's accessibility team) framed as a game-engine-style control-system concern (parallel to game-controller support including the Xbox Adaptive Controller). Introduces concepts/game-engine-architecture, concepts/web-assembly, concepts/interdependent-systems, patterns/game-engine-stack-for-web-canvas, patterns/observer-vs-actor-animation; extends systems/figma-multiplayer-querygraph (new Cross-system failure mode + Game-engine framing sections), systems/react, systems/typescript (stack-split rationale), concepts/object-tree-document-model (substrate for the cross-node implicit coupling). No latency / QPS / memory / cost / rollout numbers disclosed — narrative / retrospective / hiring-adjacent. Sits as the architectural umbrella under which the other 2026-04-21 Figma posts — LiveGraph 100x, FigCache, [[systems/figma-materializer|Materializer]], AI Search infra, Permissions DSL, etc. — all sit as named systems in the game-engine sense.)
- 2026-04-21 — sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma (Infrastructure companion to the earlier "How We Built AI Search" post. Embeddings via open-source CLIP (multimodal text+image, same space → one index serves both query modes). Pipeline = four discrete queued jobs over DynamoDB (metadata + embeddings) + S3 (thumbnails) + SageMaker (batched inference) + OpenSearch k-NN (vector index). Cost optimisations that dominated: Ruby → C++ rewrite of frame enumeration + thumbnailing; GPU → CPU `llvmpipe` software rendering on newer instances; edit-quiescence (4h) cuts to 12% of data; corpus halved by excluding drafts + in-file dups + unmodified copies; vector quantization (concepts/vector-quantization) shrinks in-memory k-NN. Query = hybrid lexical+vector, min-max normalized per index, exact-match boosted, interleaved (patterns/hybrid-lexical-vector-interleaving). Two candidly-reported OpenSearch bugs: segment-replication replica non-determinism (upstream fix in k-NN PR #1808); `_source` slimming wiped embeddings on update, fix = re-fetch from DynamoDB (patterns/source-field-slimming-with-external-refetch). Scale driver: "small percentage of users onboarded → convergent full-fleet indexing" because teams are small and numerous. Introduces systems/clip-embedding-model, concepts/vector-quantization, patterns/pipeline-stage-as-discrete-job, patterns/hybrid-lexical-vector-interleaving, patterns/source-field-slimming-with-external-refetch; extends systems/figma-ai-search with a full Infrastructure section.)
- 2026-04-21 — sources/2026-04-21-figma-supporting-faster-file-load-times-memory-optimizations-rust
(Rust team's server-side memory optimizations on Multiplayer after dynamic page loading drove ~30% more server-side file decode volume. Hot data structure: per-node `Map<u16 property_id, u64 pointer>` — memory profiling showed it was >60% of per-file memory despite storing only metadata. Fix: replace Rust's `BTreeMap` with a flat sorted `Vec<(u16, u64)>`. Schema-bounded key domain (<200 fields, average ~60 per node, entries arrive sorted on the wire) makes the vector's asymptotic O(n) insert never trigger on the load path, and cache-locality wins flip the theoretically-worse container into a practically-faster one: ~25% memory drop on large files, deserialization faster despite the Big-O regression. Second experiment — pointer tagging packing 16-bit field IDs into the unused top 16 bits of x86-64 pointers — delivered marginally faster benchmarks + ~5% lower RSS vs the simple Vec (not 20% — RSS-vs-allocated divergence, same lesson as Datadog's Go 1.24 `runtime/metrics`-vs-`/proc/smaps`); not productionized because refcount-through-masked-pointer correctness wasn't worth the win. Net fleet outcome from shipping just the flat-vector change: +20% p99 file deserialization performance, ~20% memory-cost reduction across the entire Multiplayer fleet. Companion to the 2024 dynamic-page-loading post — that one cut client memory ~70% and slow-file p99 ~33%; this one fixes the server that feeds the client.)
- 2026-04-21 — sources/2026-04-21-figma-redefining-impact-as-a-data-scientist (Figma Billing DS team's impact-framing post with two architecturally-real outputs: consistency checkers — SQL-based invariant tests (patterns/consistency-checkers) in two flavours (data-quality checks validate stored-data-reflects-product-state; code-logic checks validate application-behaviour-matches-business-rules), run in both dev and prod on a pre-defined cadence across unified product-log + billing-state + payment-processing + CRM data, routing structured alerts (rows + metadata) on violation; built for the 2025 billing-model refresh, since "adopted beyond the billing team, powering data-integrity and code-logic checks across product security, access and identity management, and other growth teams" (connected projects the named concrete reuse); Slack Engineering cited as cross-company prior art. Invoice Seat Report — a data application (patterns/data-application-productization) reconstructing seat-charge narratives by pulling product events + contract metadata + billing rules + past state transitions into plain-language explanations; now "one of the most-viewed data applications at Figma," shared across Support / Order Management / enterprise specialists / billing + monetisation engineers. Rest of the article is DS-role / "redefining impact" framing (pie charts traditional-vs-actual work mix) — ingested narrowly on the two-tool architectural substance. No numbers disclosed.)
- 2019-10-11 — sources/2019-figma-how-figmas-multiplayer-technology-works
(Resurfaced 2025-08-16, HN 176. Foundational description of
Multiplayer architecture: client-server over WebSocket,
one server process per multiplayer document as the single
authority that lets Figma simplify CRDTs, client
download-on-open + offline-edit replay on reconnect,
documents-only-in-Multiplayer (comments/users/teams in
Postgres with separate sync system), object-tree document
data model reducible to `Map<ObjectID, Map<Property, Value>>` — the substrate QueryGraph's dependency edges later index. Explicit rejection of Operational Transforms ("unnecessarily complex for our problem space"; quotes state-explosion critique). CRDT-inspired but not CRDT-compliant: grow-only set + LWW register as building blocks, decentralization overhead stripped because the server is authoritative. Methodology: a three-client simulator playground prototyped the architecture before any production-code change. Raw capture truncates before the algorithm-details sections.)
- 2024-04-27 — sources/2024-04-27-figma-speeding-up-c-build-times
(C++ cold build times cut ~50%: custom AST tool DIWYDU catches unused
includes, `includes.py` runs in CI as a transitive-byte regression gate, `Fwd.h` per directory formalizes forward-declaration discipline; 50–100 potential slowdowns prevented daily)
- 2024-05-03 — sources/2024-05-03-figma-typescript-migration
(Skew → TypeScript migration via custom transpiler + gradual rollout;
source-map composition across two compile stages; three load-bearing
language-semantic differences — JS array-destructuring perf,
devirtualization divergence, TS init-order — drove targeted transpiler
patches; bundler `defines` + DCE replaced Skew's compile-time conditional compilation)
- 2024-05-22 — sources/2024-05-22-figma-dynamic-page-loading (Dynamic page loading extended from viewers to editors via QueryGraph — a bidirectional read+write dependency graph held in memory by Multiplayer. Per-session subscribed subset = transitive closure of the initial page; edits fan out to collaborators only where reachable. Shadow-mode validation for "an extended period" flushed out a cross-page recursive write-dep. Server-side decoding (now critical path) paid for by backend preload hint (300–500 ms p75) + parallel decoding via persisted raw offsets (>40% decode-time cut). Client lazily materializes instance sublayers, touching dozens of subsystems. Six-month A/B rollout: 33% speed-up on slowest loads despite files +18% YoY, 70% fewer nodes in client memory, 33% fewer out-of-memory errors)
- 2026-04-21 — sources/2026-04-21-figma-parameter-architecture-unification (Retrospective on unifying Figma's two parameter systems — component properties (2022, scoped) + variables (2023, global) — at the data-model and runtime levels. Parallel type definitions and parallel binding-storage had produced a user-visible dual-binding bug, doubled cost per new type, and a ceiling against cross-system bindings. Fix consolidates to a single typespace + single binding store (invariant: at most one parameter per bound property) + single four-stage runtime pipeline (tracking / invalidation / transitive resolution / update). Unlocked component-property-to-variable binding plus a speed-up on variable-mode and variable-value changes attributed to stricter property-granular invalidation; makes the upcoming Figma Sites CMS a free third parameter system on the same substrate. Sibling in-memory reactive graph to QueryGraph — both bidirectional indexes over the same object-tree document model, indexing different edge types.)
- 2026-04-21 — sources/2026-04-21-figma-enforcing-device-trust-on-code-changes
(Figma security team enforces device-trust on every Git
commit merging to release branches. Every company MacBook
holds a 15-day-rotating X.509 device-trust cert in the
macOS Keychain; commits are S/MIME-signed
with it via a Figma-modified smimesign
fork (adds `--get-figmate-key-id`) called through a one-line wrapper bash script that ignores Git's static `user.signingkey` and substitutes a Keychain lookup. Verification is a GitHub App (read `code` + write `commit status` only, canonical concepts/least-privileged-access) + an AWS Lambda behind a Function URL webhook; credentials in Secrets Manager; verification uses smimesign/ietf-cms against Figma's internal CA; posts a `commit-integrity-verification` status that branch protection requires to merge — canonical patterns/webhook-triggered-verifier-lambda. Bot commits (Dependabot + other external Apps, signed with GitHub's web-flow GPG key) pass via an author allowlist with optional diff heuristics (Dependabot touching non-dependency paths → fail). Engineer experience is just a green status check — no extra toil. Introduces concepts/device-trust, concepts/commit-signing, patterns/signed-commit-as-device-attestation, patterns/wrapper-script-arg-injection, patterns/webhook-triggered-verifier-lambda; systems systems/smimesign, systems/smimesign-figma, systems/figma-commit-signature-verification, systems/github-apps.)
- 2026-04-21 — sources/2026-04-21-figma-rolling-out-santa-without-freezing-productivity
(Figma Endpoint Security team rolled out Santa —
the Google-originated macOS binary
authorization tool — to 100% of laptops over ~3 months without
freezing productivity. Four load-bearing design decisions: (1)
monitoring-mode
first — run Santa in passive mode, mine
UNKNOWN-event telemetry to build a SigningID + TeamID-dominated allowlist from real fleet execution before any block; (2) self-service Slack approval — block event → sync-server malware check (ReversingLabs + risk signals) → Slack app on Figma-managed devices offers approve/ignore/flag-as-malware; approval creates a machine-specific rule (not fleet); >90% of steady-state blocks self-resolve; MDM-triggered `santactl sync` cuts enforcement latency 60s → 3s; (3) Package Rule auto-generation — config-as-code `{package_type: homebrew, package: vim}` → 30-min workflow on macOS runners fetches the current SHA-256 from the official source → ~200 Package rules expand to ~80,000 Binary rules; (4) cohort percentage rollout 10% → 25% → 50% → 70% → 98% → 100% with per-cohort inclusion criteria, the final engineers/data-scientists 30% held for group-scoped permissive rules addressing Anaconda ad-hoc `codesign` per-machine-unique hashes; `/santa disable` escape hatch reverts a single machine to monitoring mode during rollout, retired at 100%. Sync server is a fork of Airbnb's Rudolph. FAA (file access authz — locking browser cookies to the browser binary) shipped first as a zero-workflow-impact win before lockdown. 80K-rule initial-sync timeout mitigation: static allowlist for MDM/Chrome/Slack/Zoom. Steady state: ~150 allowlist + ~50 blocklist + ~80K Package-generated + ~50 PathRegex + ~10 Compiler + median 3 personal per user; P95 3–4 blocks per
user per week. Rule-type trade-off noted (SigningID precision vs
TeamID breadth, LogMeIn Team-ID example); known limitation on
Compiler rules with a `go run` race condition; TCC-permission regression defence via separate osquery-based auto-unset. Introduces systems/santa, systems/rudolph, concepts/binary-authorization, patterns/data-driven-allowlist-monitoring-mode, patterns/self-service-block-approval, patterns/package-rule-auto-generation, patterns/cohort-percentage-rollout, patterns/rollout-escape-hatch, patterns/static-allowlist-for-critical-rules.)
- 2026-04-21 — sources/2026-04-21-figma-server-side-sandboxing-virtual-machines (Security-engineering Part 2 of 3 — how Figma thinks about server-side sandboxing (a.k.a. workload isolation) via virtual machines, and why their production instance is AWS Lambda backed by Firecracker micro-VMs. Core frame: a sandbox must answer two questions, not one — can the sandbox escape (hypervisor boundary), and if it can't, what can a compromise do with the VM's own capabilities (network egress + IAM + credential + VM lifetime)? The latter is the pattern patterns/minimize-vm-permissions. Figma's link-metadata fetcher (FigJam link previews) + canvas image fetcher run ImageMagick on third-party URLs inside Lambda; they sit outside the Figma production VPC with no IAM pivot into internal services, so an ImageMagick or fetch-logic exploit grants no path to Figma internals. Latency/isolation trade-off surfaced at the tenant level: AWS reuses a tenant's Firecracker VM across requests because "Firecracker offers reasonably quick VM boot times, but the overheads are still too high to pay on many core workflows." Figma accepts this as a reasonable trade for their use case. Gotchas called out: localhost Lambda runtime API is an SSRF hazard (leaks triggering request + accepts forged responses) — Figma blocks application code from making localhost requests; Lambda isn't "raw compute" (easy to over-privilege via VPC placement or IAM); reserved concurrency is a shared account+region quota. Only quantitative datapoint: first un-warmed call took up to 10 seconds before tuning. Also argues VMs are the heavyweight primitive (trade-off profile: ✅ compatibility / full-OS workloads ; ❌ cold-start and orchestration cost ; ⚖ debugging fine, cluster ops hard) and names the four sandbox-choice axes — environment, security + performance, development friction, operational overhead. Entity pages pre-existed from prior partial work and already anchor to this source; this pass completes source-page plumbing + company + index + log cross-referencing. Sibling posts: Part 1 intro — sources/2026-04-21-figma-server-side-sandboxing-an-introduction; Part 3 containers + seccomp — see sources/2026-04-21-figma-server-side-sandboxing-containers-and-seccomp.)
- 2026-04-21 — sources/2026-04-21-figma-server-side-sandboxing-containers-and-seccomp
(Security-engineering Part 3 of 3 — containers + seccomp rows
of the sandboxing primitive
table. Frames container escape along three axes: kernel
vulnerability (Dirty COW / Dirty Pipe family), runtime
implementation bug (systems/runc / systems/docker
internals), runtime configuration (operator choices — the
axis VMs don't expose). See concepts/container-escape.
Introduces seccomp as the narrowest
isolation primitive — a syscall allowlist the process runs
under — used at scale by Android / Chrome / Firefox; key
limitation is that it can't dereference pointer arguments
so it can't filter `openat` by path. Production exemplar RenderServer (C++ headless Figma editor for thumbnailing / SVG export) runs in two sandboxes by path: full GPU path in nsjail (user + pid + mount + network namespaces, no network, specific mounts only, seccomp-bpf — chosen over Docker as a drop-in alternative that avoided building an orchestrated service); non-GPU path in seccomp-only after a source-code refactor that reorders all file opens before any image processing, letting a restrictive libseccomp filter land mid-program. Seccomp-only trade-offs disclosed honestly: ✅ easier to test/debug ; ✅ significantly faster than nsjail ; ❌ locks RenderServer into single-threaded ; ❌ cannot dynamically load fonts or images later in runtime. Figma's disclosed allowlist: write to already-open fds, exit, memory allocation, current time. Rollout foot-guns: nsjail default `rlimit_fsize = 1 MB` silently truncated outputs for large-image inputs; the seccomp allowlist needed several iterations as rare codepaths hit in production weren't exercised in testing (kernel logs only name the failing syscall, no other context). gVisor named as the middle-option technology that reduces container attack surface by interposing a user-space reimplemented kernel between the container and the host kernel. Introduces concepts/container-escape, concepts/seccomp, concepts/syscall-allowlist, concepts/linux-namespaces, concepts/kernel-attack-surface; systems systems/nsjail, systems/firejail, systems/docker, systems/runc, systems/gvisor, systems/figma-renderserver; patterns patterns/refactor-for-seccomp-filter, patterns/seccomp-bpf-container-composition.)
- 2026-04-21 — sources/2026-04-21-figma-figcache-next-generation-data-caching-platform
(Storage Products team built FigCache —
a stateless, horizontally-scalable, RESP-wire-protocol proxy
between Figma's applications and a fleet of
ElastiCache Redis
clusters — plus first-party client wrappers in Go / Ruby /
TypeScript. Rolled out H2 2025 for Figma's main API service →
six-nines uptime on the caching layer. Architecture: ResPC
(systems/respc) streaming RESP parser + schema-driven
structured command parser + implementation-agnostic dispatch in
the frontend; dynamically-assembled engine tree of data
engines (Redis) and filter engines (Router / Static / fanout)
as the backend; entire engine tree expressed in Starlark
(patterns/starlark-configuration-dsl) evaluated at init-time
to render a typed Protobuf config. Core win:
connection multiplexing
decouples Redis connection load from client-fleet elasticity —
order-of-magnitude reduction in Redis cluster connection
counts post-rollout + thundering-herd scale-up class eliminated.
Protocol-compatible
drop-in proxy: cluster-mode emulation shim + interface-compatible client wrappers → migration is a one-line endpoint
config change, gated reversibly by feature flags. Fanout filter
engine transparently resolves read-only multi-shard pipelines as
parallel scatter-gather (sidesteps `CROSSSLOT`). Uniform metrics/logs/traces per command with workload-ownership classification → incident diagnosis from hours/days to minutes; a formal caching-platform SLO now possible; ElastiCache operational events (node failovers, cluster scaling, transient errors) downgraded to zero-downtime background ops — shard failovers now run liberally and frequently as live resiliency exercises. Latency risk controls: weekly production stress test at ≥10× organic peak, zonal traffic colocation (patterns/zone-affinity-routing), per-PR CI CPU/mem profile + synthetic-benchmark gates against a golden baseline. Build-vs-buy rationale: OSS Redis proxies shipped "rudimentary RPC servers" lacking structured argument extraction → blocked semantics-aware guardrails + custom commands; forks "difficult to keep in sync with upstream." No end-to-end latency numbers, throughput numbers, or cost breakdown disclosed. Introduces systems/figcache, systems/respc, concepts/connection-multiplexing, patterns/caching-proxy-tier, patterns/protocol-compatible-drop-in-proxy, patterns/starlark-configuration-dsl; extends systems/redis, systems/aws-elasticache, concepts/control-plane-data-plane-separation, patterns/zone-affinity-routing.)
- 2026-04-21 — sources/2026-04-21-figma-how-we-built-a-custom-permissions-dsl
(Figma's engineering team rebuilt authorization from a Ruby-monolith `has_access?` function into a custom cross-platform declarative permissions DSL (systems/figma-permissions-dsl) starting early 2021. Four named forcing functions: (a) monolithic `has_access?` — "a bug could leak access to every single file"; (b) hierarchical integer permission levels + boolean escape-flags produced a non-hierarchical matrix pretending to be hierarchical; (c) permissions checks were ~20% of the database load because `ActiveRecord` calls and policy logic were entangled in one function (concepts/data-policy-separation); (d) cross-platform drift between Sinatra (Ruby) and LiveGraph (TypeScript) permissions code was a chronic bug source. Design inspired by IAM policies (effect + action + resource + condition), but OPA, Zanzibar, and Oso were all evaluated and rejected. First PoC was a Ruby `AccessControlPolicy` with an imperative `apply?` — ported every existing rule onto a green-CI branch ([[patterns/policy-proof-of-concept-branch]]), which surfaced that `attached_through` was clumsy and cross-platform AST parsing of `apply?` was unreliable. Pivot: JSON-serializable `ExpressionDef` of the shape `[field, op, value|ref]` composed by `and`/`or`/`not` (patterns/expression-def-triples, concepts/json-serializable-dsl), authored in TypeScript with types/enums/composable helpers, compiling to plain JSON; three evaluator implementations (Ruby / TypeScript / Go) under a shared test suite; a separate `DatabaseLoader` owning data fetching; a `context_path` map resolving which rows to query given the input `(resource, user)`. Evaluation: [[patterns/deny-overrides-allow]] + patterns/progressive-data-loading using concepts/three-valued-logic (`true`/`false`/`null` indeterminate) — load dependency batches in heuristic order, short-circuit as soon as the result is determined. Reported result: "more than halved the total execution time of our permissions evaluation."
Additional ecosystem built on the simple evaluator: React front-end debugger rendering per-node truth + data in an expandable boolean tree, CLI debugger with the same output, and CI linter that walks every `ExpressionDef` and flags e.g. `field = ref` comparisons without a sibling `<> null` guard (explicitly chosen over runtime engine enforcement to preserve evaluator simplicity as a cross-platform invariant). Stated outcome: "we all but eliminated incidents and bugs caused by drifts in the logic between our Ruby and LiveGraph codebase." No numbers disclosed for policy count, evaluations/sec, latency distribution, or post-DSL database-load share.)
- 2026-04-21 — sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale
(Figma's Databases team retrospective on scaling RDS Postgres
~100× since 2020. 2020: single Postgres on AWS's largest
instance. End of 2022: ≈12
vertically-partitioned RDS Postgres instances + caching + read
replicas. Vertical partitioning exhausted by (a) vacuum
reliability impact on several-TB tables, (b) per-instance
RDS IOPS ceiling approaching on highest-write tables, (c)
CPU on hottest partitions. Late 2022: start in-house
horizontal sharding. Build-vs-buy: explicit rejection of CockroachDB / TiDB / Spanner / Vitess
/ NoSQL on 18-month runway pressure + deep RDS operational
expertise + complex relational model. Architecture:
colos (tables sharing a shard key — `UserID`, `FileID`, `OrgID` — grouped with shared physical layout; cross-table joins + full transactions work when scoped to a single shard-key value), hash-of-shard-key routing (Snowflake-prefixed IDs would hotspot; hash trades range-scan efficiency for uniform distribution), logical sharding decoupled from physical sharding via per-shard Postgres views over a single unsharded instance (<10% worst-case view overhead; percentage rollout feature-flag-gated; seconds rollback); DBProxy as the Go router between application + PGBouncer (query parser / logical planner / physical planner / scatter-gather / load-shedding / request hedging / transaction support scoped to single shards); shadow application readiness runs the logical planner against live production traffic to pick a sharded-query subset covering 90% of queries without worst-case scatter-gather engine complexity (all range-scans + point-queries allowed; joins only within the same colo on the shard key); full (not filtered) logical replication during reshards. First horizontally-sharded table September 2023: 10 seconds partial primary availability, no replica impact, no latency/availability regressions. 9 months end-to-end for that first table. Open work: horizontally-sharded schema updates, globally-unique ID generation for sharded PKs, atomic cross-shard transactions for business-critical paths, distributed globally-unique indexes, ORM compatible with horizontal sharding, fully-automated one-click reshards. Explicit future-scope: re-evaluate in-house RDS horizontal sharding vs NewSQL / managed alternatives once runway is bought — the choice was shaped by the deadline, not by long-term preference. No latency / throughput / cost / shard-count numbers disclosed.)
- 2026-04-21 — sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma
(Figma's AI-powered search
(shipped at Config 2024) combining visual search (query by
screenshot / selected frame / sketch — reverse-image-search
lineage) and semantic search (natural text against component
names/descriptions/files even when terminology doesn't match).
Origin story: June 2023 three-day AI hackathon produced 20
projects including a working design autocomplete prototype.
RAG argument framed search as the prerequisite
("we can improve AI outputs with examples from search"). User
research on the autocomplete prototype revealed 75% of
Figma-canvas objects come from other files — search became
the higher-leverage ship. Three product use cases: frame lookup
(exact), frame variations (near-similar), broad inspiration
(diverse). Indexing policy:
patterns/selective-indexing-heuristics stacks (a) UI-shape
dimensions filter (top-level frames that look like UI), (b)
non-top-level exception for frames meeting the right conditions,
(c) near-duplicate collapsing, (d) file-copy skipping, plus
experimental ready-for-development quality signals — framed as
"we couldn't index everything, it would be too costly." Plus
[[patterns/edit-quiescence-indexing|4h no-edit quiescence window]] before indexing (WIP exclusion + load reduction).
Quality bar: deliver
across similarity tiers simultaneously because users start
from close matches — "if we couldn't prove we could find the
needle in the haystack, designers wouldn't trust the feature
for broader exploration." Eval tool built on Figma's own
public plugin API + infinite canvas + keyboard shortcuts
for rapid correct/incorrect marking + historical run-to-run
comparison (patterns/visual-eval-grading-canvas); eval set
seeded from internal-designer interviews + file-browser usage
analysis. Surfaced in Actions panel (narrower width) with
peek previews + `CMD+Enter` full-screen drill-down; "rabbit holing" deeper-dive interaction explored and scrapped for simplicity. Shipping principles: AI-for-existing-workflows / rapid iteration / systematic quality checks / cross-disciplinary teamwork. Future work: bring to Figma Community; design autocomplete ship. No architecture / infra / embedding-model / vector-store / latency / NDCG numbers disclosed — product-led post, not systems-led. Introduces systems/figma-ai-search, patterns/selective-indexing-heuristics, patterns/edit-quiescence-indexing, patterns/visual-eval-grading-canvas, concepts/similarity-tier-retrieval; extends concepts/vector-embedding, concepts/vector-similarity-search, concepts/relevance-labeling, patterns/hackathon-to-platform, patterns/prototype-before-production.)
- 2024-08-08 — sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months (Core compute platform migrated from AWS ECS to EKS in under 12 months, Q1 2023 plan → majority-cutover Jan 2024. Principles: tight migration scope (swap substrate, preserve abstraction) + explicit fast-follows (Keda pod-autoscaling, Vector log forwarding, Graviton, service mesh) + three-active-cluster blast-radius-reduction topology + single-step Bazel-config service definition replacing the Terraform-template + deploy two-step + load-test-at-scale "Hello World" at largest-service pod count + weighted-DNS service-by-service traffic cutovers + golden-path-with-escapes UX. Post-migration tooling-UX regression from the 3-cluster + RBAC setup addressed by auto-inferring cluster + role. Enabling condition: Figma's small service count — "not a microservices company." CoreDNS destruction incident on one cluster cost 1/3 of requests instead of a full outage.)
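The weighted-DNS cutover mechanic in the K8s entry can be sketched as weighted record selection plus a staged weight schedule. Everything here is hypothetical — endpoint names, stage weights, and function names are invented for illustration; a real cutover would adjust weighted DNS records (e.g. Route 53) rather than run this code.

```typescript
// Illustrative weighted-record selection for a service-by-service
// traffic cutover; endpoint names and stage weights are invented.

interface WeightedTarget {
  endpoint: string;
  weight: number;
}

// r is a uniform draw in [0, 1); walking cumulative normalized weights
// picks each target with probability proportional to its weight.
function pickTarget(targets: WeightedTarget[], r: number): string {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let cumulative = 0;
  for (const t of targets) {
    cumulative += t.weight / total;
    if (r < cumulative) return t.endpoint;
  }
  return targets[targets.length - 1].endpoint;
}

// A staged cutover is then just a sequence of weight updates, each one
// reversible by restoring the previous stage's weights:
const stages: WeightedTarget[][] = [
  [{ endpoint: "ecs.internal", weight: 99 }, { endpoint: "eks.internal", weight: 1 }],
  [{ endpoint: "ecs.internal", weight: 50 }, { endpoint: "eks.internal", weight: 50 }],
  [{ endpoint: "ecs.internal", weight: 0 }, { endpoint: "eks.internal", weight: 100 }],
];
```

The appeal of the pattern is that each stage is a pure data change: rollback is re-publishing the previous weights, with no deploy in the loop.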
- 2026-04-21 — sources/2026-04-21-figma-rebuilt-foundations-of-component-instances (Year-long client-architecture rewrite replacing the 2016-era Instance Updater with Materializer — a generic framework for maintaining derived subtrees of the document tree from feature-owned blueprints. Component instances become one blueprint; rich text nodes are the first net-new feature built on it; slots (open beta April 2026) compose on top rather than re-implementing reactivity. Reactivity model: explicit choice of push-based invalidation + automatic dependency tracking; pull-based rejected because cross-tree references + deep nesting force reconstructing dep chains on every read. A parallel runtime-orchestration effort unified the layout / variable / instance / constraint subsystems under a common framework with a predictable execution order, surfacing hidden feedback loops Figma calls "back-dirties"; making them explicit let many be eliminated, moving the client toward unidirectional flow (patterns/runtime-orchestration-unidirectional-flow). Rolled out behind months of side-by-side runtime validation against hundreds of thousands of real production files — compared data model + rendered output + performance; gate: both correctness and performance had to match before the flip. Canonical reported impact: variable-mode changes in large files 40–50% faster, "representative of broader gains." Perhaps the biggest return is framed as developer velocity: rich text + slots + other in-progress features ship on the shared framework instead of each reimplementing reactivity. Third Figma-client reactive graph over the same object-tree document model, after QueryGraph (node deps) and Parameter Runtime (parameter-to-bound-property edges) — Materializer indexes source-of-truth → derived-subtree edges with automatic dep tracking.)
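The Materializer entry's reactivity model — push-based invalidation plus automatic dependency tracking — can be illustrated with a minimal sketch. This is not Figma's implementation; every class, field, and method name here is invented, and string-valued fields stand in for the real document tree.

```typescript
// Minimal sketch of push-based invalidation + automatic dependency
// tracking: blueprints read source fields, reads are recorded as
// dependencies while a blueprint runs, and writes push dirty flags to
// exactly the derived values that read the changed field.

type Blueprint = (doc: Doc) => string;

class Doc {
  private fields = new Map<string, string>();
  private deps = new Map<string, Set<string>>(); // field → derived names
  private blueprints = new Map<string, Blueprint>();
  private cache = new Map<string, string>();
  private tracking: string | null = null; // blueprint currently running
  dirty = new Set<string>();

  set(field: string, value: string): void {
    this.fields.set(field, value);
    // Push-based invalidation: mark dependents dirty immediately.
    for (const d of this.deps.get(field) ?? []) this.dirty.add(d);
  }

  get(field: string): string {
    // Automatic dependency tracking: record the read if a blueprint is
    // currently materializing.
    if (this.tracking !== null) {
      if (!this.deps.has(field)) this.deps.set(field, new Set());
      this.deps.get(field)!.add(this.tracking);
    }
    return this.fields.get(field) ?? "";
  }

  define(name: string, blueprint: Blueprint): void {
    this.blueprints.set(name, blueprint);
    this.dirty.add(name);
  }

  materialize(name: string): string {
    if (!this.dirty.has(name) && this.cache.has(name)) {
      return this.cache.get(name)!; // clean: no recomputation
    }
    this.tracking = name;
    const value = this.blueprints.get(name)!(this);
    this.tracking = null;
    this.cache.set(name, value);
    this.dirty.delete(name);
    return value;
  }
}
```

The contrast with the rejected pull-based design is visible in `get`: dependency edges are built as a side effect of evaluation, so a write can find its dependents directly instead of reconstructing dep chains on every read.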
- 2026-04-21 — sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale (LiveGraph 100x: re-architecture of Figma's real-time GraphQL-like data-fetching service to absorb ~100× growth in sessions + DB-update volume. Sessions tripled since 2021; view requests 5× in the last year. Five named structural failures of the old one-server-with-in-memory-mutation-based-cache design: excessive fan-out, excessive fan-in, tight coupling of reads + updates, fragmented caches, large blast radius from transient shard failures (the global-order assumption meant one slow shard stalled all optimistic updates across the product). Two insights unlocked the new design: (1) most LiveGraph traffic is initial reads, not live updates — so invalidation-based caching (re-query on change) is viable; (2) given Figma's schema, most queries are easy to invalidate from the mutation alone — so the invalidator can be stateless. New architecture = three independently-scaling Go services (patterns/independent-scaling-tiers): edge (sessions / view-query expansion / cache subscription / refetch on invalidation), cache (read-through, sharded by `hash(easy-expr)`, cuckoo-filter fan-out of invalidations to edges, hot replicas on standby, deploy decoupled from edge deploy — eliminates a thundering-herd class), invalidator (sharded like the physical DB, tails the WAL logical replication stream per shard as CDC, no per-query state). Query shapes: un-parameterized queries have stable IDs; a live query = `(shape_id, args)`; mutations substitute column values into shapes to pop affected queries mechanically. ~700 shapes, only 11 "hard" (range / inequality predicates — potentially infinite fan-out); handled via patterns/nonce-bulk-eviction (co-locate all hard queries with the same easy-expr on one cache shard via `hash(easy-expr)` sharding; two-layer keys `{easy-expr} → nonce` + `{easy-expr}-{nonce}-{hard-expr} → results`; invalidate by deleting the nonce → all hard-query keys orphaned atomically; TTL reaps orphans; edge re-queries only active-session hard-queries). Schema discipline: all queries normalize to `(easy-expr) AND (hard-expr)`. Concurrency correctness via read-invalidation rendezvous: (1) same-type coalescing of reads/invalidations, (2) inflight reads interrupted by invalidation must not allow new readers to coalesce onto the stale result, (3) inflight invalidations block racing reads from setting the cache. Validated via chaos test + online cache verification (random-sample cache vs primary DB) + convergence checker against the old engine (old engine often seconds slower — required fine-grained tuning). Migration targets the least-scalable cache tier first; rollout mechanics deferred to Braden Walker's Systems@Scale talk (not ingested). No numbers disclosed beyond shape count (700 / 11) and growth multiples (3× sessions / 5× views / 100× target). Future projects named: automatic invalidator re-sharding, non-Postgres source resolution in the cache, first-class server-side computation like permission evaluation in the cache (crosses paths with systems/figma-permissions-dsl).
Introduces systems/livegraph, concepts/invalidation-based-cache, concepts/query-shape, concepts/read-invalidation-rendezvous, concepts/thundering-herd, patterns/stateless-invalidator, patterns/nonce-bulk-eviction, patterns/independent-scaling-tiers; extends systems/dbproxy-figma, systems/postgresql, systems/figma-multiplayer-querygraph (sibling real-time system over a very different data shape), concepts/push-based-invalidation (server-tier instance at DB-row granularity), concepts/change-data-capture (WAL-driven CDC consumer), concepts/wal-write-ahead-logging (Postgres logical replication tap).)
- 2026-04-21 — sources/2026-04-21-figma-the-search-for-speed-in-figma (performance retrospective on Figma's traditional full-text search after migrating from Elasticsearch to managed OpenSearch in late 2023. Three months of debugging yielded ~60% API-latency reduction, ≥50% max-QPS headroom, >50% cost cut. Key findings: (1) the DataDog "average search = 8 ms" was per-shard, not per-query — with up to ~500 per-shard queries fanning out per user query, coordinator-view latency was actually 150 ms avg / 200–400 ms p99 (canonical concepts/metric-granularity-mismatch; fix = publish the `took` response-body field as a custom metric). (2) Pre-/post-processing (permissions-filter build + per-result permission re-check) was >70% of total time; Ruby runtime-type-safety checks in the permissions path were a real cost. (3) Thread-local DB connection-pool starvation was eating tens of ms per query across all of Figma — the fix retroactively unlocked previously-abandoned parallel-DB-read experiments. (4) Index data was bloated, not queries: trimming 50% then an additional 90% of unused fields had no relevancy impact; the win was fitting the live set in the OS disk cache (concepts/cache-locality at the page-cache tier). (5) 450 → 180 shards (−60%) increased max QPS ≥50% and decreased P50 — documented as patterns/fewer-larger-shards-for-latency; AWS's log-workload sizing guidance doesn't fit latency-sensitive document search with effective pre-filters. (6) Node-mix swap: 1/3 CPU + 25% more RAM at ≈1/2 price — CPU was idle, RAM was the constraint. (7) `opensearch-benchmark` was unusable (vendor regression-testing tool, client-side latency only); a custom Go harness written in an afternoon produced consistent server-side `took` measurements and drove the shard sweep (patterns/custom-benchmarking-harness). (8) Neutral-to-negative: zstd compression was a wash; concurrent segment search added latency even at low QPS. Sibling to the 2026-04-21 AI-search post — same OpenSearch substrate, different query shape.)
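The per-shard vs per-query mismatch in finding (1) is easy to reproduce in miniature: a fanned-out query waits for its slowest shard, so averaging per-shard timings buries the coordinator-side cost the user actually experiences. A toy sketch with invented numbers (the function names and figures below are illustrative only):

```typescript
// Toy reproduction of the metric-granularity mismatch: averaging
// per-shard timings wildly understates user-visible latency when one
// straggler shard dominates a fanned-out query.

function perShardAverage(shardMs: number[][]): number {
  const all = shardMs.flat();
  return all.reduce((a, b) => a + b, 0) / all.length;
}

// One user query ≈ max over its per-shard sub-query timings — a lower
// bound on the coordinator-side time (modulo merge overhead) that the
// server's `took` field would capture.
function coordinatorLatencies(shardMs: number[][]): number[] {
  return shardMs.map((q) => Math.max(...q));
}

// Two user queries, each fanning out to four shards:
const queries = [
  [5, 6, 7, 150], // one straggler shard dominates the whole query
  [4, 5, 6, 8],
];
```

Here the per-shard average is ~23.9 ms while the user-visible latencies are 150 ms and 8 ms — the same shape as the post's 8 ms dashboard vs 150 ms reality gap, which is why publishing the server-side `took` as its own metric was the fix.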