FIGMA 2026-04-21 Tier 3-equivalent

Supporting Faster File Load Times with Memory Optimizations in Rust

Summary

Figma's Multiplayer server loads each collaborative file into memory to propagate edits across a complex node graph. After the 2024 dynamic page loading rollout (sources/2024-05-22-figma-dynamic-page-loading), the number of files the Multiplayer service needs to decode server-side grew ~30%. To absorb the new load, the Rust team attacked memory usage on the hottest data structure: the node property map (Map<u16, pointer_u64> — keys are property IDs, values are pointers to property-value blobs), which memory profiling showed accounted for >60% of a file's memory usage — a surprising finding given the map stores only metadata, not payload. Replacing the default Rust BTreeMap with a flat sorted Vec<(u16, pointer)> cut deserialization time (the load critical path) and dropped memory ~25% on large files. A second experiment tagged the unused top 16 bits of x86 pointers with the field ID (canonical concepts/tagged-pointer) for a 20% theoretical-density win; shipped to a benchmark but not productionized because the marginal ~5% RSS improvement wasn't worth the pointer-lifetime-correctness cost. Net fleet outcome: 20% improvement in p99 file deserialization latency and ~20% memory-cost reduction across the multiplayer fleet.

Key takeaways

  1. Profile memory before optimizing — metadata can dominate payload. The node property map (mapping u16 property IDs to u64 pointers) was >60% of per-file memory usage despite storing only metadata. "Given that the map just stores metadata and not data, we were pretty surprised by this finding." Without this data point, the optimization target would have been the property values themselves, not the property index — a canonical instance of measurement-driven micro-optimization at the memory-profile substrate.

  2. Flat sorted vectors beat balanced-tree maps at small N on modern CPUs. Rust's default BTreeMap has O(log n) lookup/insert; a sorted Vec<(key, value)> has O(log n) binary-search lookup + O(n) insert — theoretically worse. But for the property map's measured average of ~60 keys drawn from a < 200-key schema domain, cache-friendly linear memory access on contiguous entries dominates asymptotic order (concepts/cache-locality). Deserialization (the hot path) got faster, not slower — despite the asymptotic regression — because the vector's entire working set fits in a few cache lines. "Computers are especially fast when they have to traverse and perform computation on small amounts of linear memory (the exact setup here)."

  3. Keyspace constraints come from product shape, not theoretical bounds. New property IDs are added only when the file schema evolves — "at human speeds, i.e. extremely slowly." Schema has fewer than 200 fields; most properties cluster by node type (comment fields only appear on comment nodes, etc.). Measured average per-node map size: ~60 keys. This observed distribution — not the map's theoretical capacity — is what justifies the flat-vector choice. The bound that matters is the empirical one.

  4. x86 pointers carry 16 free bits because of the 48-bit address space — exploit them for data packing. Pointers are 64 bits but x86 processors address only 2^48 bytes; the top 16 bits are architecturally unused and "free for us to use." (concepts/tagged-pointer — pointer tagging.) Figma's Map<u16, pointer> happens to need exactly 16 bits for the field ID, so both can be packed into a single u64 with bit operations: Vec<u64> where each entry is [field_id_u16 | pointer_u48]. Fetch with mask-and-shift. Conceptually equivalent to V8 SMI tagging, OCaml boxed/unboxed representation, Objective-C tagged pointers, JavaScriptCore NaN-boxing.

  5. The memory metric that matters is RSS, not allocated bytes. "The memory metric we really care about is resident set size (RSS), which doesn't always correspond directly with allocated memory and depends on the behavior of the underlying allocator and the OS." Pointer-tagging naively should save 20% allocated memory (collapse two u64s into one) but delivered only ~5% RSS — the allocator's free-list shape and page-commit patterns broke the arithmetic. Same lesson as Datadog's runtime/metrics vs /proc/smaps divergence: virtual and resident diverge silently, and the OS cares about resident.

  6. Not productionizing is a valid outcome of a profitable experiment. The pointer-tagging rewrite produced "marginally faster performance on the benchmark, and slightly lower memory use (about 5% less than the simple vector approach)." But Rust's ownership model forces careful management of reference-counted-pointer lifecycle — insert / get must increment / decrement refcounts correctly, and pointer-tagging obscures what is and isn't a valid pointer from the type system. Decision: "the extra win didn't seem worth the potential memory pitfalls we were opening ourselves up to, but it's always a lever we have in the future." The work remains on the shelf as known-feasible optionality.

  7. Reported fleet-wide outcomes (after deploying the flat-vector change, not the pointer-tagging change): +20% file deserialization performance at p99 ("the slowest Figma files now load faster") + ~20% memory-cost reduction across the entire multiplayer fleet. Server-side file load is the critical path for opening any file; this propagates directly to user-visible file-open latency. Pairs naturally with the 2024 dynamic page loading post (which cut client memory ~70% and p99 slow-file load ~33%) — the 2024 post fixed the client, this one fixes the server that feeds the client.
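The allocated-vs-resident gap from takeaway 5 is directly observable. A minimal Linux-only sketch (not from the post; assumes `/proc/self/statm` is available and a 4 KiB page size) that reads resident pages and shows RSS moving when pages are actually touched:

```rust
use std::fs;

/// Resident set size in bytes, read from /proc/self/statm (Linux only).
/// The second field of statm is resident pages; 4 KiB pages are assumed.
fn rss_bytes() -> u64 {
    let statm = fs::read_to_string("/proc/self/statm").expect("requires Linux /proc");
    let resident_pages: u64 = statm
        .split_whitespace()
        .nth(1)
        .expect("statm has at least two fields")
        .parse()
        .expect("resident field is numeric");
    resident_pages * 4096
}

fn main() {
    let before = rss_bytes();
    // vec![1u8; ..] memsets the buffer, so every page is written and
    // becomes resident; vec![0u8; ..] may come from untouched zero pages
    // and leave RSS nearly unchanged despite 32 MiB of "allocated" memory.
    let buf = vec![1u8; 32 * 1024 * 1024];
    let after = rss_bytes();
    println!("RSS before: {} KiB, after: {} KiB", before / 1024, after / 1024);
    drop(buf);
}
```

This is the mechanism behind the 20%-allocated vs ~5%-RSS mismatch: the OS accounts for pages actually committed, not bytes the allocator handed out.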

Architectural details

Property map — the hot data structure

A Figma file is a collection of nodes (triangle / square / frame / …) on Multiplayer's object-tree document model. Each node is a bag of properties — color, shape, parent, x/y/w/h — addressed by property ID. Server-side representation:

Map<property_name: u16, property_value: u64_pointer>

This map is in the deserialization critical path, so any speedup propagates directly to file-load time. Memory profiling flagged it at >60% of total file memory — higher than any single property-value class.

Step 1: BTreeMap → flat sorted Vec<(u16, u64)>

Baseline (Rust BTreeMap<u16, pointer>):

BTreeMap<u16, pointer> {
    0: 1,       // node_id
    1: FRAME,   // type
    2: null,    // parent_id
    ...         // x, y, w, h, etc.
}

Optimized (sorted Vec<(u16, pointer)>):

Vec<(u16, pointer)> {
    (0, 1),
    (1, FRAME),
    (2, null),
    ...
}

Big-O comparison (post admits the vector is "strictly worse" theoretically):

| Operation    | BTreeMap           | Vec<(k, v)>              |
|--------------|--------------------|--------------------------|
| Lookup       | O(log n)           | O(log n) (binary search) |
| Insert       | O(log n)           | O(n) (worst case: shift) |
| Iterate      | O(n)               | O(n), smaller constant   |
| Memory/entry | tree-node overhead | 10 bytes (u16 + u64)     |

Why the "worse" data structure wins at the observed working-set size (average ~60 entries, max ≪ 200):

  • No per-node allocation / pointer chasing — the whole map is one contiguous allocation.
  • Cache-line-scale traversal. A 64-entry vector of (u16, u64) is ~640 bytes — fits in ~10 cache lines, often prefetched as a block.
  • Deserialization already sees entries in sorted order on the wire (the serialization protocol guarantees sorted iteration), so insertion is appending, not shifting — the O(n) worst case never materialises on the load path.

Measured outcome: deserialization is faster than BTreeMap; memory drops ~25% on large files.
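The flat-map idea can be sketched in a few lines of Rust (illustrative names, not Figma's code): binary-search lookup over one contiguous allocation, shifting insert for the general case, and an append-only build path for already-sorted wire input.

```rust
/// Illustrative flat sorted property map: entries are
/// (property_id, pointer-sized value), kept sorted by id.
struct FlatMap {
    entries: Vec<(u16, u64)>,
}

impl FlatMap {
    /// Deserialization path: the wire format delivers entries already
    /// sorted, so building the map is pure appending; the O(n) shift
    /// in `insert` never runs on file load.
    fn from_sorted(pairs: Vec<(u16, u64)>) -> Self {
        debug_assert!(pairs.windows(2).all(|w| w[0].0 < w[1].0));
        FlatMap { entries: pairs }
    }

    /// O(log n) binary search over one contiguous allocation.
    fn get(&self, id: u16) -> Option<u64> {
        self.entries
            .binary_search_by_key(&id, |&(k, _)| k)
            .ok()
            .map(|i| self.entries[i].1)
    }

    /// O(n) worst case: later entries shift right to keep sort order.
    fn insert(&mut self, id: u16, value: u64) {
        match self.entries.binary_search_by_key(&id, |&(k, _)| k) {
            Ok(i) => self.entries[i].1 = value,
            Err(i) => self.entries.insert(i, (id, value)),
        }
    }
}

fn main() {
    let mut map = FlatMap::from_sorted(vec![(0, 1), (1, 7), (2, 0)]);
    assert_eq!(map.get(1), Some(7));
    map.insert(5, 42);
    assert_eq!(map.get(5), Some(42));
    println!("flat map ok");
}
```

At ~60 entries the whole `entries` vector spans roughly 10 cache lines, which is why the "worse" asymptotics never show up in practice.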

Step 2 (not productionized): pointer tagging

On x86-64, a u64 pointer has 48 bits of addressable range and 16 free bits at the top. Figma's Map<u16, pointer> needs exactly 16 bits of field ID per entry. Pack both into one u64:

Vec<u64> {
    [0_u16  | type_ptr_u48],
    [1_u16  | FRAME_ptr_u48],
    ...  // x, y, w, h, etc.
}

Access via bit ops:

fn field_id(packed: u64) -> u16 { (packed >> 48) as u16 }
fn pointer<T>(packed: u64) -> *mut T { (packed & 0x0000_FFFF_FFFF_FFFF) as *mut T }

Complication: Figma uses reference-counted pointers (Rust's Rc/Arc). The refcount lives beside the object, not inside the pointer — so insert and get must correctly increment and decrement refcounts through the masked pointer, which is now type-erased from Rust's perspective. Any bug here = memory corruption or double-free. The post doesn't specify whether they wrapped this in an unsafe module with ownership invariants or adopted a different pattern.
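A minimal roundtrip sketch of the packing (not Figma's code; assumes user-space pointers whose top 16 bits are zero, which Linux x86-64 provides under 48-bit addressing). The manual `Box::from_raw` at the end is exactly the lifetime bookkeeping the type system can no longer check once the pointer is hidden inside a `u64`:

```rust
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // low 48 bits

/// Pack a 16-bit field id into the unused top bits of a 48-bit pointer.
fn pack(field_id: u16, ptr: *const u8) -> u64 {
    let addr = ptr as u64;
    debug_assert_eq!(addr & !PTR_MASK, 0, "pointer uses more than 48 bits");
    ((field_id as u64) << 48) | addr
}

fn field_id(packed: u64) -> u16 {
    (packed >> 48) as u16
}

fn pointer(packed: u64) -> *const u8 {
    (packed & PTR_MASK) as *const u8
}

fn main() {
    let boxed = Box::new(0xAB_u8);
    let raw = Box::into_raw(boxed) as *const u8;

    let packed = pack(7, raw);
    assert_eq!(field_id(packed), 7);
    assert_eq!(pointer(packed), raw);

    // Reconstitute the Box so the allocation is freed. Forget this, or do
    // it twice, and you have the leak / double-free the post warns about.
    unsafe { drop(Box::from_raw(pointer(packed) as *mut u8)) };
    println!("roundtrip ok");
}
```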

Benchmark result: marginally faster than the flat Vec<(u16, u64)>, with ~5% lower memory. Not the theoretical 20%, because of the RSS-vs-allocated-memory divergence noted above.

Decision: shelve. The risk-vs-win ratio didn't pencil out.

What didn't change

  • The wire / serialization protocol (still sorted; that's what lets the vector's O(n) insert never trigger on the load path).
  • The schema (~200 properties, human-speed evolution).
  • The client side (browser-side property representation is WebAssembly-visible but not addressed here).
  • QueryGraph (the 2024 dependency graph overlay) is orthogonal — operates on node-level edges; this post is about the per-node property-bag representation.

Numbers disclosed

  • >60% — share of per-file server memory consumed by the Map<u16, pointer> property index (pre-optimization).
  • ~60 keys — average property count per node, measured on sample files.
  • < 200 — schema-imposed upper bound on property count.
  • ~25% — memory drop on large files after the BTreeMap → Vec<(u16, u64)> change.
  • 20% — p99 file deserialization performance improvement (fleet-wide).
  • ~20% — memory-cost reduction across the entire Multiplayer fleet.
  • ~5% — additional RSS savings from pointer-tagging (vs the simple vector) in benchmark; not productionized.
  • 30% — increase in server-side file decode volume after dynamic page loading (the forcing function).
  • 48 bits — used portion of an x86-64 pointer; 16 bits free for tagging.

Numbers not disclosed

  • Absolute memory footprint per file (pre/post).
  • Load-latency distribution (p50, p95 — only p99 quoted).
  • Multiplayer fleet size / cost base.
  • CPU profile — is the load path memory-bandwidth-bound, or instruction-count-bound?
  • Which BTreeMap implementation (Rust stdlib's B-tree with B=6 vs a variant).
  • Allocator choice (jemalloc? mimalloc? system?) — directly relevant to the RSS-vs-allocated gap.
  • Refcount strategy under pointer tagging (how Rc::clone / Rc::drop work through a masked pointer).
  • Whether the pointer-tagging branch is still maintained or bit-rotted.

Caveats / scope

  • Author pool not disclosed; no individual-engineer attribution.
  • Post frames the work as server-side multiplayer substrate; does not claim the technique generalises beyond the property map (other Figma server-side data structures not profiled).
  • The 48-bit addressable assumption is architecture-specific to x86-64 and ARM64 (and not strictly guaranteed even there — Linux has a boot option CONFIG_X86_5LEVEL that enables 57-bit addressing, and ARM's TBI explicitly uses top-byte tagging). The post acknowledges this: "this is not a guarantee, and could change in the future."
  • No discussion of whether the Vec<(u16, u64)> change applied uniformly or only to a specific code path; production rollout mechanism unstated.

Relationship to existing wiki

Raw file

raw/figma/2026-04-21-supporting-faster-file-load-times-with-memory-optimizations-6006a36c.md
