Supporting Faster File Load Times with Memory Optimizations in Rust¶
Summary¶
Figma's Multiplayer server loads
each collaborative file into memory to propagate edits across a complex
node graph. After the 2024 dynamic page loading rollout
(sources/2024-05-22-figma-dynamic-page-loading), the number of files
the Multiplayer service needs to decode server-side grew ~30%. To
absorb the new load, the Rust team attacked memory usage on the hottest
data structure: the node property map (Map<u16, pointer_u64> —
keys are property IDs, values are pointers to property-value blobs),
which memory profiling showed
accounted for >60% of a file's memory usage — a surprising finding
given the map stores only metadata, not payload. Replacing the default
Rust BTreeMap with a flat sorted Vec<(u16, pointer)> cut
deserialization time (the load critical path) and dropped memory
~25% on large files. A second experiment tagged the unused top 16
bits of x86-64 pointers with the field ID (canonical concepts/tagged-pointer)
for a theoretical 20% density win; it was validated on a benchmark but not
productionized because the marginal ~5% RSS improvement wasn't worth
the pointer-lifetime-correctness cost. Net fleet outcome:
20% improvement in p99 file deserialization latency and ~20%
memory-cost reduction across the multiplayer fleet.
Key takeaways¶
- Profile memory before optimizing — metadata can dominate payload. The node property map (mapping `u16` property IDs to `u64` pointers) was >60% of per-file memory usage despite storing only metadata. "Given that the map just stores metadata and not data, we were pretty surprised by this finding." Without this data point, the optimization target would have been the property values themselves, not the property index — a canonical instance of measurement-driven micro-optimization at the memory-profile substrate.
- Flat sorted vectors beat balanced-tree maps at small N on modern CPUs. Rust's default `BTreeMap` has O(log n) lookup/insert; a sorted `Vec<(key, value)>` has O(log n) binary-search lookup + O(n) insert — theoretically worse. But for the property map's measured average of ~60 keys drawn from a <200-key schema domain, cache-friendly linear access to contiguous entries dominates asymptotic order (concepts/cache-locality). Deserialization (the hot path) got faster, not slower, despite the asymptotic regression, because the vector's entire working set fits in a few cache lines. "Computers are especially fast when they have to traverse and perform computation on small amounts of linear memory (the exact setup here)."
- Keyspace constraints come from product shape, not theoretical bounds. New property IDs are added only when the file schema evolves — "at human speeds, i.e. extremely slowly." The schema has fewer than 200 fields; most properties cluster by node type (comment fields only appear on comment nodes, etc.). Measured average per-node map size: ~60 keys. This observed distribution — not the map's theoretical capacity — is what justifies the flat-vector choice. The bound that matters is the empirical one.
- x86 pointers carry 16 free bits because of the 48-bit address space — exploit them for data packing. Pointers are 64 bits wide, but x86 processors address only 2^48 bytes; the top 16 bits are architecturally unused and "free for us to use" (concepts/tagged-pointer — pointer tagging). Figma's `Map<u16, pointer>` happens to need exactly 16 bits for the field ID, so both can be packed into a single `u64` with bit operations: a `Vec<u64>` where each entry is `[field_id_u16 | pointer_u48]`, fetched with mask-and-shift. Conceptually equivalent to V8 SMI tagging, OCaml's boxed/unboxed representation, Objective-C tagged pointers, and JavaScriptCore NaN-boxing.
- The memory metric that matters is RSS, not allocated bytes. "The memory metric we really care about is resident set size (RSS), which doesn't always correspond directly with allocated memory and depends on the behavior of the underlying allocator and the OS." Pointer tagging should naively save 20% allocated memory (collapsing two `u64`s into one) but delivered only ~5% RSS — the allocator's free-list shape and page-commit patterns broke the arithmetic. Same lesson as Datadog's `runtime/metrics` vs `/proc/smaps` divergence: virtual and resident diverge silently, and the OS cares about resident.
- Not productionizing is a valid outcome of a profitable experiment. The pointer-tagging rewrite produced "marginally faster performance on the benchmark, and slightly lower memory use (about 5% less than the simple vector approach)." But Rust's ownership model forces careful management of the reference-counted-pointer lifecycle — `insert`/`get` must increment/decrement refcounts correctly, and pointer tagging hides what is and isn't a valid pointer from the type system. Decision: "the extra win didn't seem worth the potential memory pitfalls we were opening ourselves up to, but it's always a lever we have in the future." The work remains on the shelf as known-feasible optionality.
- Reported fleet-wide outcomes (after deploying the flat-vector change, not the pointer-tagging change): +20% file deserialization performance at p99 ("the slowest Figma files now load faster") and ~20% memory-cost reduction across the entire Multiplayer fleet. Server-side file load is the critical path for opening any file, so this propagates directly to user-visible file-open latency. Pairs naturally with the 2024 dynamic page loading post (which cut client memory ~70% and p99 slow-file load ~33%) — the 2024 post fixed the client; this one fixes the server that feeds the client.
Architectural details¶
Property map — the hot data structure¶
A Figma file is a collection of nodes (triangle / square / frame / …) on Multiplayer's object-tree document model. Each node is a bag of properties — color, shape, parent, x/y/w/h — addressed by property ID. Server-side representation:
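The post doesn't reproduce the exact type, but a minimal sketch of the shape it describes — a per-node `BTreeMap` from `u16` property IDs to refcounted property-value blobs — might look like this (the `PropertyValue` variants and names are illustrative, not Figma's):

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

type PropertyId = u16; // schema-assigned IDs; fewer than 200 exist

// Stand-in for the pointed-to property blobs (illustrative only).
enum PropertyValue {
    Int(i64),
    Text(String),
}

// Baseline representation: one BTreeMap per node, keyed by property ID.
struct Node {
    properties: BTreeMap<PropertyId, Rc<PropertyValue>>,
}
```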
This map is in the deserialization critical path, so any speedup propagates directly to file-load time. Memory profiling flagged it at >60% of total file memory — higher than any single property-value class.
Step 1: BTreeMap → flat sorted Vec<(u16, u64)>¶
Baseline (Rust BTreeMap<u16, pointer>):
BTreeMap<u16, pointer> {
0: 1, // node_id
1: FRAME, // type
2: null, // parent_id
... // x, y, w, h, etc.
}
Optimized (sorted Vec<(u16, pointer)>):
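A hedged sketch of the flat sorted-vector map (values shown as bare `u64`s for brevity; in production they are pointers to property blobs). Entries stay sorted by property ID so lookups can binary-search:

```rust
// One contiguous allocation; entries kept sorted by the u16 key.
#[derive(Default)]
struct PropertyMap {
    entries: Vec<(u16, u64)>,
}

impl PropertyMap {
    fn get(&self, id: u16) -> Option<u64> {
        self.entries
            .binary_search_by_key(&id, |&(k, _)| k) // O(log n), cache-friendly
            .ok()
            .map(|i| self.entries[i].1)
    }

    fn insert(&mut self, id: u16, value: u64) {
        match self.entries.binary_search_by_key(&id, |&(k, _)| k) {
            Ok(i) => self.entries[i].1 = value, // overwrite existing key
            Err(i) => self.entries.insert(i, (id, value)), // O(n) shift, worst case
        }
    }
}
```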
Big-O comparison (post admits the vector is "strictly worse" theoretically):
| Operation    | BTreeMap           | Vec<(k, v)>                |
|--------------|--------------------|----------------------------|
| Lookup       | O(log n)           | O(log n) (binary search)   |
| Insert       | O(log n)           | O(n) (worst case — shift)  |
| Iterate      | O(n)               | O(n) (but faster constant) |
| Memory/entry | tree-node overhead | 10 bytes (u16 + u64)       |
Why the "worse" data structure wins at the observed working-set size (average ~60 entries, max ≪ 200):
- No per-node allocation / pointer chasing — the whole map is one contiguous allocation.
- Cache-line-scale traversal. A 64-entry vector of `(u16, u64)` is ~640 bytes — it fits in ~10 cache lines, often prefetched as a block.
- Deserialization already sees entries in sorted order on the wire (the serialization protocol guarantees sorted iteration), so insertion is appending, not shifting — the O(n) worst case never materializes on the load path.
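Because the wire format yields entries in sorted key order, building the map during deserialization degenerates to pure appends. A sketch under that assumption (wire decoding elided; the function name is ours):

```rust
// Build the flat map from pairs the protocol guarantees are sorted by key.
fn build_from_sorted(pairs: impl IntoIterator<Item = (u16, u64)>) -> Vec<(u16, u64)> {
    let mut entries: Vec<(u16, u64)> = Vec::new();
    for (id, ptr) in pairs {
        // Guard the sortedness invariant the protocol promises.
        debug_assert!(entries.last().map_or(true, |&(last, _)| last < id));
        entries.push((id, ptr)); // amortized O(1) append; no shifting
    }
    entries
}
```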
Measured outcome: deserialization is faster than BTreeMap;
memory drops ~25% on large files.
Step 2 (not productionized): pointer tagging¶
On x86-64, a u64 pointer has 48 bits of addressable range and
16 free bits at the top. Figma's Map<u16, pointer> needs exactly
16 bits of field ID per entry. Pack both into one u64:
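A hedged sketch of the packing step (the function name and mask constant are ours, not Figma's; it assumes the pointer genuinely fits in the low 48 bits):

```rust
// Low 48 bits hold the address; top 16 bits hold the field ID.
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

fn pack(field_id: u16, ptr: u64) -> u64 {
    debug_assert!(ptr & !PTR_MASK == 0, "pointer must fit in 48 bits");
    ((field_id as u64) << 48) | ptr
}
```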
Access via bit ops:
fn field_id(packed: u64) -> u16 { (packed >> 48) as u16 } // tag lives in the top 16 bits
fn pointer<T>(packed: u64) -> *mut T { (packed & 0x0000_FFFF_FFFF_FFFF) as *mut T } // keep the low 48 address bits
Complication: Figma uses reference-counted pointers (Rust's
Rc/Arc). The refcount lives beside the object, not inside the
pointer — so insert and get must correctly increment and
decrement refcounts through the masked pointer, which is now
type-erased from Rust's perspective. Any bug here = memory corruption
or double-free. The post doesn't specify whether they wrapped this
in an unsafe module with ownership invariants or adopted a different
pattern.
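One plausible (unverified) pattern uses `Rc::into_raw`/`Rc::from_raw` plus `Rc::increment_strong_count` to thread the refcount through the masked pointer. This is not Figma's confirmed approach; it's a sketch that shows exactly where the hazard lives — every path must balance the counts by hand:

```rust
use std::rc::Rc;

const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

// Pack consumes one strong reference: the packed word now "owns" it.
fn pack_rc(field_id: u16, value: Rc<String>) -> u64 {
    let raw = Rc::into_raw(value) as u64;
    debug_assert!(raw & !PTR_MASK == 0, "address must fit in 48 bits");
    ((field_id as u64) << 48) | raw
}

// Get must mint a NEW strong reference without consuming the packed one.
fn get_rc(packed: u64) -> Rc<String> {
    let raw = (packed & PTR_MASK) as *const String;
    // SAFETY: `raw` came from Rc::into_raw and the packed entry still
    // holds a strong reference, so the allocation is alive.
    unsafe {
        Rc::increment_strong_count(raw); // account for the Rc we hand out
        Rc::from_raw(raw)
    }
}

// Dropping an entry must release exactly the one ref pack_rc consumed.
fn drop_packed(packed: u64) {
    let raw = (packed & PTR_MASK) as *const String;
    // SAFETY: reclaims the strong reference owned by the packed entry.
    drop(unsafe { Rc::from_raw(raw) });
}
```

Miss an `increment_strong_count`, or call `drop_packed` twice, and the result is exactly the use-after-free / double-free class of bug the post cites as the reason to shelve the approach.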
Benchmark result: marginally faster than the flat Vec<(u16, u64)>,
~5% lower memory. Not the theoretical 20%, because of the
RSS-vs-allocated-memory divergence.
Decision: shelve. The risk-vs-win ratio didn't pencil out.
What didn't change¶
- The wire / serialization protocol (still sorted; that's what lets the vector's O(n) insert never trigger on the load path).
- The schema (~200 properties, human-speed evolution).
- The client side (browser-side property representation is WebAssembly-visible but not addressed here).
- QueryGraph (the 2024 dependency graph overlay) is orthogonal — operates on node-level edges; this post is about the per-node property-bag representation.
Numbers disclosed¶
- >60% — share of per-file server memory consumed by the `Map<u16, pointer>` property index (pre-optimization).
- ~60 keys — average property count per node, measured on sample files.
- < 200 — schema-imposed upper bound on property count.
- ~25% — memory drop on large files after the `BTreeMap` → `Vec<(u16, u64)>` change.
- 20% — p99 file deserialization performance improvement (fleet-wide).
- ~20% — memory-cost reduction across the entire Multiplayer fleet.
- ~5% — additional RSS savings from pointer-tagging (vs the simple vector) in benchmark; not productionized.
- 30% — increase in server-side file decode volume after dynamic page loading (the forcing function).
- 48 bits — used portion of an x86-64 pointer; 16 bits free for tagging.
Numbers not disclosed¶
- Absolute memory footprint per file (pre/post).
- Load-latency distribution (p50, p95 — only p99 quoted).
- Multiplayer fleet size / cost base.
- CPU profile — is the load path memory-bandwidth-bound, or instruction-count-bound?
- Which `BTreeMap` implementation (Rust stdlib's B-tree with B=6 vs a variant).
- Allocator choice (jemalloc? mimalloc? system?) — directly relevant to the RSS-vs-allocated gap.
- Refcount strategy under pointer tagging (how `Rc::clone`/`Rc::drop` work through a masked pointer).
- Whether the pointer-tagging branch is still maintained or bit-rotted.
Caveats / scope¶
- Author pool not disclosed; no individual-engineer attribution.
- Post frames the work as server-side multiplayer substrate; does not claim the technique generalises beyond the property map (other Figma server-side data structures not profiled).
- The 48-bit addressable assumption is architecture-specific to x86-64 and ARM64 (and not strictly guaranteed even there — Linux has a kernel config option `CONFIG_X86_5LEVEL` that enables 57-bit addressing, and ARM's TBI explicitly uses top-byte tagging). The post acknowledges this: "this is not a guarantee, and could change in the future."
- No discussion of whether the `Vec<(u16, u64)>` change applied uniformly or only to a specific code path; production rollout mechanism unstated.
Relationship to existing wiki¶
- Sibling to sources/2024-05-22-figma-dynamic-page-loading (client-memory + critical-path decode optimizations) — this one addresses the server side of the same architectural pressure. The 30%-decode-volume increase that motivated this post is the direct downstream of dynamic page loading's success.
- Builds on sources/2019-figma-how-figmas-multiplayer-technology-works (the substrate — Multiplayer's object-tree document model and per-document server process).
- Parallel to sources/2025-07-17-datadog-go-124-memory-regression on the RSS-vs-allocated divergence lesson (Datadog Go 1.24 found `runtime/metrics` diverged from `/proc/smaps`; Figma's pointer tagging found an allocated-20%-delta → RSS-5%-delta).
- Companion to sources/2024-04-27-figma-speeding-up-c-build-times on the Figma engineering culture of profile-driven data-structure replacement at the hot path (C++ build: post-preprocessing bytes drove forward-declaration retrofits; Rust multiplayer: property-map bytes drove the BTreeMap→Vec swap).
Raw file¶
raw/figma/2026-04-21-supporting-faster-file-load-times-with-memory-optimizations-6006a36c.md
Source¶
- Original: https://www.figma.com/blog/supporting-faster-file-load-times-with-memory-optimizations-in-rust/
Related¶
- companies/figma — ingest landed here under the Key patterns / concepts list (small-map-as-sorted-vec + tagged-pointer bullets) and under Recent articles.
- systems/figma-multiplayer-querygraph — the subject system; this source is about its server-side property-map representation.
- concepts/small-map-as-sorted-vec — the pattern instantiated (the `BTreeMap` → flat `Vec` swap for schema-bounded small-N maps); this source is the canonical wiki anchor for the concept.
- concepts/tagged-pointer — the shelved-but-feasible second experiment; this source is also the canonical wiki anchor for the concept plus the RSS-vs-allocated-bytes divergence lesson.
- concepts/cache-locality — what makes the theoretically-worse container practically-faster at small N.
- patterns/measurement-driven-micro-optimization — the methodology that identified the >60% memory share as the optimization target.
- patterns/custom-data-structure-for-hot-path — sibling pattern (stdlib `BTreeMap` didn't match the workload shape → flat sorted Vec was the justified-by-measurement replacement).
- Sibling source: sources/2024-05-22-figma-dynamic-page-loading — the 2024 post that drove the ~30% server-side file-decode volume increase this post absorbs.
- Foundational substrate: sources/2019-figma-how-figmas-multiplayer-technology-works — Multiplayer's one-process-per-document + object-tree data model.
- Parallel lesson on RSS-vs-allocated divergence: sources/2025-07-17-datadog-go-124-memory-regression.
- Sibling lesson at a different substrate: sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — custom crate replacing `HashMap` on Pingora's hot path; same cache-locality-over-asymptotics framing.