PATTERN
Custom data structure for hot path¶
Intent¶
When stdlib or well-known crate data structures don't match your workload shape on a genuine hot path, write one that does. Don't force-fit a popular structure into the wrong workload because it's popular.
Context¶
Off-the-shelf data-structure crates are written for the usual workload shape of their type:
- HashMap — general-purpose, balanced reads/writes.
- BTreeMap — sorted iteration, logarithmic reads/writes.
- radix_trie — typed-autocomplete / keyboard workloads.
- regex — expressive pattern language, not raw-speed matching.
On CDN-scale hot paths the workload can be very narrow: a static key set, short keys, a >90 % miss rate, a nanosecond budget. None of these is tuned for exactly that, and the gap shows up as real cores on the fleet bill.
Mechanism¶
Design a structure that exploits the specifics of the workload:
- Cache locality. If the whole data fits in L1/L2, fold the entire structure into one contiguous allocation rather than pointer-chasing nodes.
- Bit tricks. Pack node-relationship metadata into the bits of a machine word (u32/u64) so traversal is shift-and-mask, not pointer dereference.
- Workload-specific asymptotics. On a miss-heavy workload, beat the hit-optimized general case — even if hits get worse in the process.
- Static key set at build time. When keys don't change, precompute optimal layout; no dynamic insert cost.
The output is usually a small crate (100s-1000s of lines), open-sourced as a specific-purpose tool, not a general library.
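The bit-tricks and single-allocation points above can be sketched together. The following is a minimal illustration under stated assumptions (all type and field names are hypothetical, not trie-hard's actual layout): a node's child edges live in a u32 bitmask, and the rank of a set bit indexes into one contiguous node array, so a step is a load, an AND, and a popcount rather than a pointer chase.

```rust
// Hypothetical sketch of bit-packed edges in a contiguous trie.
// A set bit i in `edge_mask` means "this node has an edge for symbol i";
// the rank of that bit (count of set bits below it) indexes into one
// flat `nodes` array instead of a per-node child-pointer table.

struct FlatNode {
    edge_mask: u32,     // bit i set => edge for symbol i exists
    first_child: usize, // index of this node's first child in `nodes`
}

struct FlatTrie {
    nodes: Vec<FlatNode>, // whole structure in one contiguous allocation
}

impl FlatTrie {
    /// Follow the edge for `symbol` (0..32), or return None on a miss.
    fn step(&self, node: usize, symbol: u32) -> Option<usize> {
        let mask = self.nodes[node].edge_mask;
        let bit = 1u32 << symbol;
        if mask & bit == 0 {
            return None; // miss: one load + one AND, no pointer dereference
        }
        // Rank of the edge = number of set bits strictly below `symbol`.
        let rank = (mask & (bit - 1)).count_ones() as usize;
        Some(self.nodes[node].first_child + rank)
    }
}
```

The miss path is the cheap one by construction, which is the point on a miss-dominated workload.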
Canonical instances¶
Cloudflare trie-hard (2024)¶
systems/trie-hard replaced a HashMap<HeaderName> in systems/pingora-origin's clear_internal_headers. Design choices:
- Node edges as bits of unsigned integers (not child-pointer arrays).
- Whole trie in one contiguous memory chunk (not per-node allocations).
- Built for short static keys + >90 % miss rate — the specific shape of HTTP-header membership checks.
Result: 1.53 µs (HashMap) → 0.93 µs (trie-hard), a 1.28 % total-CPU saving, roughly 550 cores on a 40,000-core fleet.
radix_trie from crates.io benchmarked a full microsecond slower than the HashMap it was supposed to replace; the custom crate was the only path.
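The miss-heavy shape deserves a concrete sketch. This is not trie-hard's actual algorithm, just a hypothetical illustration of the same idea (all names invented here): precompute a 256-bit bitmap of the first bytes that occur in the static key set, so the >90 % of lookups that miss are rejected with one shift-and-mask before any hashing.

```rust
use std::collections::HashSet;

/// Hypothetical miss-fast wrapper around a static key set.
/// Lookups that miss on the first byte never touch the hash table.
struct MissFastSet {
    first_byte_mask: [u64; 4],   // 256 bits: which first bytes occur in the set
    keys: HashSet<&'static str>, // fallback for the rare hit path
}

impl MissFastSet {
    /// Build from a static, non-empty key list (keys assumed non-empty strings).
    fn new(keys: &[&'static str]) -> Self {
        let mut mask = [0u64; 4];
        for k in keys {
            let b = k.as_bytes()[0] as usize;
            mask[b / 64] |= 1u64 << (b % 64);
        }
        Self {
            first_byte_mask: mask,
            keys: keys.iter().copied().collect(),
        }
    }

    fn contains(&self, key: &str) -> bool {
        let b = match key.as_bytes().first() {
            Some(&b) => b as usize,
            None => return false,
        };
        if self.first_byte_mask[b / 64] & (1u64 << (b % 64)) == 0 {
            return false; // fast miss: shift-and-mask, the common case
        }
        self.keys.contains(key) // rare: fall back to the full check
    }
}
```

Note the trade matches the pattern: hits pay an extra mask test so that the dominant misses get cheaper.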
Figma flat-sorted-Vec (2026)¶
concepts/small-map-as-sorted-vec — Figma Multiplayer replaced BTreeMap<u16, u64> with Vec<(u16, u64)> for per-node property maps. Design choices:
- Flat contiguous storage vs pointer-chased tree nodes.
- Schema-bounded n < 200 and input sorted on the wire, so the theoretically worse O(n) insert never fires on the load path.
Result: ~25 % memory drop on large files, with deserialization faster than BTreeMap despite the Big-O regression. Server-fleet memory cost down ~20 %; p99 file-load latency improved ~20 %.
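A minimal sketch of the flat-sorted-Vec idea, under this section's assumptions (hypothetical type, not Figma's code): keys and values in one contiguous, key-sorted Vec, built by straight appends from wire-sorted input and read by cache-friendly binary search.

```rust
/// Hypothetical small map stored as one contiguous Vec of (key, value)
/// pairs kept sorted by key, per the schema-bounded n < 200 assumption.
struct SmallMap {
    entries: Vec<(u16, u64)>, // sorted by key
}

impl SmallMap {
    /// Build from input already sorted on the wire: O(n) appends,
    /// no per-node allocation, no rebalancing.
    fn from_sorted(pairs: Vec<(u16, u64)>) -> Self {
        debug_assert!(pairs.windows(2).all(|w| w[0].0 < w[1].0));
        Self { entries: pairs }
    }

    /// O(log n) lookup over a single contiguous allocation.
    fn get(&self, key: u16) -> Option<u64> {
        self.entries
            .binary_search_by_key(&key, |&(k, _)| k)
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

With n bounded by the schema, the flat layout's locality dominates the asymptotic difference, which is the whole trade described above.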
Anti-patterns¶
- Writing a custom data structure before measuring. The crate exists for a reason; if the stock one is within 10 %, ship the stock one. Custom structures are justified when the gap is at least 2× and the volume is high enough to matter.
- Not benchmarking against real workload. A "faster trie" on random English words may not be faster on 20-byte ASCII HTTP header names with a 90 % miss rate. Match the input distribution.
- Generalizing the crate too early. trie-hard is narrow on purpose — static key set, miss-heavy, short keys. Trying to make it also handle dynamic inserts / long keys / hit-dominated workloads destroys the reason it won.
- Skipping the OSS release. Open-sourcing the result is a peer-review step and a future hiring signal; the engineering cost was already paid.
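The "benchmark against real workload" point above is mechanical to follow. A hypothetical harness (names invented here) that measures a candidate structure on the actual input distribution — short ASCII header names, ~90 % misses — rather than on random words:

```rust
use std::collections::HashSet;
use std::time::Instant;

/// Hypothetical harness: time a membership structure on the real
/// input distribution and return (hits, elapsed_ns) so competing
/// variants can be compared on the exact same workload.
fn bench_membership(set: &HashSet<&str>, workload: &[&str], iters: u32) -> (usize, u128) {
    let start = Instant::now();
    let mut hits = 0usize;
    for _ in 0..iters {
        for key in workload {
            if set.contains(key) {
                hits += 1;
            }
        }
    }
    (hits, start.elapsed().as_nanos())
}
```

The workload slice should be captured from production traffic (or synthesized to match its key lengths and hit rate), and each candidate structure gets the same slice; wall-clock comparison of anything else answers a different question.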
Seen in¶
- sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — Cloudflare's trie-hard crate replacing HashMap in pingora-origin's clear_internal_headers; the archetypal "off-the-shelf crates were all slower, so we wrote our own" case.
- sources/2026-04-21-figma-supporting-faster-file-load-times-memory-optimizations-rust — Figma's Multiplayer flat-sorted-Vec replacing BTreeMap for per-node property maps; the same cache-locality-over-asymptotics lesson at a different substrate.
Related¶
- patterns/measurement-driven-micro-optimization — the methodology that justifies custom crates.
- concepts/trie-data-structure — category of which trie-hard is a specific optimized instance.
- concepts/small-map-as-sorted-vec — sibling pattern for small-N associative containers.
- concepts/cache-locality — the main reason custom structures win at µs scales.
- systems/trie-hard — canonical crate.