PATTERN

Custom data structure for hot path

Intent

When stdlib or well-known crate data structures don't match your workload shape on a genuine hot path, write one that does. Don't force-fit a popular structure into the wrong workload because it's popular.

Context

Off-the-shelf data-structure crates are written for the usual workload shape of that type:

  • HashMap — general-purpose, balanced reads/writes.
  • BTreeMap — sorted iteration, logarithmic reads/writes.
  • radix_trie — type-ahead autocomplete / keyboard-entry workloads.
  • regex — expressive pattern language, not raw-speed matching.

At CDN-scale hot paths, the workload can be very narrow: static key set, short keys, >90 % miss rate, nanosecond budget. None of the above are tuned for exactly that, and the gap shows up as real cores on the fleet bill.

Mechanism

Design a structure that exploits the specifics of the workload:

  • Cache locality. If the whole data fits in L1/L2, fold the entire structure into one contiguous allocation rather than pointer-chasing nodes.
  • Bit tricks. Pack node-relationship metadata into the bits of a machine word (u32/u64) so traversal is shift-and-mask, not pointer dereference.
  • Workload-specific asymptotics. On a miss-heavy workload, beat the hit-optimized general case — even if hits get worse in the process.
  • Static key set at build time. When keys don't change, precompute optimal layout; no dynamic insert cost.

The output is usually a small crate (hundreds to a few thousand lines), open-sourced as a special-purpose tool, not a general library.
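A minimal sketch of the first two mechanisms, under stated assumptions (the names `FlatTrie` and `Node` are invented, not taken from trie-hard or any other crate; keys are restricted to lowercase ASCII for brevity): the whole trie lives in one `Vec`, each node's child edges are bits of a `u32`, and a lookup is shift-and-mask plus a popcount per byte — no per-node allocations, no pointer chasing.

```rust
// Hypothetical sketch: a static membership trie packed into one
// contiguous Vec. Child edges are bits of a u32 per node.

#[derive(Clone, Default)]
struct Node {
    edge_bits: u32,   // bit i set => an edge exists for letter b'a' + i
    first_child: u32, // index of this node's first child in `nodes`
    terminal: bool,   // some key ends exactly at this node
}

struct FlatTrie {
    nodes: Vec<Node>, // the whole structure: one allocation
}

impl FlatTrie {
    /// Build once from a static key set; build cost is off the hot path.
    fn build(keys: &[&str]) -> Self {
        let mut nodes = vec![Node::default()];
        let mut stack: Vec<(usize, Vec<&str>)> = vec![(0, keys.to_vec())];
        while let Some((idx, group)) = stack.pop() {
            let mut buckets: [Vec<&str>; 26] = Default::default();
            let mut edge_bits = 0u32;
            for key in group {
                match key.as_bytes().first() {
                    None => nodes[idx].terminal = true,
                    Some(&b) => {
                        assert!(b.is_ascii_lowercase(), "sketch handles a-z only");
                        let bit = (b - b'a') as usize;
                        edge_bits |= 1 << bit;
                        buckets[bit].push(&key[1..]);
                    }
                }
            }
            // Children are appended contiguously, so rank-by-popcount works.
            nodes[idx].edge_bits = edge_bits;
            nodes[idx].first_child = nodes.len() as u32;
            let mut child = nodes.len();
            for bit in 0..26 {
                if edge_bits & (1 << bit) != 0 {
                    nodes.push(Node::default());
                    stack.push((child, std::mem::take(&mut buckets[bit])));
                    child += 1;
                }
            }
        }
        FlatTrie { nodes }
    }

    /// Hot path: a miss bails on the first absent edge bit.
    fn contains(&self, key: &str) -> bool {
        let mut idx = 0usize;
        for &b in key.as_bytes() {
            if !b.is_ascii_lowercase() {
                return false;
            }
            let bit = (b - b'a') as u32;
            let node = &self.nodes[idx];
            if node.edge_bits & (1 << bit) == 0 {
                return false; // the miss-heavy fast path: one mask test
            }
            // Child index = first_child + count of edge bits below `bit`.
            let below = node.edge_bits & ((1 << bit) - 1);
            idx = (node.first_child + below.count_ones()) as usize;
        }
        self.nodes[idx].terminal
    }
}
```

The bitmask-plus-popcount rank trick is what turns "find the right child" from a pointer chase into two ALU instructions; the single `Vec` is what keeps the whole structure resident in L1/L2.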

Canonical instances

Cloudflare trie-hard (2024)

systems/trie-hard replaced a HashMap<HeaderName> lookup in systems/pingora-origin's clear_internal_headers. Design choices:

  • Node edges as bits of unsigned integers (not child-pointer arrays).
  • Whole trie in one contiguous memory chunk (not per-node allocations).
  • Built for short static keys + >90 % miss rate — the specific shape of HTTP-header membership checks.

Result: 1.53 µs (HashMap) → 0.93 µs (trie-hard), a 1.28 % total-CPU saving, i.e. ~550 cores across a 40,000-core fleet. radix_trie from crates.io benchmarked a full microsecond slower than the HashMap it was meant to replace — the custom crate was the only path to the win.

Figma flat-sorted-Vec (2026)

concepts/small-map-as-sorted-vec — Figma Multiplayer replaced BTreeMap<u16, u64> with Vec<(u16, u64)> for per-node property maps. Design choices:

  • Flat contiguous storage vs pointer-chased tree nodes.
  • Schema-bounded n < 200 and input sorted on the wire — so the theoretically-worse O(n) insert never fires on the load path.

Result: ~25 % memory reduction on large files, and deserialization faster than BTreeMap despite the Big-O regression. Server-fleet memory cost down ~20 %; p99 file-load latency improved ~20 %.
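A minimal sketch of the idea, not Figma's code (`SmallMap` and its method names are invented): with `n` schema-bounded and input arriving pre-sorted, inserts on the load path are O(1) appends, and reads are a binary search over one flat allocation.

```rust
// Hypothetical sketch: a small map stored as a flat sorted Vec instead
// of a pointer-chased BTreeMap. One contiguous allocation, cache-friendly.
struct SmallMap {
    entries: Vec<(u16, u64)>, // kept sorted by key
}

impl SmallMap {
    fn new() -> Self {
        SmallMap { entries: Vec::new() }
    }

    /// Load path: wire input is already sorted, so this is a plain append.
    /// The theoretically-worse O(n) shifting insert never fires here.
    fn push_sorted(&mut self, key: u16, val: u64) {
        debug_assert!(self.entries.last().map_or(true, |&(k, _)| k < key));
        self.entries.push((key, val));
    }

    /// O(log n) lookup over contiguous memory; for n < 200 this is
    /// typically faster in practice than a tree walk.
    fn get(&self, key: u16) -> Option<u64> {
        self.entries
            .binary_search_by_key(&key, |&(k, _)| k)
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

The `debug_assert!` documents the workload invariant (sorted wire input) that makes the Big-O regression irrelevant; if that invariant ever breaks, it fails loudly in debug builds rather than silently corrupting lookup.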

Anti-patterns

  • Writing a custom data structure before measuring. The crate exists for a reason; if the stock one is within 10 %, ship the stock one. Custom structures are justified when the gap is at least 2× and the volume is high enough to matter.
  • Not benchmarking against real workload. A "faster trie" on random English words may not be faster on 20-byte ASCII HTTP header names with a 90 % miss rate. Match the input distribution.
  • Generalizing the crate too early. trie-hard is narrow on purpose — static key set, miss-heavy, short keys. Trying to make it also handle dynamic inserts / long keys / hit-dominated workloads destroys the reason it won.
  • Skipping the OSS release. Open-sourcing the result is a peer-review step and a future hiring signal; the engineering cost was already paid.
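The "benchmark against real workload" point can be made concrete with a tiny harness. Everything here is invented for illustration — the header names, the 90/10 hit/miss mix, and the function names; substitute your production key set and input distribution, and prefer a proper harness (e.g. Criterion) for the real comparison.

```rust
use std::collections::HashSet;
use std::time::Instant;

// Build a probe stream matching the claimed workload shape:
// short ASCII keys, ~90 % misses against a small static set.
fn make_probes(n: usize) -> Vec<String> {
    (0..n)
        .map(|i| {
            if i % 10 == 0 {
                "x-internal-id".to_string() // hit (1 in 10)
            } else {
                format!("x-miss-{i}") // miss (9 in 10)
            }
        })
        .collect()
}

// Time a membership predicate over the realistic probe stream;
// returns the hit count so callers can sanity-check the mix.
fn bench(label: &str, probes: &[String], lookup: impl Fn(&str) -> bool) -> usize {
    let start = Instant::now();
    let hits = probes.iter().filter(|p| lookup(p.as_str())).count();
    println!("{label}: {hits} hits in {:?}", start.elapsed());
    hits
}

// Baseline to beat; a candidate structure gets the exact same probes.
fn baseline_hits(probes: &[String]) -> usize {
    let keys: HashSet<&str> = HashSet::from(["x-internal-id", "x-internal-debug"]);
    bench("HashSet baseline", probes, |k| keys.contains(k))
}
```

The point of the harness is symmetry: both the stock structure and the candidate see the identical probe stream, so a "faster" result cannot come from benchmarking on the wrong input distribution.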
