SYSTEM Cited by 1 source
trie-hard¶
trie-hard is Cloudflare's open-source Rust crate (github.com/cloudflare/trie-hard) implementing a trie optimized for a specific workload shape: read-heavy membership checks against a static, schema-bounded set of short string keys, with a high miss rate (>90 % of reads return "not in set").
Target workload¶
The original motivating workload is
systems/pingora-origin's clear_internal_headers helper,
which checks each header on an outbound request against a set of
~100 Cloudflare internal headers to strip. Average request has
10-30 headers; ≥90 % of them are not internal — so the hot path
is miss-heavy, exactly the shape where tries beat hashmaps
(concepts/big-o-over-key-length: tries give O(log L) on
misses vs hashmap's O(L)).
Design wins¶
- Edge relationships packed into the bits of unsigned integers. Instead of storing child-pointer arrays or hashmap-per-node, the trie encodes "which characters can follow" as bitmask integers — bit-count / shift operations replace pointer dereferences.
- Entire tree in one contiguous memory chunk. No allocation-per-node; traversal is a cache-friendly walk across a small flat buffer. CPU caches stay hot, TLB doesn't thrash.
- Short keys win big because the whole hot set fits in L1 / L2 with room to spare.
Measured performance¶
From the 2024-09-10 criterion microbench on Cloudflare's
synthetic request-set (uniform internal / non-internal header
mix):
| Structure | avg runtime | vs original |
|---|---|---|
| Original loop (internal × request) | 3.65 µs | 1.00× |
Flip direction + HashMap |
1.53 µs | 2.39× |
BTreeSet via FST |
~1.58 µs | ~2.31× |
| Regex state machine | ~3 µs | ~1.2× |
radix_trie (crates.io) |
~2.5 µs | ~1.46× |
| trie-hard | 0.93 µs | 3.93× |
Production stack-trace sampling on pingora-origin after
deployment (July 2024): CPU share of clear_internal_headers
4 / 1,171 samples → 0.34 % (vs 1.71 % before).
Limits¶
- Hits are still O(L). The miss optimization is the big win;
for a hit-dominated workload, a hashmap / dense array might be
faster. Not a general-purpose replacement for
HashMap. - Static key set. Optimized for build-once-read-many. Dynamic inserts / deletes at high rate are not the target.
- Short strings. Designed for HTTP header names (short, ASCII). Long-key workloads don't get the same cache-locality win.
radix_trieis the fastest crates.io alternative Cloudflare benchmarked, and trie-hard beats it by ~1 µs on this workload. For other shapes, off-the-shelf crates may still win; the crate README is explicit about the target.
Seen in¶
- sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — trie-hard's announcement + benchmark methodology + production CPU result; deployed in pingora-origin since July 2024.
Related¶
- systems/pingora-origin — original production consumer.
- concepts/trie-data-structure — generic concept this crate instantiates.
- concepts/big-o-over-key-length — why tries beat hashmaps on miss-heavy short-key workloads.
- patterns/custom-data-structure-for-hot-path — the engineering pattern trie-hard exemplifies (write a new structure when crates.io doesn't match your shape).
- concepts/small-map-as-sorted-vec — sibling
cache-locality-over-asymptotic-complexity story in the wiki
(Figma's Multiplayer flat-sorted-Vec replacing
BTreeMap).