Skip to content

SYSTEM Cited by 1 source

trie-hard

trie-hard is Cloudflare's open-source Rust crate (github.com/cloudflare/trie-hard) implementing a trie optimized for a specific workload shape: read-heavy membership checks against a static, schema-bounded set of short string keys, with a high miss rate (>90 % of reads return "not in set").

Target workload

The original motivating workload is systems/pingora-origin's clear_internal_headers helper, which checks each header on an outbound request against a set of ~100 Cloudflare internal headers to strip. Average request has 10-30 headers; ≥90 % of them are not internal — so the hot path is miss-heavy, exactly the shape where tries beat hashmaps (concepts/big-o-over-key-length: tries give O(log L) on misses vs hashmap's O(L)).

Design wins

  • Edge relationships packed into the bits of unsigned integers. Instead of storing child-pointer arrays or hashmap-per-node, the trie encodes "which characters can follow" as bitmask integers — bit-count / shift operations replace pointer dereferences.
  • Entire tree in one contiguous memory chunk. No allocation-per-node; traversal is a cache-friendly walk across a small flat buffer. CPU caches stay hot, TLB doesn't thrash.
  • Short keys win big because the whole hot set fits in L1 / L2 with room to spare.

Measured performance

From the 2024-09-10 criterion microbench on Cloudflare's synthetic request-set (uniform internal / non-internal header mix):

Structure avg runtime vs original
Original loop (internal × request) 3.65 µs 1.00×
Flip direction + HashMap 1.53 µs 2.39×
BTreeSet via FST ~1.58 µs ~2.31×
Regex state machine ~3 µs ~1.2×
radix_trie (crates.io) ~2.5 µs ~1.46×
trie-hard 0.93 µs 3.93×

Production stack-trace sampling on pingora-origin after deployment (July 2024): CPU share of clear_internal_headers 4 / 1,171 samples → 0.34 % (vs 1.71 % before).

Limits

  • Hits are still O(L). The miss optimization is the big win; for a hit-dominated workload, a hashmap / dense array might be faster. Not a general-purpose replacement for HashMap.
  • Static key set. Optimized for build-once-read-many. Dynamic inserts / deletes at high rate are not the target.
  • Short strings. Designed for HTTP header names (short, ASCII). Long-key workloads don't get the same cache-locality win.
  • radix_trie is the fastest crates.io alternative Cloudflare benchmarked, and trie-hard beats it by ~1 µs on this workload. For other shapes, off-the-shelf crates may still win; the crate README is explicit about the target.

Seen in

Last updated · 200 distilled / 1,178 read