CONCEPT Cited by 1 source
Trie data structure¶
A trie (pronounced "try" or "tree") is a tree-based data structure where each node represents a substring prefix and each edge is labeled by a single character. Walking from root to leaf spells out a string; the set of root-to-leaf paths is the set of strings stored.
Asymptotic behaviour¶
For a key set containing strings of average length L:
- Read-miss complexity:
O(log L). At every depth you can prune all keys not matching the prefix so far — usually after a few characters you know the key isn't in the set. - Read-hit complexity:
O(L). You have to walk the full path.
This is fundamentally different from the usual intuition for
HashMap:
HashMapread isO(1)over table size, butO(L)over key length — every byte of the key goes into the hash function (see concepts/big-o-over-key-length).- On miss-heavy workloads the trie's
O(log L)miss is strictly better thanHashMap'sO(L)miss.
When tries win in practice¶
- >90 % miss rate — the miss path is the bottleneck.
- Short keys — cache-locality of a contiguous trie wins over pointer-chased per-node allocation.
- Static / rarely-changing key set — build-once-read-many.
When tries lose¶
- Hit-dominated workloads — trie hit is
O(L)too; a dense array or hashmap on a perfect hash can be faster. - Dynamic key set with high insert rate — most trie implementations favour read perf.
- Long keys — cache-locality win goes away.
- Typical
crates.iotrie is optimized for autocomplete / keyboard use cases, not tens of millions of reads per second on a server hot path — see systems/trie-hard for the "had to write our own" story.
Relation to state machines / regex¶
A state machine or regex evaluator has the same early-reject property (stop at the first non-matching character), and benchmarks similarly for miss-heavy membership checks. Tries are just the data-structure realization; state machines are the programmatic realization. Picking between them is usually about API ergonomics, not speed.
Canonical instance in this wiki¶
systems/trie-hard is a Cloudflare-open-sourced Rust crate
that packs node relationships into the bits of unsigned integers
and keeps the whole tree in one contiguous memory chunk to hit
the cache hard. It replaced a HashMap in
systems/pingora-origin's clear_internal_headers and cut
that function's CPU share from 0.82 % → 0.34 %, for a full
1.28 %-CPU / ~550-core saving end-to-end on a 40,000-core
Cloudflare pingora-origin fleet.
Related¶
- concepts/big-o-over-key-length — why hashmap isn't the O(1) silver bullet.
- systems/trie-hard — canonical production instance.
- concepts/small-map-as-sorted-vec — sibling story about cache-locality beating asymptotic complexity at small N.