Skip to content

CONCEPT Cited by 1 source

Trie data structure

A trie (pronounced "try" or "tree") is a tree-based data structure where each node represents a substring prefix and each edge is labeled by a single character. Walking from root to leaf spells out a string; the set of root-to-leaf paths is the set of strings stored.

Asymptotic behaviour

For a key set containing strings of average length L:

  • Read-miss complexity: O(log L). At every depth you can prune all keys not matching the prefix so far — usually after a few characters you know the key isn't in the set.
  • Read-hit complexity: O(L). You have to walk the full path.

This is fundamentally different from the usual intuition for HashMap:

  • HashMap read is O(1) over table size, but O(L) over key length — every byte of the key goes into the hash function (see concepts/big-o-over-key-length).
  • On miss-heavy workloads the trie's O(log L) miss is strictly better than HashMap's O(L) miss.

When tries win in practice

  • >90 % miss rate — the miss path is the bottleneck.
  • Short keys — cache-locality of a contiguous trie wins over pointer-chased per-node allocation.
  • Static / rarely-changing key set — build-once-read-many.

When tries lose

  • Hit-dominated workloads — trie hit is O(L) too; a dense array or hashmap on a perfect hash can be faster.
  • Dynamic key set with high insert rate — most trie implementations favour read perf.
  • Long keys — cache-locality win goes away.
  • Typical crates.io trie is optimized for autocomplete / keyboard use cases, not tens of millions of reads per second on a server hot path — see systems/trie-hard for the "had to write our own" story.

Relation to state machines / regex

A state machine or regex evaluator has the same early-reject property (stop at the first non-matching character), and benchmarks similarly for miss-heavy membership checks. Tries are just the data-structure realization; state machines are the programmatic realization. Picking between them is usually about API ergonomics, not speed.

Canonical instance in this wiki

systems/trie-hard is a Cloudflare-open-sourced Rust crate that packs node relationships into the bits of unsigned integers and keeps the whole tree in one contiguous memory chunk to hit the cache hard. It replaced a HashMap in systems/pingora-origin's clear_internal_headers and cut that function's CPU share from 0.82 % → 0.34 %, for a full 1.28 %-CPU / ~550-core saving end-to-end on a 40,000-core Cloudflare pingora-origin fleet.

Last updated · 200 distilled / 1,178 read