CONCEPT Cited by 1 source
Big-O over key length¶
Standard algorithmic analysis talks about Big-O over the
number of stored elements n — a hashmap read is O(1)
whereas a sorted-tree lookup is O(log n). At CDN / hot-path
scales, the hidden cost is over the length of the key L:
HashMap<String, V>read isO(1)overn— great.- But every byte of the key goes into the hash function —
so the read is also
O(L)over the key length. - Same for equality tests on keys that collide in a bucket, and for any string-compare step at the bottom.
Why this matters¶
At 35 M requests/sec on 40,000 cores (the scale of
systems/pingora-origin), 20-byte HTTP header names hashed 10
to 30 times per request is not "free." The marginal time per
hash is tens of nanoseconds, and all of that is O(L) work.
Asymptotic intuition — "hashmap is O(1)" — hides the real
bottleneck.
Beating O(L)¶
- Sorted structures like
BTreeSet:O(log L · log n)— often slower than hashmap on tinyn, because each key- compare is stillO(L)plus log factors. - Tries / state machines / regex give
O(log L)on misses because they reject at the first wrong character. Hits remainO(L). - Miss-dominated workloads (>90 % miss rate) are where tries-and-friends win over hashmaps.
- Finite state machines (FST), perfect hash functions
on the static key set, and bit-packed tries (e.g.
systems/trie-hard) are the common sub-
O(L)shapes.
Rule of thumb¶
If your key length is 20+ bytes and the operation is miss- heavy and on the hot path,
HashMapis not the endgame. Measure with a real benchmark and consider a trie / FSM / custom crate.
Canonical instance¶
Cloudflare's pingora-origin clear_internal_headers hot path:
HashMap at 1.53 µs → trie-hard at
0.93 µs — a 40 % reduction at the helper level that flowed
through to a 1.28 % total-CPU reduction on the fleet, almost
entirely attributable to beating O(L) on misses
(sources/2024-09-10-cloudflare-a-good-day-to-trie-hard).
Related¶
- concepts/trie-data-structure — the usual sub-
O(L)container for short strings. - systems/trie-hard — Cloudflare's bit-packed trie crate.
- patterns/measurement-driven-micro-optimization — because you can't tell without measuring.