PATTERN Cited by 1 source
Flip lookup direction¶
Intent¶
When a hot-path operation is expressed as "for each element in
set A, check/remove it from collection B", and |A| ≫ |B|
(or vice versa), invert the iteration so you loop over the
smaller set and probe the larger one. The number of probes
drops by the cardinality ratio, and you gain the freedom to
swap the probed side to a different (faster) data structure.
Context¶
Two collections meet at a membership test on the hot path:
- set
Aof forbidden / filtered / tagged items (often large and static — e.g. 100+ internal header names). - collection
Bof per-request items (often small — e.g. 10-30 headers on this request).
The naive code iterates A and probes B for each element.
If |A| is much larger than |B|, this does way more work
than necessary: you check 100 items when 10-30 would have
sufficed to find every match.
Mechanism¶
Invert the direction: iterate B, probe A. Two consequences:
- Probe count drops from
|A|to|B|. If|A|=100and average|B|=20, that's a 5× reduction in probes. - The probed side is now free to change structure.
Abecomes a set-membership-only collection:HashMap,BTreeSet, trie, finite-state machine, regex — anything with fast read. This is often where the real win lives, because it opens the door to swapAfor trie / FSM / custom crate on miss-heavy workloads (concepts/big-o-over-key-length).
Rust-specific caveat¶
Rust's http::HeaderMap does not yet have retain
(hyperium/http #541),
so the flipped direction must collect matches first, then
remove — two passes over B instead of one. Still a big
net win at CDN scale.
Canonical instance (Cloudflare, 2024-09-10)¶
Cloudflare's systems/pingora-origin had:
pub fn clear_internal_headers(request_header: &mut RequestHeader) {
INTERNAL_HEADERS.iter().for_each(|h| {
request_header.remove_header(h);
});
}
This iterates 100+ internal headers per request and probes the request each time — with 10-30 request headers per request. After flipping the lookup direction:
pub fn clear_internal_headers(request_header: &mut RequestHeader) {
let to_remove = request_header
.headers
.keys()
.filter_map(|name| INTERNAL_HEADER_SET.get(name))
.collect::<Vec<_>>();
to_remove.into_iter().for_each(|k| {
request_header.remove_header(k);
});
}
Runtime 3.65 µs → 1.53 µs (2.39×) on the criterion bench
alone. CPU share 1.71 % → 0.82 % in production stack-trace
sampling. And the INTERNAL_HEADER_SET could now be swapped
out to trial HashMap → BTreeSet/FST → regex → radix_trie
→ custom trie-hard crate — which got them
to 0.34 % CPU.
Anti-patterns¶
- Flipping without benchmarking. Works when
|A|≫|B|; can be slower when cardinalities are balanced (probe cost > iteration cost). - Keeping the probed side as a
Vec<String>. Once you've flipped, invest in a proper set structure (HashMap,BTreeSet, trie). Flipping while probing aVecby linear scan leaves the biggest win on the table. - Over-engineering at low request rates. At 1k QPS the original loop is fine; this pattern pays off in the hottest of hot paths (concepts/hot-path).
Seen in¶
- sources/2024-09-10-cloudflare-a-good-day-to-trie-hard —
Cloudflare flipping the direction of
clear_internal_headersin pingora-origin saved ~1 % CPU of 40,000 cores (~400 cores) before any trie work began. The structural prerequisite for the trie-hard swap that followed.
Related¶
- patterns/measurement-driven-micro-optimization — the discipline the flip sits inside.
- patterns/custom-data-structure-for-hot-path — what often follows the flip.
- concepts/trie-data-structure — the usual replacement structure for miss-heavy short-string sets.
- concepts/big-o-over-key-length — why the replacement matters even when asymptotics look OK.