Skip to content

PATTERN Cited by 1 source

Flip lookup direction

Intent

When a hot-path operation is expressed as "for each element in set A, check/remove it from collection B", and |A||B| (or vice versa), invert the iteration so you loop over the smaller set and probe the larger one. The number of probes drops by the cardinality ratio, and you gain the freedom to swap the probed side to a different (faster) data structure.

Context

Two collections meet at a membership test on the hot path:

  • set A of forbidden / filtered / tagged items (often large and static — e.g. 100+ internal header names).
  • collection B of per-request items (often small — e.g. 10-30 headers on this request).

The naive code iterates A and probes B for each element. If |A| is much larger than |B|, this does way more work than necessary: you check 100 items when 10-30 would have sufficed to find every match.

Mechanism

Invert the direction: iterate B, probe A. Two consequences:

  1. Probe count drops from |A| to |B|. If |A|=100 and average |B|=20, that's a 5× reduction in probes.
  2. The probed side is now free to change structure. A becomes a set-membership-only collection: HashMap, BTreeSet, trie, finite-state machine, regex — anything with fast read. This is often where the real win lives, because it opens the door to swap A for trie / FSM / custom crate on miss-heavy workloads (concepts/big-o-over-key-length).

Rust-specific caveat

Rust's http::HeaderMap does not yet have retain (hyperium/http #541), so the flipped direction must collect matches first, then remove — two passes over B instead of one. Still a big net win at CDN scale.

Canonical instance (Cloudflare, 2024-09-10)

Cloudflare's systems/pingora-origin had:

pub fn clear_internal_headers(request_header: &mut RequestHeader) {
    INTERNAL_HEADERS.iter().for_each(|h| {
        request_header.remove_header(h);
    });
}

This iterates 100+ internal headers per request and probes the request each time — with 10-30 request headers per request. After flipping the lookup direction:

pub fn clear_internal_headers(request_header: &mut RequestHeader) {
    let to_remove = request_header
        .headers
        .keys()
        .filter_map(|name| INTERNAL_HEADER_SET.get(name))
        .collect::<Vec<_>>();
    to_remove.into_iter().for_each(|k| {
        request_header.remove_header(k);
    });
}

Runtime 3.65 µs → 1.53 µs (2.39×) on the criterion bench alone. CPU share 1.71 % → 0.82 % in production stack-trace sampling. And the INTERNAL_HEADER_SET could now be swapped out to trial HashMapBTreeSet/FST → regex → radix_trie → custom trie-hard crate — which got them to 0.34 % CPU.

Anti-patterns

  • Flipping without benchmarking. Works when |A||B|; can be slower when cardinalities are balanced (probe cost > iteration cost).
  • Keeping the probed side as a Vec<String>. Once you've flipped, invest in a proper set structure (HashMap, BTreeSet, trie). Flipping while probing a Vec by linear scan leaves the biggest win on the table.
  • Over-engineering at low request rates. At 1k QPS the original loop is fine; this pattern pays off in the hottest of hot paths (concepts/hot-path).

Seen in

  • sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — Cloudflare flipping the direction of clear_internal_headers in pingora-origin saved ~1 % CPU of 40,000 cores (~400 cores) before any trie work began. The structural prerequisite for the trie-hard swap that followed.
Last updated · 200 distilled / 1,178 read