CONCEPT Cited by 4 sources
Cache locality¶
Cache locality is the property that requests for the same key consistently arrive at the same node, so a node-local cache keyed by that key accumulates hits across those requests rather than being rebuilt independently on every node.
Why it matters¶
A local in-memory cache (Guava, Caffeine, an LRU map) is the fastest available caching tier — no network hop, no serialisation. But it only pays off when the same key stream reaches the same node long enough to fill and reuse the cache. If the same key is sprayed across N nodes, each node builds its own cache from cold, the hit rate per node is roughly 1/N of what a colocated deployment would see, and the backing system (database, external API, compute pipeline) is called N times for what should have been one memoisable operation.
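The N-times-the-backend-calls effect can be made concrete with a toy simulation. The numbers below (4 nodes, 200 hot keys, 20,000 requests) are hypothetical; the sketch counts cold misses — i.e. calls to the backing system — under key-hashed versus random routing:

```python
import random

def backend_calls(route, n_nodes=4, n_requests=20_000, n_keys=200):
    """Replay a hot key stream; count cold misses (= calls to the backing system)."""
    caches = [set() for _ in range(n_nodes)]  # node-local caches: keys already built
    misses = 0
    for _ in range(n_requests):
        key = random.randrange(n_keys)        # small, hot keyspace
        node = route(key, n_nodes)
        if key not in caches[node]:
            misses += 1                       # cold on this node: call the backend
            caches[node].add(key)
    return misses

random.seed(1)
colocated = backend_calls(lambda k, n: k % n)              # same key -> same node
sprayed = backend_calls(lambda k, n: random.randrange(n))  # random load balancing

print(colocated)  # 200: each key is built exactly once, fleet-wide
print(sprayed)    # close to 800: each of the 4 nodes builds its own copy of every key
```

With no eviction and a hot keyspace, colocated routing pays the build cost once per key; random routing pays it roughly once per key per node.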
Cache locality as a function of placement¶
The substrate that routes requests is what makes or breaks cache locality:
- Consistent hashing on the client side (e.g. Memcached client routing) — colocates by key, stable under node churn via virtual nodes.
- Partition colocation in a stream processor — same-key events across topics land on the same task/instance, so the instance's local cache sees the full keyspace.
- Sharded application servers with request routing — user sticky routing + co-located user data on the same shard.
- Random or round-robin load balancing — destroys cache locality; every node caches every key.
The Expedia failure mode¶
Expedia's Kafka Streams app expected cache locality across two input topics (same key → same instance, regardless of which topic carried it) and built a local Guava cache to absorb an expensive per-key external-API call. In production, identical keys arrived at different instances: the two topics sat in separate Kafka Streams sub-topologies, so partition colocation was not guaranteed. Each instance cached a disjoint slice of the keyspace, and the API was called redundantly. Fix: a shared state store (patterns/shared-state-store-as-topology-unifier) forced sub-topology unification, which restored partition colocation, which restored cache locality (Source: sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation).
Design checklist¶
- Is the keyspace actually small and hot enough that a local cache should help? (If every key is cold, cache locality is irrelevant.)
- Does the routing substrate actually colocate identical keys onto the same node? Don't assume — test in production shape (partition count, replica count, topic count, rebalance behavior).
- What happens on rebalance? Cache locality temporarily drops; plan for a warm-up window. Kafka Streams state-store restoration and sharded-service re-routing both have this property.
- Is the cache layer actually a cache, or is it also load-bearing correctness? (Expedia's was deduplication — functionally a cache but a correctness-relevant one; losing cache locality caused redundant but not wrong work.)
Seen in¶
- sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation — Expedia's production case: cross-topic cache locality was the target, sub-topology split was the failure mode, shared state-store the fix.
- sources/2026-04-21-figma-the-search-for-speed-in-figma — OS-page-cache flavor. Figma cut OpenSearch index bytes 50% then a further 90% (unused fields removed), with no measurable relevancy impact — the entire performance win was making the working set fit inside the OS disk cache, turning variable cache-miss latency into predictable cache-hit latency. They then swapped to RAM-heavy / low-CPU nodes (1/3 CPU + 25% more RAM at ≈1/2 price) to protect that residency on a cheaper node mix. Cache locality at the OS-page-cache tier is the prerequisite that made fewer, larger shards viable.
- sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — CPU-cache flavor. Cloudflare's systems/trie-hard Rust crate beats HashMap / radix_trie / BTreeSet / FST / regex on pingora-origin header-stripping by storing the whole trie in a single contiguous allocation with bit-packed node relationships. 1.53 µs → 0.93 µs, 1.28 % of CPU saved across a 40,000-core fleet (~550 cores). Cache locality at the L1/L2 tier turned theoretically-worse structures into the fastest option on the hot path.
- sources/2026-04-03-github-the-uphill-climb-of-making-diff-lines-performant — JS-runtime / V8-heap flavor. GitHub's PR Files-changed tab v2 replaced O(n) .find() scans with O(1) JavaScript Map lookups (patterns/constant-time-state-map): commentsMap['path'][L]. Same lesson at the JS-runtime substrate — one hashed dereference beats an array loop on the hot path (per-render state checks across 10,000 diff lines). The pattern also enabled patterns/conditional-child-state-scoping, since cheap per-line activation checks are only viable when the lookup itself is O(1).
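The constant-time-state-map shape in the GitHub entry — group once by path, then by line, so each per-render check is a hashed lookup instead of a list scan — translates directly. A minimal sketch with made-up comment data (the field names and helper names here are illustrative, not GitHub's):

```python
# Hypothetical review-comment records, mirroring the described shape.
comments = [
    {"path": "src/app.ts", "line": 10, "body": "nit"},
    {"path": "src/app.ts", "line": 42, "body": "possible bug"},
    {"path": "README.md", "line": 3, "body": "typo"},
]

# O(n) per check: rescan the flat list for every rendered line.
def has_comment_scan(path: str, line: int) -> bool:
    return any(c["path"] == path and c["line"] == line for c in comments)

# Build the nested map once; each check is then two hashed dereferences.
comments_map: dict[str, dict[int, list[dict]]] = {}
for c in comments:
    comments_map.setdefault(c["path"], {}).setdefault(c["line"], []).append(c)

# O(1) per check: viable to run for every one of thousands of rendered lines.
def has_comment_map(path: str, line: int) -> bool:
    return line in comments_map.get(path, {})

print(has_comment_map("src/app.ts", 42))  # True
print(has_comment_map("src/app.ts", 11))  # False
```

The build cost is paid once per render batch; the per-line check drops from O(n) over all comments to O(1), which is what makes checking every line affordable.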
Related¶
- concepts/partition-colocation — the Kafka-Streams-layer property that delivers cross-topic cache locality.
- concepts/locality-aware-scheduling — the scheduler-layer analog (place compute where the data lives).
- patterns/colocation-sharding — the DB-layer analog (place related rows on the same shard).
- systems/kafka-streams