
CONCEPT

Cache hit rate

Definition

The cache hit rate is the fraction of data requests served from the cache without needing to go to the slower backing store:

hit_rate = cache_hits / total_requests

A cache hit is a request whose answer is already in the cache. A cache miss requires fetching from the backing store, typically populating the cache on the way back out. The miss rate is 1 - hit_rate; multiply either by 100 to quote it as a percentage.
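The definition is a one-liner; a minimal sketch (the guard for zero requests is a defensive choice, not from the source):

```python
def hit_rate(cache_hits: int, total_requests: int) -> float:
    """Fraction of requests served from cache, in [0.0, 1.0]."""
    if total_requests == 0:
        return 0.0  # no traffic yet: define hit rate as 0 rather than divide by zero
    return cache_hits / total_requests

rate = hit_rate(cache_hits=900, total_requests=1000)
print(f"hit rate: {rate:.0%}, miss rate: {1 - rate:.0%}")
```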

Why it's the single load-bearing metric

The economic argument for a cache is "keep frequently-accessed data in fast storage so most requests never touch the slow tier." The measurable evidence that this is working is the hit rate. (Source: sources/2025-07-08-planetscale-caching.)

Every other cache metric derives from it:

  • Effective average latency: hit_rate × fast_latency + (1 - hit_rate) × slow_latency. A 90% hit rate already collapses most of the latency gap; a 99% hit rate essentially removes it.
  • Backing-store load is proportional to the miss rate — caching is also a capacity-protection mechanism for the slow tier, not just a latency optimisation.
  • Storage cost is proportional to cache size, which is what you pay to raise the hit rate on a given access pattern.
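The effective-latency formula above can be evaluated directly; the 100 µs cache and 10 ms backing-store latencies below are illustrative numbers, not from the source:

```python
def effective_latency(hit_rate: float, fast_us: float, slow_us: float) -> float:
    """Weighted average latency across hits and misses, in the same unit as the inputs."""
    return hit_rate * fast_us + (1 - hit_rate) * slow_us

# Illustrative tiers: 100 µs cache, 10 ms (10,000 µs) backing store.
for hr in (0.50, 0.90, 0.99):
    print(f"{hr:.0%} hit rate -> {effective_latency(hr, 100, 10_000):.0f} µs average")
```

At 90% the average drops an order of magnitude below the slow tier; at 99% it is within about 2× of the fast tier, which is the "essentially removes it" claim in numbers.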

Hit rate vs cache size

From the Dicken post, visualised across two demo setups:

  • Small cache + random access pattern → low hit rate; most requests miss.
  • Cache approaching the size of the hot dataset → high hit rate; most requests hit.

"Increasing the size of our cache increases cost and complexity in our data storage system. It's all about trade-offs." (Source: sources/2025-07-08-planetscale-caching.)

The relationship is not linear in real workloads — hot data is typically Pareto-distributed (the top ~20% of keys take ~80% of the traffic, or steeper). A modestly-sized cache that covers the head of the distribution can deliver a very high hit rate; doubling the cache past that point buys diminishing returns.
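The diminishing-returns shape can be sketched analytically, assuming an idealised cache that always holds the k most popular keys and a Zipf popularity distribution (both are modelling assumptions, not claims from the source):

```python
def zipf_top_k_hit_rate(n_keys: int, k: int, s: float = 1.0) -> float:
    """Hit rate of an ideal cache holding the k most popular of n_keys,
    when request probability for rank i is proportional to 1 / i**s."""
    weights = [1 / (i ** s) for i in range(1, n_keys + 1)]
    return sum(weights[:k]) / sum(weights)

n = 100_000
for k in (1_000, 2_000, 4_000, 8_000):
    print(f"cache holds {k / n:.0%} of keys -> {zipf_top_k_hit_rate(n, k):.1%} hit rate")
```

With s = 1, each doubling of the cache adds roughly the same handful of percentage points: the head of the distribution is cheap to capture, the tail is not.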

What affects hit rate at a fixed cache size

  • Access pattern. Sequential or recency-biased workloads have natural locality and high hit rates. Uniformly random access over a larger-than-cache dataset gets hit-rate ≈ cache_size / dataset_size and can't be fixed by any eviction policy.
  • Eviction policy. At a fixed cache size, the choice between FIFO, LRU, time-aware LRU, LFRU, etc. changes the hit rate. LRU is the industry default because it matches the temporal locality of most workloads.
  • Working-set fit. If the working set (data accessed over a given window) fits in cache, hit rate can approach 100% once warm; if it doesn't, hit rate plateaus at cache_size / working_set_size.
  • Warm-up state. Hit rate is low on a cold cache after a restart / deploy / rebalance; cache-warming strategies matter for tail latency during these windows.
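The access-pattern and working-set-fit points can be demonstrated with a small LRU simulation (the key counts, hot-set size, and 90/10 traffic split are illustrative assumptions):

```python
import random
from collections import OrderedDict

def lru_hit_rate(requests, cache_size):
    """Simulate an LRU cache over a request trace and return the observed hit rate."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # refresh recency on a hit
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least-recently-used key
    return hits / len(requests)

rng = random.Random(42)
n_keys, cache_size, n_reqs = 10_000, 1_000, 100_000
uniform = [rng.randrange(n_keys) for _ in range(n_reqs)]
# Recency-biased trace: 90% of requests go to a 500-key hot set that fits in cache.
biased = [rng.randrange(500) if rng.random() < 0.9 else rng.randrange(n_keys)
          for _ in range(n_reqs)]
print(f"uniform access:        {lru_hit_rate(uniform, cache_size):.1%}")
print(f"recency-biased access: {lru_hit_rate(biased, cache_size):.1%}")
```

The uniform trace lands near the cache_size / dataset_size floor (~10%); the biased trace, whose working set fits, lands near 90%, with the same cache and the same policy.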

Where to read it in production

  • Redis: INFO stats → keyspace_hits, keyspace_misses.
  • Postgres: pg_stat_database → blks_hit, blks_read (buffer-pool hit rate). See concepts/postgres-shared-buffers-double-buffering.
  • MySQL InnoDB: SHOW ENGINE INNODB STATUS → Buffer pool hit rate; also Innodb_buffer_pool_reads vs Innodb_buffer_pool_read_requests. See concepts/innodb-buffer-pool.
  • CloudFront / CDN: CacheHitRate dashboard metric.
  • Linux page cache: indirect via vmstat / iostat (rising disk I/O with stable working set implies cache-miss storm). See concepts/linux-page-cache.
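Turning raw counters into a hit rate is the same arithmetic everywhere. A sketch for the Redis case, parsing the `key:value` lines that INFO stats emits (the counter values below are made up for illustration):

```python
# Sample fragment in the key:value line format of Redis `INFO stats`;
# the numbers are invented for this example.
INFO_STATS = """\
keyspace_hits:9450221
keyspace_misses:387104
"""

def redis_hit_rate(info_text: str) -> float:
    """Compute hit rate from the keyspace_hits / keyspace_misses counters."""
    stats = dict(line.split(":", 1) for line in info_text.splitlines() if ":" in line)
    hits = int(stats["keyspace_hits"])
    misses = int(stats["keyspace_misses"])
    return hits / (hits + misses)

print(f"hit rate: {redis_hit_rate(INFO_STATS):.2%}")
```

Note these counters are cumulative since server start, so for alerting you generally want the rate over a window (delta of hits over delta of total), not the lifetime ratio.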

Anti-patterns

  • Staring at hit rate without workload context. A hit rate of 50% is excellent for some workloads (random access over a large cold dataset) and terrible for others (recency-biased social feed). Compare to the baseline cache_size / working_set_size.
  • Optimising hit rate at the cost of freshness. A longer TTL raises the hit rate but risks serving stale data — the cache TTL staleness dilemma.
  • Hit rate per instance vs fleet hit rate. If the routing substrate doesn't preserve cache locality, each instance's cache sees a random slice of keys and the per-instance hit rate is far below the theoretical ceiling.

Seen in

  • sources/2025-07-08-planetscale-caching — Ben Dicken (PlanetScale) frames hit rate as the single most important cache metric ("We want to keep the hit rate as high as possible. Doing so means we are minimizing the number requests to the slower storage") with worked visual examples of low-hit-rate vs high-hit-rate scenarios.