CONCEPT Cited by 1 source
Cache hit rate¶
Definition¶
The cache hit rate is the fraction of data requests served from the cache without needing to go to the slower backing store:

hit_rate = cache_hits / total_requests

A cache hit is a request whose answer is already in the cache. A cache miss requires fetching from the backing store, typically populating the cache on the way back out. The miss rate is 1 - hit_rate.
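As a minimal sketch in code (the counter names and numbers are illustrative):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of requests answered from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

print(hit_rate(hits=900, misses=100))  # 0.9, so the miss rate is 0.1
```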
Why it's the single load-bearing metric¶
The economic argument for a cache is "keep frequently-accessed data in fast storage so most requests never touch the slow tier." The measurable evidence that this is working is the hit rate. (Source: sources/2025-07-08-planetscale-caching.)
Every other cache metric derives from it:
- Effective average latency ≈ hit_rate × fast_latency + (1 - hit_rate) × slow_latency. A 90% hit rate already collapses most of the latency gap; a 99% hit rate essentially removes it.
- Backing-store load is proportional to the miss rate — caching is also a capacity-protection mechanism for the slow tier, not just a latency optimisation.
- Storage cost is proportional to cache size, which is what you pay to raise the hit rate on a given access pattern.
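The effective-latency formula above, as a runnable sketch (the 1 ms / 100 ms tier latencies are illustrative):

```python
def effective_latency(hit_rate: float, fast_ms: float, slow_ms: float) -> float:
    # Weighted average: hits pay the fast tier, misses pay the slow tier.
    return hit_rate * fast_ms + (1 - hit_rate) * slow_ms

print(effective_latency(0.90, 1.0, 100.0))  # ~10.9 ms: most of the gap is gone
print(effective_latency(0.99, 1.0, 100.0))  # ~1.99 ms: nearly all of it
```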
Hit rate vs cache size¶
From the Dicken post, visualised across two demo setups:
- Small cache + random access pattern → low hit rate; most requests miss.
- Cache approaching the size of the hot dataset → high hit rate; most requests hit.
"Increasing the size of our cache increases cost and complexity in our data storage system. It's all about trade-offs." (Source: sources/2025-07-08-planetscale-caching.)
The relationship is not linear in real workloads — hot data is typically Pareto-distributed (the top ~20% of keys take ~80% of the traffic, or steeper). A modestly-sized cache that covers the head of the distribution can deliver a very high hit rate; doubling the cache past that point buys diminishing returns.
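The diminishing-returns shape can be reproduced with a small LRU simulation over a Zipf-like trace. A sketch; the key count, skew exponent, cache sizes, and seed are all illustrative:

```python
import random
from collections import OrderedDict

def lru_hit_rate(cache_size: int, requests: list) -> float:
    """Replay a request trace through an LRU cache; return the hit rate."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least-recently-used key
    return hits / len(requests)

# Zipf-like popularity: rank r gets weight 1 / r**1.2 (steep head, long tail).
random.seed(42)
keys = list(range(10_000))
weights = [1 / (r + 1) ** 1.2 for r in keys]
trace = random.choices(keys, weights=weights, k=20_000)

for size in (250, 500, 1000, 2000):
    print(size, round(lru_hit_rate(size, trace), 3))
```

Each doubling of the cache buys a smaller hit-rate increment than the last, because the head of the popularity distribution is already covered.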
What affects hit rate at a fixed cache size¶
- Access pattern. Sequential or recency-biased workloads have natural locality and high hit rates. Uniformly random access over a larger-than-cache dataset gets hit-rate ≈ cache_size / dataset_size and can't be fixed by any eviction policy.
- Eviction policy. At fixed cache size, the choice between FIFO, LRU, time-aware LRU, LFRU, etc. changes hit rate. LRU is the industry default because it aligns with the temporal locality of most workloads.
- Working-set fit. If the working set (data accessed over a given window) fits in cache, hit rate can approach 100% once warm; if it doesn't, hit rate plateaus at cache_size / working_set_size.
- Warm-up state. Hit rate is low on a cold cache after a restart / deploy / rebalance; cache-warming strategies matter for tail latency during these windows.
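The plateau can be stated directly. A sketch, assuming uniform access over the working set:

```python
def plateau_hit_rate(cache_size: int, working_set_size: int) -> float:
    """Steady-state ceiling: if the working set fits, ~100% once warm;
    if it doesn't, the hit rate caps at the size ratio."""
    return min(1.0, cache_size / working_set_size)

print(plateau_hit_rate(4_000, 10_000))   # 0.4: working set doesn't fit
print(plateau_hit_rate(12_000, 10_000))  # 1.0: fits, so a warm cache can serve nearly every request
```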
Where to read it in production¶
- Redis: INFO stats — keyspace_hits, keyspace_misses.
- Postgres: pg_stat_database — blks_hit, blks_read (buffer-pool hit rate). See concepts/postgres-shared-buffers-double-buffering.
- MySQL InnoDB: SHOW ENGINE INNODB STATUS — Buffer pool hit rate; also Innodb_buffer_pool_reads vs Innodb_buffer_pool_read_requests. See concepts/innodb-buffer-pool.
- CloudFront / CDN: CacheHitRate dashboard metric.
- Linux page cache: indirect via vmstat / iostat (rising disk I/O with a stable working set implies a cache-miss storm). See concepts/linux-page-cache.
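For Redis, the two counters combine the same way as the definition above. A sketch, assuming a dict shaped like what redis-py's Redis.info() returns (the sample numbers are made up):

```python
def redis_hit_rate(info: dict) -> float:
    """Derive the hit rate from the counters Redis exposes in INFO stats."""
    hits = info["keyspace_hits"]
    misses = info["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

sample = {"keyspace_hits": 9_500_000, "keyspace_misses": 500_000}
print(redis_hit_rate(sample))  # 0.95
```

Note these counters are cumulative since server start, so for a live view you'd difference two snapshots rather than use the lifetime totals.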
Anti-patterns¶
- Staring at hit rate without workload context. A hit rate of 50% is excellent for some workloads (random access over a large cold dataset) and terrible for others (a recency-biased social feed). Compare to the baseline cache_size / working_set_size.
- Optimising hit rate at the cost of freshness. A longer TTL raises the hit rate but risks serving stale data — the cache TTL staleness dilemma.
- Hit rate per instance vs fleet hit rate. If the routing substrate doesn't preserve cache locality, each instance's cache sees a random slice of keys and the per-instance hit rate is far below the theoretical ceiling.
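The routing effect can be demonstrated with a small simulation: four LRU caches whose combined capacity equals the working set, routed either by key affinity or at random. All sizes, the seed, and the modulo routing rule are illustrative:

```python
import random
from collections import OrderedDict

def replay(trace, route, n_instances, per_instance_size):
    """Replay a trace across n per-instance LRU caches; route maps key -> instance."""
    caches = [OrderedDict() for _ in range(n_instances)]
    hits = 0
    for key in trace:
        cache = caches[route(key)]
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = None
            if len(cache) > per_instance_size:
                cache.popitem(last=False)  # evict the least-recently-used key
    return hits / len(trace)

random.seed(7)
keys = list(range(4_000))  # working set == total fleet capacity (4 x 1,000)
trace = [random.choice(keys) for _ in range(40_000)]

affine = replay(trace, lambda k: k % 4, 4, 1_000)                 # cache-local routing
scatter = replay(trace, lambda k: random.randrange(4), 4, 1_000)  # random routing
print(round(affine, 3), round(scatter, 3))
```

With affinity each instance's slice of the working set fits its cache, so the fleet approaches the theoretical ceiling after warm-up; with random routing the same keys are duplicated and evicted across instances and the hit rate collapses.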
Seen in¶
- sources/2025-07-08-planetscale-caching — Ben Dicken (PlanetScale) frames hit rate as the single most important cache metric ("We want to keep the hit rate as high as possible. Doing so means we are minimizing the number requests to the slower storage") with worked visual examples of low-hit-rate vs high-hit-rate scenarios.
Related¶
- concepts/cache-locality — the placement property that lets a per-node cache reach high hit rates.
- concepts/temporal-locality-recency-bias — the natural access-pattern property most hit rates depend on.
- concepts/storage-latency-hierarchy — the latency-gap that caching amortises.
- concepts/cache-ttl-staleness-dilemma — the freshness-vs-hit-rate trade-off.
- patterns/pair-fast-small-cache-with-slow-large-storage — the architectural shape hit rate measures.