
CONCEPT

Cache hit rate

Definition

The cache hit rate is the fraction of data requests served from the cache without needing to go to the slower backing store:

hit_rate = cache_hits / total_requests

A cache hit is a request whose answer is already in the cache. A cache miss requires fetching from the backing store, typically populating the cache on the way back out. The miss rate is 1 - hit_rate; multiply either by 100 to quote it as a percentage.
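The definition is a one-liner; a minimal sketch (the guard for zero requests is a defensive choice, not from the source):

```python
def hit_rate(cache_hits: int, total_requests: int) -> float:
    """Fraction of requests served from cache, in [0.0, 1.0]."""
    if total_requests == 0:
        return 0.0  # no traffic yet: define hit rate as 0 rather than divide by zero
    return cache_hits / total_requests

rate = hit_rate(cache_hits=900, total_requests=1000)
print(f"hit rate: {rate:.0%}, miss rate: {1 - rate:.0%}")
```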

Why it's the single load-bearing metric

The economic argument for a cache is "keep frequently-accessed data in fast storage so most requests never touch the slow tier." The measurable evidence that this is working is the hit rate. (Source: sources/2025-07-08-planetscale-caching.)

Every other cache metric derives from it:

  • Effective average latency: hit_rate × fast_latency + (1 - hit_rate) × slow_latency. A 90% hit rate already collapses most of the latency gap; a 99% hit rate essentially removes it.
  • Backing-store load is proportional to the miss rate — caching is also a capacity-protection mechanism for the slow tier, not just a latency optimisation.
  • Storage cost is proportional to cache size, which is what you pay to raise the hit rate on a given access pattern.
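The effective-latency formula above can be evaluated directly; the 100 µs cache and 10 ms backing-store latencies below are illustrative numbers, not from the source:

```python
def effective_latency(hit_rate: float, fast_us: float, slow_us: float) -> float:
    """Weighted average latency across hits and misses, in the same unit as the inputs."""
    return hit_rate * fast_us + (1 - hit_rate) * slow_us

# Illustrative tiers: 100 µs cache, 10 ms (10,000 µs) backing store.
for hr in (0.50, 0.90, 0.99):
    print(f"{hr:.0%} hit rate -> {effective_latency(hr, 100, 10_000):.0f} µs average")
```

At 90% the average drops an order of magnitude below the slow tier; at 99% it is within about 2× of the fast tier, which is the "essentially removes it" claim in numbers.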

Hit rate vs cache size

From the Dicken post, visualised across two demo setups:

  • Small cache + random access pattern → low hit rate; most requests miss.
  • Cache approaching the size of the hot dataset → high hit rate; most requests hit.

"Increasing the size of our cache increases cost and complexity in our data storage system. It's all about trade-offs." (Source: sources/2025-07-08-planetscale-caching.)

The relationship is not linear in real workloads — hot data is typically Pareto-distributed (the top ~20% of keys take ~80% of the traffic, or steeper). A modestly-sized cache that covers the head of the distribution can deliver a very high hit rate; doubling the cache past that point buys diminishing returns.
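The diminishing-returns shape can be sketched analytically, assuming an idealised cache that always holds the k most popular keys and a Zipf popularity distribution (both are modelling assumptions, not claims from the source):

```python
def zipf_top_k_hit_rate(n_keys: int, k: int, s: float = 1.0) -> float:
    """Hit rate of an ideal cache holding the k most popular of n_keys,
    when request probability for rank i is proportional to 1 / i**s."""
    weights = [1 / (i ** s) for i in range(1, n_keys + 1)]
    return sum(weights[:k]) / sum(weights)

n = 100_000
for k in (1_000, 2_000, 4_000, 8_000):
    print(f"cache holds {k / n:.0%} of keys -> {zipf_top_k_hit_rate(n, k):.1%} hit rate")
```

With s = 1, each doubling of the cache adds roughly the same handful of percentage points: the head of the distribution is cheap to capture, the tail is not.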

What affects hit rate at a fixed cache size

  • Access pattern. Sequential or recency-biased workloads have natural locality and high hit rates. Uniformly random access over a larger-than-cache dataset gets hit-rate ≈ cache_size / dataset_size and can't be fixed by any eviction policy.
  • Eviction policy. At a fixed cache size, the choice between FIFO, LRU, time-aware LRU, LFRU, etc. changes the hit rate. LRU is the industry default because it matches the temporal locality of most workloads.
  • Working-set fit. If the working set (data accessed over a given window) fits in cache, hit rate can approach 100% once warm; if it doesn't, hit rate plateaus at cache_size / working_set_size.
  • Warm-up state. Hit rate is low on a cold cache after a restart / deploy / rebalance; cache-warming strategies matter for tail latency during these windows.
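The access-pattern and working-set-fit points can be demonstrated with a small LRU simulation (the key counts, hot-set size, and 90/10 traffic split are illustrative assumptions):

```python
import random
from collections import OrderedDict

def lru_hit_rate(requests, cache_size):
    """Simulate an LRU cache over a request trace and return the observed hit rate."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # refresh recency on a hit
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least-recently-used key
    return hits / len(requests)

rng = random.Random(42)
n_keys, cache_size, n_reqs = 10_000, 1_000, 100_000
uniform = [rng.randrange(n_keys) for _ in range(n_reqs)]
# Recency-biased trace: 90% of requests go to a 500-key hot set that fits in cache.
biased = [rng.randrange(500) if rng.random() < 0.9 else rng.randrange(n_keys)
          for _ in range(n_reqs)]
print(f"uniform access:        {lru_hit_rate(uniform, cache_size):.1%}")
print(f"recency-biased access: {lru_hit_rate(biased, cache_size):.1%}")
```

The uniform trace lands near the cache_size / dataset_size floor (~10%); the biased trace, whose working set fits, lands near 90%, with the same cache and the same policy.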

Where to read it in production

  • Redis: INFO stats → keyspace_hits, keyspace_misses.
  • Postgres: pg_stat_database → blks_hit, blks_read (buffer-pool hit rate). See concepts/postgres-shared-buffers-double-buffering.
  • MySQL InnoDB: SHOW ENGINE INNODB STATUS → Buffer pool hit rate; also Innodb_buffer_pool_reads vs Innodb_buffer_pool_read_requests. See concepts/innodb-buffer-pool.
  • CloudFront / CDN: CacheHitRate dashboard metric.
  • Linux page cache: indirect via vmstat / iostat (rising disk I/O with stable working set implies cache-miss storm). See concepts/linux-page-cache.
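Turning raw counters into a hit rate is the same arithmetic everywhere. A sketch for the Redis case, parsing the `key:value` lines that INFO stats emits (the counter values below are made up for illustration):

```python
# Sample fragment in the key:value line format of Redis `INFO stats`;
# the numbers are invented for this example.
INFO_STATS = """\
keyspace_hits:9450221
keyspace_misses:387104
"""

def redis_hit_rate(info_text: str) -> float:
    """Compute hit rate from the keyspace_hits / keyspace_misses counters."""
    stats = dict(line.split(":", 1) for line in info_text.splitlines() if ":" in line)
    hits = int(stats["keyspace_hits"])
    misses = int(stats["keyspace_misses"])
    return hits / (hits + misses)

print(f"hit rate: {redis_hit_rate(INFO_STATS):.2%}")
```

Note these counters are cumulative since server start, so for alerting you generally want the rate over a window (delta of hits over delta of total), not the lifetime ratio.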

Anti-patterns

  • Staring at hit rate without workload context. A hit rate of 50% is excellent for some workloads (random access over a large cold dataset) and terrible for others (recency-biased social feed). Compare to the baseline cache_size / working_set_size.
  • Optimising hit rate at the cost of freshness. A longer TTL raises the hit rate but risks serving stale data — the cache TTL staleness dilemma.
  • Hit rate per instance vs fleet hit rate. If the routing substrate doesn't preserve cache locality, each instance's cache sees a random slice of keys and the per-instance hit rate is far below the theoretical ceiling.

Seen in

  • sources/2025-07-08-planetscale-caching — Ben Dicken (PlanetScale) frames hit rate as the single most important cache metric ("We want to keep the hit rate as high as possible. Doing so means we are minimizing the number requests to the slower storage") with worked visual examples of low-hit-rate vs high-hit-rate scenarios.