PATTERN
Pair fast-small cache with slow-large backing storage¶
Definition¶
The foundational caching pattern — applicable at every layer of every computing system: pair a small amount of expensive fast storage with a large amount of cheap slow storage, and keep the frequently-accessed subset in the fast tier. The slow tier remains the source of truth; the fast tier is a best-effort acceleration layer that can be rebuilt on demand.
Canonical framing¶
From Ben Dicken's 2025-07-08 Caching post (Source: sources/2025-07-08-planetscale-caching):
When designing systems to store and retrieve data, we are faced with trade-offs between capacity, speed, cost, and durability.
For a given budget, you can either get a large amount of slower data storage, or a small amount of faster storage. Engineers get around this by combining the two: Pair a large amount of cheaper slow storage with a small amount of expensive fast storage. Data that is accessed more frequently can go in the fast storage, and the other data that we use less often goes in the slow storage.
This is the core principle of caching.
The access flow:
- Request data → check fast tier first.
- Cache hit → serve from fast tier (fast path).
- Cache miss → fetch from slow tier → populate fast tier on the way back → serve.
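The three steps above can be sketched as a minimal pull-through (cache-aside) wrapper. This is an illustrative sketch, not from the source; the function names and the LRU eviction choice are assumptions:

```python
from collections import OrderedDict

def make_pull_through_cache(fetch_from_slow_tier, capacity=128):
    """Wrap a slow-tier fetch with a small LRU fast tier (illustrative sketch)."""
    fast_tier = OrderedDict()  # key -> value, ordered oldest-first by recency

    def get(key):
        if key in fast_tier:               # cache hit: serve from fast tier
            fast_tier.move_to_end(key)     # refresh recency
            return fast_tier[key]
        value = fetch_from_slow_tier(key)  # cache miss: go to source of truth
        fast_tier[key] = value             # populate fast tier on the way back
        if len(fast_tier) > capacity:      # capacity pressure: evict LRU entry
            fast_tier.popitem(last=False)
        return value

    return get
```

Note that the slow tier is never written here: the fast tier stays a rebuildable acceleration layer, which is exactly the invariant the pattern requires.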
Why it works¶
Three underlying assumptions — all usually (not always) true in real workloads:
- Access distribution is skewed. A small subset of the data accounts for most of the requests ([[concepts/temporal-locality-recency-bias|temporal locality]], Pareto / Zipfian distributions).
- Fast-tier cost scales superlinearly with size. Doubling SRAM more than doubles the cost; doubling DRAM tells the same story once bus and channel limits are hit. Making the fast tier "as big as the data" is not economical.
- Miss cost is bounded. A miss pays the slow-tier latency plus the fast-tier-population cost — if the miss cost exceeds the hit savings integrated over the entry's lifetime, caching is net-negative for that key.
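The third assumption can be written as a break-even condition. A hedged sketch with hypothetical latency numbers (the RAM/NVMe figures echo the tier gaps cited below; the population cost is an assumption):

```python
def caching_is_net_positive(hit_latency, slow_latency, populate_cost,
                            hits_per_lifetime):
    """Break-even check: caching a key pays off only if the latency saved
    across the entry's lifetime exceeds the one-time miss + population cost."""
    savings = hits_per_lifetime * (slow_latency - hit_latency)
    miss_cost = slow_latency + populate_cost
    return savings > miss_cost

# Hypothetical numbers in microseconds: 0.1 us RAM hit, 50 us NVMe miss,
# 5 us to populate the fast tier.
print(caching_is_net_positive(0.1, 50, 5, hits_per_lifetime=2))  # pays off
print(caching_is_net_positive(0.1, 50, 5, hits_per_lifetime=1))  # does not
```

A key that is hit only once before eviction never recoups its miss cost, which is why skewed access distributions (the first assumption) are load-bearing.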
Where this shape appears on the wiki¶
- CPU cache (L1/L2/L3) over RAM — the hardware tier. ~1-cycle L1, ~300-cycle RAM, ~300× gap.
- RAM over disk — the classical OS tier. ~100 ns RAM, ~50 μs NVMe, ~500× gap.
- InnoDB buffer pool / Postgres `shared_buffers` — the database tier. In-process page cache in front of on-disk B+trees.
- Linux page cache — the kernel tier. OS-managed cache in front of the filesystem.
- Redis / Memcached in front of a relational DB — the application tier. In-memory KV store in front of a heavier persistent store.
- CDN edge over central origin — the network tier. Regional PoPs in front of a single source-of-truth region.
- Warehouse query result cache — the analytics tier. Materialized recent query results in front of cold compute.
Ben Dicken's framing: "We can see this visualized below. Each colored square represents a unique datum we need to store and retrieve. You can click on the 'data requester' to send additional requests for data or click on any of the colored squares in the 'slow + large' section to initiate a request [for that] color." The same two-tier picture, at every scale the post walks through.
Architectural invariants¶
- Slow tier is the source of truth. The fast tier is rebuildable on demand; any divergence must resolve in the slow tier's favour. (This is the central distinction between a cache and a database replica.)
- A miss must populate the fast tier (pull-through model), or writes must update it directly (push / write-through model).
- Eviction happens in the fast tier under capacity pressure; see LRU and siblings.
- Consistency window between tiers must be bounded (either by TTL, by invalidation, or by write-through ordering).
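The TTL form of the bounded-consistency-window invariant can be sketched as follows. This is an illustrative sketch, not from the source; the function names and the injectable clock are assumptions made for testability:

```python
import time

def make_ttl_cache(fetch_from_slow_tier, ttl_seconds, clock=time.monotonic):
    """Bound the consistency window: a fast-tier entry is served for at most
    ttl_seconds before the slow tier (the source of truth) is re-read."""
    fast_tier = {}  # key -> (value, expires_at)

    def get(key):
        entry = fast_tier.get(key)
        if entry is not None and clock() < entry[1]:
            return entry[0]                  # fresh hit: within the window
        value = fetch_from_slow_tier(key)    # expired or missing: re-read
        fast_tier[key] = (value, clock() + ttl_seconds)
        return value

    return get
```

Within the TTL, a stale value may be served, but the staleness is bounded by `ttl_seconds`; after expiry, any divergence resolves in the slow tier's favour, as the first invariant requires.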
Known failure modes¶
- Hit rate collapses when access pattern is uniformly random or adversarially crafted. See concepts/cache-hit-rate.
- Cache stampede / thundering herd on cold start or mass eviction — many requests miss simultaneously and all hit the slow tier.
- Staleness if invalidation is imperfect — the cache TTL staleness dilemma.
- Cost inversion on write-heavy workloads — maintaining cache consistency can cost more than the read saving earns.
- Cache-locality loss in distributed caches when keys are sprayed across nodes — per-node hit rate crashes.
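The stampede failure mode above is commonly mitigated with a per-key "single-flight" guard: on a miss, only one caller fetches from the slow tier while concurrent missers wait and reuse the result. A hedged sketch (names and locking scheme are assumptions, not from the source):

```python
import threading

def make_single_flight_cache(fetch_from_slow_tier):
    """Stampede guard: serialize slow-tier fetches per key so that a burst
    of simultaneous misses produces exactly one slow-tier request."""
    fast_tier = {}
    key_locks = {}
    meta_lock = threading.Lock()  # protects the key_locks registry

    def get(key):
        if key in fast_tier:
            return fast_tier[key]
        with meta_lock:
            lock = key_locks.setdefault(key, threading.Lock())
        with lock:                       # only one fetcher per key at a time
            if key not in fast_tier:     # re-check: a waiter may find it filled
                fast_tier[key] = fetch_from_slow_tier(key)
        return fast_tier[key]

    return get
```

Without the per-key lock and the double-check, a cold start sends every concurrent request to the slow tier at once, which is precisely the thundering-herd scenario.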
Seen in¶
- sources/2025-07-08-planetscale-caching — Ben Dicken (PlanetScale) canonicalises this shape as "the core principle of caching" and walks through its appearance at every tier from CPU L1 to CDN to database buffer pool.
Related¶
- concepts/cache-hit-rate
- concepts/cache-locality
- concepts/storage-latency-hierarchy
- concepts/cpu-cache-hierarchy
- concepts/innodb-buffer-pool
- concepts/postgres-shared-buffers-double-buffering
- concepts/linux-page-cache
- patterns/spatial-prefetch-on-access
- patterns/cdn-edge-cache-over-central-origin
- patterns/caching-proxy-tier