PlanetScale — Caching¶
Summary¶
Ben Dicken's (PlanetScale) pedagogical deep-dive on caching as "the
most elegant, powerful, and pervasive innovation in computing" —
the core principle across CPU L1/L2/L3, RAM, CDNs, and database
buffer pools of pairing a small amount of expensive fast storage
with a large amount of cheap slow storage, keeping frequently-
accessed data in the fast tier. The post walks through: the hit-
rate formula and how cache size affects it, temporal locality
(recency bias) with X.com tweets as the worked example, spatial
locality with photo-album prefetching, geographic caching via CDNs,
four eviction policies (FIFO, LRU, time-aware LRU, Least-Frequently-
Recently-Used with dual-queue promotion/demotion), and finally the
double-buffering cache stack inside real OLTP databases —
Postgres's shared_buffers (canonically ~25% of RAM) over the OS
filesystem page cache, and MySQL's equivalent InnoDB buffer
pool. Architecturally dense despite its pedagogical framing;
first canonical wiki treatment of CPU-cache hierarchy + cache
eviction policies + Postgres shared_buffers double-buffering.
Key takeaways¶
- Caching is one idea applied at every tier of the storage hierarchy. "For a given budget, you can either get a large amount of slower data storage, or a small amount of faster storage. Engineers get around this by combining the two." (Source: sources/2025-07-08-planetscale-caching.) CPU L1 over RAM, RAM over disk, CDN edge over origin, database buffer pool over SSD — same shape, different stack level. Canonicalised as patterns/pair-fast-small-cache-with-slow-large-storage.
- Hit rate is the single load-bearing metric. "hit_rate = (cache_hits / total_requests) x 100. We want to keep the hit rate as high as possible." Small cache + random access pattern → low hit rate; large cache (approaching the data size) + same pattern → high hit rate. "Increasing the size of our cache increases cost and complexity… It's all about trade-offs." Canonical wiki definition of concepts/cache-hit-rate.
- CPU cache hierarchy is its own multi-tier substrate. "Modern CPUs have one or more cache layers for RAM. Though RAM is fast, a cache built directly into the CPU is even faster… L1 is faster than L2 which is faster than L3, but L1 has less capacity than L2, which has less capacity than L3." (Source: sources/2025-07-08-planetscale-caching.) The canonical framing: "Faster data lookup means more cost or more size limitations due to how physically close the data needs to be to the requester." First canonical wiki treatment of the CPU cache hierarchy — concepts/cpu-cache-hierarchy; sibling page to the storage-side storage latency hierarchy.
- Recency bias drives temporal-locality caching at web scale. "The number of tweets in the entire history of X.com is easily in the trillions. However, 𝕏 timelines are almost exclusively filled with posts from the past ~48 hours." Worked example — a Karpathy tweet that "received over 43,000 likes and 7 million impressions since it was posted over two years ago" — views trail off over time, making recent content hot and older content cold. "These websites store much of their content (email, images, documents, videos, etc) in 'slow' storage (like Amazon S3 or similar), but cache recent content in faster, in-memory stores (like CloudFront, Redis or Memcached)." Canonical wiki statement of concepts/temporal-locality-recency-bias.
- Spatial locality enables predictive prefetching. "In some data storage systems, when one chunk of data is read, it's probable that the data that comes immediately 'before' or 'after' it will also be read. Consider a photo album app. When a user clicks on one photo from their cloud photo storage, it's likely that the next photo they will view is the photo taken immediately after it chronologically. In these situations, the data storage and caching systems leverage this user behavior. When one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well." Canonicalised as concepts/spatial-locality-prefetching + patterns/spatial-prefetch-on-access.
- CDNs apply the same caching idea to the planet. "We live on a big spinning rock 25,000 miles in circumference, and we are limited by 'physics' for how fast data can move from point A to B." East-coast US origin: "East-coasters will experience 10-20ms of latency, while west-coasters will experience 50-100ms. Those requesting data on the other side of the world will experience 250+ milliseconds of latency." CDN shape: "we still have a single source-of-truth for the data, but we add data caches in multiple locations around the world. Data requests are sent to the geographically-nearest cache. If the cache does not have the needed data, only then does it request it from the core source of truth." Canonicalised as patterns/cdn-edge-cache-over-central-origin.
- FIFO is the simplest replacement policy; LRU is the industry default. FIFO "works like a queue. New items are added to the beginning. When the cache queue is full, the least-recently added item gets evicted." Criticism (Source: sources/2025-07-08-planetscale-caching): "While simple to implement, FIFO isn't optimal for most caching scenarios because it doesn't consider usage patterns." LRU "is a popular choice, and the industry standard for many caching systems. Unlike FIFO, LRU always evicts the item that has least-recently been requested, a sensible choice to maximize cache hits. This aligns well with temporal locality in real-world data access patterns." New canonical wiki concepts concepts/fifo-cache-eviction + concepts/lru-cache-eviction.
- Time-aware LRU adds a per-entry expiration timer. "LRU plus giving each element in the cache a timer. When time is up, we evict the element!" Worked examples: "For a social network, automatically evicting posts from the cache after 48 hours. For a weather app, automatically evicting previous-days weather info from the cache when the clock turns to a new day. In an email app, removing the email from the cache after a week, as it's unlikely to be read again after that." Canonicalised as concepts/time-aware-lru-cache-eviction.
- Least-Frequently-Recently-Used (LFRU) uses two queues with promotion/demotion. "One cool one is Least-Frequently Recently Used. This involves managing two queues, one for high-priority items and one for low priority items. The high-priority queue uses an LRU algorithm, and when an element needs to be evicted it gets moved to the low priority queue, which then uses a more aggressive algorithm for replacement." Canonicalised as concepts/lfru-cache-eviction.
- Postgres shared_buffers sits on top of the OS filesystem page cache — a double-buffering stack. "Postgres implements a two-layer caching strategy. First, it uses shared_buffers, an internal cache for data pages that store table information. This keeps frequently read row data in memory while less-frequently accessed data stays on disk. Second, Postgres relies heavily on the operating system's filesystem page cache, which caches disk pages at the kernel level. This creates a double-buffering system where data can exist in both Postgres's shared_buffers and the OS page cache." Canonical sizing rule: "Many deployments set shared_buffers to around 25% of available RAM and let the filesystem cache handle much of the remaining caching work." Canonicalised as concepts/postgres-shared-buffers-double-buffering; complements prior Linux page cache page.
- MySQL InnoDB buffer pool does the same job in one layer. "MySQL does a similar thing with the buffer pool. Like Postgres, this is an internal cache to keep recently used data in RAM." Extended datapoint for InnoDB buffer pool page — explicit positioning alongside Postgres's two-layer stack.
- Database caches are more complex than pure caches because of ACID. "Arguably, these are more complex than a 'regular' cache as they also have to be able to operate with full ACID semantics and database transactions. Both databases have to take careful measures to ensure these pages contain accurate information and metadata as the data evolves." Dicken explicitly flags write + update handling in caching systems + consistency issues + sharded caches as "subject[s] we completely avoided."
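The hit-rate formula quoted in the takeaways is simple enough to state as code. A minimal sketch — the function name and the zero-request guard are mine, not from the post:

```python
def hit_rate(cache_hits: int, total_requests: int) -> float:
    """Percentage of requests served from the cache:
    hit_rate = (cache_hits / total_requests) x 100."""
    if total_requests == 0:
        return 0.0  # guard: no requests yet, no meaningful rate
    return (cache_hits / total_requests) * 100

# A cache holding a random-access workload's working set poorly:
print(hit_rate(900, 1000))  # prints 90.0
```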
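The LRU and time-aware-LRU policies described in the takeaways can be sketched together in a few lines. This is an illustrative implementation under my own assumptions (class name, `OrderedDict` representation, and TTL mechanics are not from the post):

```python
import time
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """LRU eviction with an optional per-entry expiry timer.

    ttl=None gives plain LRU; a ttl in seconds gives time-aware LRU,
    where an entry is evicted once its timer is up even if it was
    recently used.
    """

    def __init__(self, capacity: int, ttl: Optional[float] = None):
        self.capacity = capacity
        self.ttl = ttl
        self._data: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        if key not in self._data:
            return None  # cache miss
        value, expires_at = self._data[key]
        if expires_at is not None and time.monotonic() > expires_at:
            del self._data[key]  # timer expired: evict on access
            return None
        self._data.move_to_end(key)  # mark as most-recently used
        return value

    def put(self, key, value):
        expires_at = time.monotonic() + self.ttl if self.ttl is not None else None
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least-recently used
```

A FIFO variant would simply drop the `move_to_end` call in `get`, evicting by insertion order regardless of usage — which is exactly the "doesn't consider usage patterns" criticism quoted above.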
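The photo-album prefetch behavior can likewise be sketched. The post describes the behavior, not an API, so the names and dict-backed store here are illustrative:

```python
def load_with_prefetch(index, cache, backing_store, window=2):
    """Return photo `index`; on a miss, also pull the next `window`
    chronologically adjacent photos into the cache (spatial locality),
    anticipating that the user pages forward through the album."""
    if index not in cache:
        cache[index] = backing_store[index]  # the miss itself
        for i in range(index + 1, index + 1 + window):
            if i in backing_store and i not in cache:
                cache[i] = backing_store[i]  # speculative prefetch
    return cache[index]


backing = {i: f"photo-{i}" for i in range(10)}
cache = {}
load_with_prefetch(3, cache, backing)   # miss: loads 3, prefetches 4 and 5
assert 4 in cache and 5 in cache
load_with_prefetch(4, cache, backing)   # hit: already prefetched
```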
Operational numbers¶
- Postgres shared_buffers canonical sizing: "Many deployments set shared_buffers to around 25% of available RAM." The rest of RAM serves the OS page cache. (Source: sources/2025-07-08-planetscale-caching.)
- East-coast US origin latency stratification (order of magnitude, no percentile): east-coast clients ~10–20 ms, west-coast clients ~50–100 ms, other-side-of-the-world clients 250+ ms.
- Karpathy tweet engagement curve: 43,000 likes + 7M impressions over two years — author's worked illustration of head-heavy recency bias ("Tweets, pictures, and videos that were recently posted are requested much more than older ones.")
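As a concrete rendering of the 25% convention, a postgresql.conf fragment for a hypothetical 16 GB machine — the values are illustrative, not from the post, and `effective_cache_size` is a planner hint (an estimate of shared_buffers plus the OS page cache), not an allocation:

```ini
# postgresql.conf — illustrative sizing for a 16 GB machine
shared_buffers = 4GB          # ~25% of RAM: Postgres-internal page cache
effective_cache_size = 12GB   # planner's estimate of total cache
                              # (shared_buffers + OS filesystem page cache)
```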
Systems mentioned¶
- Amazon S3 — named as canonical example of "slow" large backing storage for website content.
- Amazon CloudFront — named as canonical CDN + in-memory edge cache.
- Redis / Memcached — named as canonical in-memory database caches for recent content.
- PostgreSQL — two-layer caching strategy: shared_buffers + OS filesystem page cache.
- MySQL / InnoDB — buffer pool as the equivalent caching layer.
Caveats¶
- Pedagogy, not a retrospective. No production numbers from PlanetScale (no hit-rate data from their MySQL fleet, no buffer-pool sizing vs workload retrospective, no comparative latency with/without shared_buffers tuning). The concrete sizing rule (25% of RAM to shared_buffers) is a widely quoted convention, not a Dicken measurement.
- Post explicitly scope-bounded. "This article barely scratches the surface of caching. We completely avoided the subject of handling writes and updates in caching systems. We discussed very little of specific technologies used for caching like Redis, Memcached, and others. We didn't address consistency issues, sharded caches, and a lot of other fun details." Wiki treatment of write-through / write-back / cache-aside / invalidation lives on other source pages (concepts/write-through-cache, concepts/invalidation-based-cache, concepts/cache-ttl-staleness-dilemma).
- No specific hardware numbers — no explicit L1/L2/L3 cycle counts, no per-tier capacity numbers (kB/MB range is implied but not tabulated). See also the IO devices and latency companion which does tabulate the storage-tier numbers.
- CDN framing elides invalidation. "If the data never changes, we can avoid cache misses entirely! The only time we would need to get them is when the original data is modified." The post doesn't discuss how modifications propagate — purge lists, tag-based invalidation, short-TTL revalidation. See concepts/invalidation-based-cache + concepts/cache-ttl-staleness-dilemma.
- LFRU is gestured at, not specified. Dual-queue shape described; demotion trigger, promotion criterion, and the "more aggressive algorithm" on the low-priority queue all abstract.
Source¶
- Original: https://planetscale.com/blog/caching
- Raw markdown: raw/planetscale/2025-07-08-caching-61a6e147.md
Related¶
- systems/planetscale
- systems/postgresql
- systems/mysql
- systems/innodb
- systems/amazon-cloudfront
- systems/redis
- concepts/cache-hit-rate
- concepts/cache-locality
- concepts/temporal-locality-recency-bias
- concepts/spatial-locality-prefetching
- concepts/cpu-cache-hierarchy
- concepts/storage-latency-hierarchy
- concepts/innodb-buffer-pool
- concepts/linux-page-cache
- concepts/postgres-shared-buffers-double-buffering
- concepts/fifo-cache-eviction
- concepts/lru-cache-eviction
- concepts/time-aware-lru-cache-eviction
- concepts/lfru-cache-eviction
- patterns/pair-fast-small-cache-with-slow-large-storage
- patterns/spatial-prefetch-on-access
- patterns/cdn-edge-cache-over-central-origin
- companies/planetscale