CONCEPT Cited by 1 source

Spatial locality (prefetching)

Definition

Spatial locality is the property that data stored near a just-accessed item (physically adjacent, sequentially indexed, or temporally contiguous) is more likely to be accessed next. Unlike temporal locality (the same item is re-accessed), spatial locality predicts that nearby items will be accessed.

Why it matters

When spatial locality holds, cache and storage systems can prefetch — load neighbouring items at access time — so subsequent accesses hit instead of miss. This is a one-miss-amortised-over-many-hits pattern.

Ben Dicken's worked example (Source: sources/2025-07-08-planetscale-caching):

Consider a photo album app. When a user clicks on one photo from their cloud photo storage, it's likely that the next photo they will view is the photo taken immediately after it chronologically. In these situations, the data storage and caching systems leverage this user behavior. When one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well. In this case, we would also load the next and previous few photos.

Visualisation from the post: each database cell is numbered in sequence; clicking one loads that cell + its two neighbours into cache.
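The "load the next and previous few photos" rule can be sketched as below. This is a minimal illustration, not the post's actual code; `load_from_storage`, `WINDOW`, and the plain-dict cache are assumptions.

```python
WINDOW = 2  # prefetch this many photos on each side of the one requested

def load_from_storage(photo_id):
    # Stand-in for the slow fetch (object store, database, ...).
    return f"photo-{photo_id}"

cache = {}

def get_photo(photo_id, total):
    if photo_id not in cache:                    # cache miss
        lo = max(0, photo_id - WINDOW)
        hi = min(total - 1, photo_id + WINDOW)
        for i in range(lo, hi + 1):              # fetch neighbours too
            cache.setdefault(i, load_from_storage(i))
    return cache[photo_id]
```

After `get_photo(10, 100)`, photos 8–12 are resident, so viewing photo 11 next is a hit — one miss amortised over the subsequent swipes.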

Tiers at which spatial locality lives

Every layer of the caching stack has a spatial-locality analogue:

  • CPU cache line — the L1/L2/L3 cache doesn't store individual bytes; it stores 64-byte (typical) cache lines. Accessing byte x pulls bytes x+1…x+63 into cache too, cheaply. Sequential memory walks hit; random-access walks miss on every new cache line. See concepts/cpu-cache-hierarchy + concepts/cache-locality.
  • Disk block / page — disks and SSDs transfer in fixed-size blocks (typically 4 KB filesystem blocks or 16 KB InnoDB pages). Reading one byte of a page pulls the whole page into the page cache. Sequential scans benefit far more than random seeks.
  • Database range scans / index traversal — B+tree leaves are linked; range scans walk sequential leaf pages with good spatial locality (one I/O per page amortised over many rows). Random-PK inserts destroy this property — see concepts/uuid-primary-key-antipattern.
  • OS readahead — Linux's readahead heuristic detects sequential I/O and prefetches the next N pages before the application asks.
  • HDD sequential vs random I/O — sequential reads are orders of magnitude faster than random because there's no seek between adjacent blocks. See concepts/hdd-sequential-io-optimization.
  • Photo album / content feed — the Dicken example. Prefetch the next-N and previous-N items; the benefit is most visible on low-bandwidth mobile, where swiping forward looks instant.
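The block/page tier can be made concrete with a toy page cache that transfers at 4 KB granularity. The sizes, names, and counters are illustrative assumptions:

```python
PAGE_SIZE = 4096   # bytes per block, as in a typical filesystem

page_cache = {}                 # page number -> bytes
stats = {"disk_reads": 0}

def read_byte(data, offset):
    """Return data[offset], faulting in the whole containing page on a miss."""
    page_no = offset // PAGE_SIZE
    if page_no not in page_cache:
        stats["disk_reads"] += 1                       # one transfer per page, not per byte
        start = page_no * PAGE_SIZE
        page_cache[page_no] = data[start:start + PAGE_SIZE]
    return page_cache[page_no][offset % PAGE_SIZE]

data = bytes(range(256)) * 64   # 16 KB of "disk" = 4 pages
for off in range(8192):         # sequential scan across 2 pages
    read_byte(data, off)
```

The sequential scan touches 8,192 bytes but costs only 2 "disk" transfers; a random walk over the same range would fault in pages far less efficiently.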

Cost model

Prefetching is worth it only when:

  1. Hit probability of prefetched items is high (spatial locality actually holds for this workload).
  2. Marginal cost (cache slot + fetch bandwidth) is low relative to the miss cost it avoids.
  3. Eviction pressure from prefetched items doesn't harm hit rate on other hot data.

Over-aggressive prefetch on a random-access workload is worse than no prefetch — it costs fetch bandwidth and evicts genuinely hot data for speculative loads that never hit.
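Ignoring eviction pressure (condition 3), conditions 1 and 2 reduce to an expected-value check per prefetched item. The numbers below are illustrative assumptions, not from the source:

```python
def prefetch_pays_off(p_hit, miss_cost, fetch_cost):
    """Prefetching one item pays off when its expected saving
    (probability it is used * miss latency avoided) exceeds the
    cost of fetching it speculatively. Eviction harm is ignored."""
    return p_hit * miss_cost > fetch_cost

# Photo album: next photo viewed ~80% of the time -> prefetch wins.
prefetch_pays_off(0.8, miss_cost=50.0, fetch_cost=5.0)    # True
# Random-access workload: prefetched item almost never used.
prefetch_pays_off(0.02, miss_cost=50.0, fetch_cost=5.0)   # False
```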

Relationship to temporal locality

Spatial and temporal locality are independent axes:

| Workload | Temporal locality | Spatial locality | Example |
|---|---|---|---|
| Social feed (head) | High | Low | Twitter timeline — same item re-read, but adjacent items rarely co-read |
| Photo album | Low | High | Next photo often viewed; same photo rarely re-viewed |
| Database range scan | Low | High | Sequential leaves; each row read once |
| OLTP hot-row write | High | Low | Same row written repeatedly |
| Memory-mapped binary | High | High | Hot code executes adjacent bytes, repeatedly |

Different caches must rely on different signals. LRU alone exploits temporal locality; sequential prefetch alone exploits spatial locality; production caches use both.
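A minimal sketch of a cache using both signals: LRU eviction for the temporal axis plus next-key prefetch for the spatial axis. The class name, integer keys, and parameters are all assumptions for illustration.

```python
from collections import OrderedDict

class PrefetchLRU:
    """LRU cache over integer keys that also prefetches the next key(s)."""

    def __init__(self, capacity, fetch, prefetch_ahead=1):
        self.capacity = capacity
        self.fetch = fetch              # backing-store loader
        self.ahead = prefetch_ahead
        self.data = OrderedDict()       # key -> value, in LRU order

    def _install(self, key):
        if key in self.data:
            return
        self.data[key] = self.fetch(key)
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)       # evict least recently used

    def get(self, key):
        hit = key in self.data
        self._install(key)
        self.data.move_to_end(key)              # temporal signal: refresh LRU position
        for k in range(key + 1, key + 1 + self.ahead):
            self._install(k)                    # spatial signal: prefetch neighbours
        return self.data[key], hit
```

Sequential scans hit via the prefetch path, while a repeatedly accessed hot key stays resident via its LRU position — each mechanism covers the locality the other misses.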

Seen in

  • sources/2025-07-08-planetscale-caching — Ben Dicken's photo-album worked example + "load the next and previous few photos" prefetch rule. Canonical application-tier instance of spatial locality as a caching substrate assumption.