CONCEPT Cited by 1 source

Spatial locality (prefetching)

Definition

Spatial locality is the property that data stored near a just-accessed item (physically adjacent, sequentially indexed, or temporally contiguous) is more likely to be accessed next. Unlike temporal locality (the same item is re-accessed), spatial locality predicts that nearby items will be accessed.

Why it matters

When spatial locality holds, cache and storage systems can prefetch — load neighbouring items at access time — so subsequent accesses hit instead of miss. This is a one-miss-amortised-over-many-hits pattern.

Ben Dicken's worked example (Source: sources/2025-07-08-planetscale-caching):

Consider a photo album app. When a user clicks on one photo from their cloud photo storage, it's likely that the next photo they will view is the photo taken immediately after it chronologically. In these situations, the data storage and caching systems leverage this user behavior. When one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well. In this case, we would also load the next and previous few photos.

Visualisation from the post: each database cell is numbered in sequence; clicking one loads that cell + its two neighbours into cache.
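The "load the next and previous few photos" rule can be sketched as below. This is a minimal illustration, not the post's actual code; `load_from_storage`, `WINDOW`, and the plain-dict cache are assumptions.

```python
WINDOW = 2  # prefetch this many photos on each side of the one requested

def load_from_storage(photo_id):
    # Stand-in for the slow fetch (object store, database, ...).
    return f"photo-{photo_id}"

cache = {}

def get_photo(photo_id, total):
    if photo_id not in cache:                    # cache miss
        lo = max(0, photo_id - WINDOW)
        hi = min(total - 1, photo_id + WINDOW)
        for i in range(lo, hi + 1):              # fetch neighbours too
            cache.setdefault(i, load_from_storage(i))
    return cache[photo_id]
```

After `get_photo(10, 100)`, photos 8–12 are resident, so viewing photo 11 next is a hit — one miss amortised over the subsequent swipes.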

Tiers at which spatial locality lives

Every layer of the caching stack has a spatial-locality analogue:

  • CPU cache line — the L1/L2/L3 cache doesn't store individual bytes; it stores 64-byte (typical) cache lines. Accessing byte x pulls bytes x+1…x+63 into cache too, cheaply. Sequential memory walks hit; random-access walks miss on every new cache line. See concepts/cpu-cache-hierarchy + concepts/cache-locality.
  • Disk block / page — disks and SSDs transfer in fixed-size blocks (typically 4 KB filesystem blocks or 16 KB InnoDB pages). Reading one byte of a page pulls the whole page into the page cache. Sequential scans benefit far more than random seeks.
  • Database range scans / index traversal — B+tree leaves are linked; range scans walk sequential leaf pages with good spatial locality (one I/O per page amortised over many rows). Random-PK inserts destroy this property — see concepts/uuid-primary-key-antipattern.
  • OS readahead — Linux's readahead heuristic detects sequential I/O and prefetches the next N pages before the application asks.
  • HDD sequential vs random I/O — sequential reads are orders of magnitude faster than random because there's no seek between adjacent blocks. See concepts/hdd-sequential-io-optimization.
  • Photo album / content feed — the Dicken example. Prefetch the next-N and previous-N items; the benefit is most visible on low-bandwidth mobile, where swiping forward looks instant.
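The block/page tier can be made concrete with a toy page cache that transfers at 4 KB granularity. The sizes, names, and counters are illustrative assumptions:

```python
PAGE_SIZE = 4096   # bytes per block, as in a typical filesystem

page_cache = {}                 # page number -> bytes
stats = {"disk_reads": 0}

def read_byte(data, offset):
    """Return data[offset], faulting in the whole containing page on a miss."""
    page_no = offset // PAGE_SIZE
    if page_no not in page_cache:
        stats["disk_reads"] += 1                       # one transfer per page, not per byte
        start = page_no * PAGE_SIZE
        page_cache[page_no] = data[start:start + PAGE_SIZE]
    return page_cache[page_no][offset % PAGE_SIZE]

data = bytes(range(256)) * 64   # 16 KB of "disk" = 4 pages
for off in range(8192):         # sequential scan across 2 pages
    read_byte(data, off)
```

The sequential scan touches 8,192 bytes but costs only 2 "disk" transfers; a random walk over the same range would fault in pages far less efficiently.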

Cost model

Prefetching is worth it only when:

  1. Hit probability of prefetched items is high (spatial locality actually holds for this workload).
  2. Marginal cost (cache slot + fetch bandwidth) is low relative to the miss cost it avoids.
  3. Eviction pressure from prefetched items doesn't harm hit rate on other hot data.

Over-aggressive prefetch on a random-access workload is worse than no prefetch — it costs fetch bandwidth and evicts genuinely hot data for speculative loads that never hit.
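Ignoring eviction pressure (condition 3), conditions 1 and 2 reduce to an expected-value check per prefetched item. The numbers below are illustrative assumptions, not from the source:

```python
def prefetch_pays_off(p_hit, miss_cost, fetch_cost):
    """Prefetching one item pays off when its expected saving
    (probability it is used * miss latency avoided) exceeds the
    cost of fetching it speculatively. Eviction harm is ignored."""
    return p_hit * miss_cost > fetch_cost

# Photo album: next photo viewed ~80% of the time -> prefetch wins.
prefetch_pays_off(0.8, miss_cost=50.0, fetch_cost=5.0)    # True
# Random-access workload: prefetched item almost never used.
prefetch_pays_off(0.02, miss_cost=50.0, fetch_cost=5.0)   # False
```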

Relationship to temporal locality

Spatial and temporal locality are independent axes:

| Workload | Temporal locality | Spatial locality | Example |
|---|---|---|---|
| Social feed (head) | High | Low | Twitter timeline — same item re-read, but adjacent items rarely co-read |
| Photo album | Low | High | Next photo often viewed; same photo rarely re-viewed |
| Database range scan | Low | High | Sequential leaves; each row read once |
| OLTP hot-row write | High | Low | Same row written repeatedly |
| Memory-mapped binary | High | High | Hot code executes adjacent bytes, repeatedly |

Different caches must rely on different signals. LRU alone exploits temporal locality; sequential prefetch alone exploits spatial locality; production caches use both.
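A minimal sketch of a cache using both signals: LRU eviction for the temporal axis plus next-key prefetch for the spatial axis. The class name, integer keys, and parameters are all assumptions for illustration.

```python
from collections import OrderedDict

class PrefetchLRU:
    """LRU cache over integer keys that also prefetches the next key(s)."""

    def __init__(self, capacity, fetch, prefetch_ahead=1):
        self.capacity = capacity
        self.fetch = fetch              # backing-store loader
        self.ahead = prefetch_ahead
        self.data = OrderedDict()       # key -> value, in LRU order

    def _install(self, key):
        if key in self.data:
            return
        self.data[key] = self.fetch(key)
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)       # evict least recently used

    def get(self, key):
        hit = key in self.data
        self._install(key)
        self.data.move_to_end(key)              # temporal signal: refresh LRU position
        for k in range(key + 1, key + 1 + self.ahead):
            self._install(k)                    # spatial signal: prefetch neighbours
        return self.data[key], hit
```

Sequential scans hit via the prefetch path, while a repeatedly accessed hot key stays resident via its LRU position — each mechanism covers the locality the other misses.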

Seen in

  • sources/2025-07-08-planetscale-caching — Ben Dicken's photo-album worked example + "load the next and previous few photos" prefetch rule. Canonical application-tier instance of spatial locality as a caching substrate assumption.