
PATTERN

Spatial prefetch on access

Definition

When a data item is accessed, speculatively load neighbouring items into the cache as well — on the assumption that spatial locality holds for the workload, so the next accesses will be adjacent to this one. A single miss pays for the initial fetch + extra neighbours; the subsequent nearby accesses all hit.
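The definition above can be sketched as a small LRU cache that, on a miss, loads a static window of neighbours along with the requested item. This is an illustrative sketch, not from the source; `PrefetchCache`, `width`, and the `fetch` callback are hypothetical names, and keys are assumed to be integers so "neighbouring" is well defined:

```python
from collections import OrderedDict

class PrefetchCache:
    """LRU cache that, on a miss, also loads `width` neighbours on each
    side of the missed key (hypothetical sketch of spatial prefetch)."""

    def __init__(self, fetch, capacity=8, width=2):
        self.fetch = fetch          # loader keyed by integer index
        self.capacity = capacity
        self.width = width
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)        # refresh LRU position
            return self.cache[key]
        self.misses += 1
        # One miss pays for the item plus its neighbours.
        for k in range(key - self.width, key + self.width + 1):
            if k not in self.cache:
                self.cache[k] = self.fetch(k)
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # evict LRU entry
        return self.cache[key]
```

On a sequential scan of keys 0..9 with `width=2`, only every third access misses (0, 3, 6, 9); the rest hit on prefetched neighbours, which is exactly the "single miss pays for the window" trade described above.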

Canonical framing

Ben Dicken's 2025-07-08 photo-album example (Source: sources/2025-07-08-planetscale-caching):

When a user clicks on one photo from their cloud photo storage, it's likely that the next photo they will view is the photo taken immediately after it chronologically. In these situations, the data storage and caching systems leverage this user behavior. When one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well.

And the generalisation:

This prefetching of related data improves performance when there are predictable data access patterns, which is true of many applications beyond photo albums.

Where this pattern appears

| Tier | Prefetch unit | Trigger |
| --- | --- | --- |
| CPU hardware | Cache line (64 B) | Memory access — fetching one byte pulls in the surrounding 63 B for free |
| CPU predictive | Next line / stride | Hardware prefetcher detects a strided access pattern |
| OS readahead | Next N filesystem pages | `read()` detected as sequential; OS pulls ahead |
| Page cache prefetch | Adjacent file pages | `madvise(MADV_SEQUENTIAL)` / `posix_fadvise()` hints |
| Database range scan | Next B+tree leaf pages | Ordered scan detected; storage engine walks the linked list of leaves |
| Application prefetch | Next/prev N items | Photo album, video thumbnails, pagination |
| CDN / browser | Speculative next-page fetch | Prediction based on the user's navigation pattern |
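At the OS tier, the readahead hint in the table can be requested explicitly. A minimal sketch using Python's `os.posix_fadvise` wrapper (POSIX/Linux only, hence the `hasattr` guard; the scratch file and sizes are just for illustration):

```python
import os
import tempfile

# Write a 1 MB scratch file to read back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1_000_000)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    if hasattr(os, "posix_fadvise"):  # not available on all platforms
        # Tell the kernel we'll read sequentially, so readahead can
        # prefetch pages before read() asks for them.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    data = os.read(fd, 65536)         # served partly from readahead pages
finally:
    os.close(fd)
    os.unlink(path)

print(len(data))  # 65536
```

The hint widens the kernel's readahead window; the syscall sequence is otherwise identical to an unhinted read.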

Implementation choices

  • Static prefetch width. Always fetch N neighbours on every access. Cheap, predictable, sometimes wrong. The Dicken photo-album example: "load the next and previous few photos".
  • Stride-detecting prefetch. Observe access order; if it's monotonic, widen the prefetch window. Hardware CPU prefetchers do this natively.
  • Content-aware prefetch. Use application signals — user's chronological photo order, a client's paginated cursor — to choose what counts as "neighbouring."
  • Model-driven prefetch. ML-predicted likely-next items (patterns/asset-preload-prediction). Costly but useful on high-value hot paths.
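The stride-detecting variant can be sketched in a few lines (an illustrative model; `StridePrefetcher` and `depth` are hypothetical names, and real hardware prefetchers track many concurrent streams with confidence counters):

```python
class StridePrefetcher:
    """Detect a constant stride in the access stream; once two consecutive
    deltas match, emit prefetch addresses for the next `depth` strides."""

    def __init__(self, depth=2):
        self.depth = depth
        self.last = None     # previous address seen
        self.stride = None   # previous delta seen

    def access(self, addr):
        prefetches = []
        if self.last is not None:
            stride = addr - self.last
            if stride != 0 and stride == self.stride:
                # Same non-zero stride twice in a row: predict ahead.
                prefetches = [addr + stride * i
                              for i in range(1, self.depth + 1)]
            self.stride = stride
        self.last = addr
        return prefetches
```

Feeding it 100, 108, 116 yields no prefetches for the first two accesses (the stride is not yet confirmed), then `[124, 132]` on the third, once the stride of 8 has repeated.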

When prefetching is net-negative

Prefetching is worth it only when the prefetched hit probability is high enough to amortise the extra fetch and cache-slot cost.

  • Uniformly random access — neighbours are no more likely than anything else. Prefetch wastes bandwidth and pollutes cache.
  • Narrow working set that already fits — nothing to save; prefetching only adds overhead.
  • Under eviction pressure — each prefetched item evicts a genuine hot item. If temporal locality is already strong, that's a net loss.
  • Expensive-to-fetch neighbours — high-resolution images, encrypted payloads, cross-region fetches. Even high prefetch-hit probability may not pay back the per-fetch cost.
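The amortisation condition above can be written as a one-line expected-cost model (illustrative, not from the source: prefetching one neighbour pays off when the hit probability times the avoided miss cost exceeds the fetch cost plus any expected eviction penalty):

```python
def prefetch_net_benefit(p_hit, miss_cost, prefetch_cost, eviction_penalty=0.0):
    """Expected net saving from prefetching one neighbour, in the same
    units as the costs. With probability p_hit the neighbour is later
    accessed and a full miss is avoided; the prefetch itself always costs
    prefetch_cost, plus any expected cost of evicting a hot item."""
    return p_hit * miss_cost - prefetch_cost - eviction_penalty

# Cheap neighbour, likely to be used: clearly worth it.
print(prefetch_net_benefit(p_hit=0.8, miss_cost=10.0, prefetch_cost=2.0))   # 6.0
# Expensive cross-region fetch: even an 80% hit rate doesn't pay it back.
print(prefetch_net_benefit(p_hit=0.8, miss_cost=10.0, prefetch_cost=9.0))   # -1.0
```

The four failure modes above are just the ways this expression goes negative: random access drives `p_hit` down, expensive neighbours drive `prefetch_cost` up, and eviction pressure adds a nonzero `eviction_penalty`.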

Seen in

  • sources/2025-07-08-planetscale-caching — Ben Dicken's photo-album example + the visual demo where each database cell click loads that cell + its two neighbours into cache. Canonical application-tier instance of this pattern with explicit generalisation beyond photos.
  • Hardware CPU prefetcher (implicit across the wiki's performance corpus) — every cache-locality-sensitive post on the wiki benefits from HW prefetch on sequential access; see the Cloudflare trie-hard + Netflix Vector API case studies.
  • Linux OS readahead (implicit) — mmap-heavy workloads tune fadvise flags explicitly; sequential scans benefit from kernel-default readahead.