
PATTERN Cited by 1 source

Async-refresh cache loader

Problem

A plain TTL cache has exactly two states per key: fresh or expired. At the TTL boundary, the next request synchronously pays the origin round-trip latency — a guaranteed tail-latency spike. Under concurrent load, multiple readers all miss and all invoke the origin (cache stampede).

For a single-digit-millisecond latency budget on a high-QPS read API, both the expiry spike and the stampede matter.

Pattern

Use a cache primitive that separates read TTL from refresh trigger:

  • The cache is authoritative (served synchronously) for the full TTL.
  • Reads that arrive within a trailing stale window (e.g. last 15 s of a 60 s TTL) serve the cached value immediately and enqueue a background refresh against the origin.
  • Refreshes are deduplicated — at most one in-flight background refresh per key.
  • When the refresh completes, the cache entry is replaced with a fresh TTL.
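The four properties above can be hand-rolled in a few dozen lines. This is an illustrative sketch only (the `StaleWindowCache` name and `origin` function are assumptions, and it omits eviction and size bounds); Caffeine's `refreshAfterWrite`, shown below, is the production-grade equivalent:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

class StaleWindowCache<K, V> {
    private record Entry<T>(T value, long writeNanos) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> origin;
    private final long ttlNanos;
    private final long refreshAfterNanos;
    private final Executor executor;

    StaleWindowCache(Function<K, V> origin, long ttlNanos, long refreshAfterNanos, Executor executor) {
        this.origin = origin;
        this.ttlNanos = ttlNanos;
        this.refreshAfterNanos = refreshAfterNanos;
        this.executor = executor;
    }

    V get(K key) {
        long now = System.nanoTime();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.writeNanos() >= ttlNanos) {
            // Hard miss: the caller pays the origin round-trip synchronously.
            V v = origin.apply(key);
            entries.put(key, new Entry<>(v, System.nanoTime()));
            return v;
        }
        if (now - e.writeNanos() >= refreshAfterNanos) {
            // Stale window: serve the cached value now, and enqueue at most one
            // background refresh per key (deduplicated via computeIfAbsent).
            inFlight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> origin.apply(k), executor)
                    .whenComplete((v, t) -> {
                        if (t == null) {
                            entries.put(k, new Entry<>(v, System.nanoTime()));
                        }
                        inFlight.remove(k); // allow a later refresh either way
                    }));
        }
        return e.value(); // the cache is authoritative for the full TTL
    }
}
```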

The canonical Java implementation is Caffeine's AsyncLoadingCache.refreshAfterWrite:

import static java.util.concurrent.TimeUnit.SECONDS;

AsyncLoadingCache<Key, Value> cache = Caffeine.newBuilder()
  .expireAfterWrite(60, SECONDS)  // hard TTL: entry unusable after 60 s
  .refreshAfterWrite(45, SECONDS) // reads after 45 s trigger a refresh (15 s stale window)
  .buildAsync(loader);            // loader: CacheLoader<Key, Value> against the origin

Zalando's PRAPI runs exactly this on top of DynamoDB. (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)

See concepts/async-loading-cache-stale-window for the fuller concept treatment and concepts/stale-while-revalidate-cache for the broader SWR semantic.

When it's the right choice

  • Latency budget is tighter than origin round-trip. You can't afford to pay the origin on a TTL boundary.
  • Data is bounded-stale tolerant. Serving a 30-second-old value is acceptable.
  • High-traffic keys keep the refresh fed. Background refresh depends on reads in the stale window triggering it; cold keys won't refresh automatically.

When it's wrong

  • Strict freshness required. If the consumer must see the latest write within milliseconds, SWR-style caching is incompatible; use write-through or invalidation instead.
  • Low-traffic keys. A rarely-read key never gets a stale-window hit — every access after TTL elapses is a hard miss and still pays origin latency. For sparse keys, add a scheduled refresh-ahead job or accept the miss.
  • Origin can't absorb the extra refresh traffic. Background refreshes are origin reads beyond what synchronous misses alone would generate; a fragile origin may not tolerate the added load.
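For the sparse-key case, the scheduled refresh-ahead job mentioned above can be a fixed-rate pass over a known key set. A minimal sketch, assuming the key list, `origin` function, and backing cache map are supplied by the caller (all names here are hypothetical):

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class RefreshAhead {
    // One refresh pass: reload every known key, keeping the old value on failure.
    static <K, V> void refreshAll(Iterable<K> keys, Function<K, V> origin, ConcurrentMap<K, V> cache) {
        for (K key : keys) {
            try {
                cache.put(key, origin.apply(key));
            } catch (RuntimeException e) {
                // A failed refresh must not evict: the stale value stays serveable.
            }
        }
    }

    // Schedule the pass independently of read traffic, so cold keys stay warm.
    static <K, V> ScheduledExecutorService start(
            Iterable<K> keys, Function<K, V> origin, ConcurrentMap<K, V> cache, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> refreshAll(keys, origin, cache), 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Separating `refreshAll` from the scheduling makes the refresh pass directly testable and reusable from a warm-up path.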

Failure modes to monitor

  • Silent staleness on refresh failure. If the background refresh throws, the stale value keeps being served until the entry hits full TTL and then takes the synchronous miss anyway. Emit a refresh-failure counter.
  • Loader latency exceeding stale window. If the origin is slower than your stale-window budget, entries regularly expire before the refresh completes. Alarm on refresh-completion time relative to the stale window.
  • Stale window too tight. The window must be wider than the typical read interval for a key, so at least one read lands in it and triggers the refresh; if it's tighter, most entries still expire and miss synchronously.
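A minimal way to surface the first two signals is to wrap the origin loader. This sketch assumes the metrics sink is just two atomics (swap in your metrics library's counter and histogram; `InstrumentedLoader` is a hypothetical name):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Wraps an origin loader so refresh failures and refresh latency are observable.
class InstrumentedLoader<K, V> implements Function<K, V> {
    final AtomicLong refreshFailures = new AtomicLong();
    final AtomicLong lastRefreshNanos = new AtomicLong();
    private final Function<K, V> delegate;

    InstrumentedLoader(Function<K, V> delegate) { this.delegate = delegate; }

    @Override
    public V apply(K key) {
        long start = System.nanoTime();
        try {
            V v = delegate.apply(key);
            lastRefreshNanos.set(System.nanoTime() - start); // alarm if this nears the stale window
            return v;
        } catch (RuntimeException e) {
            refreshFailures.incrementAndGet(); // emit as a counter metric
            throw e;
        }
    }
}
```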

Seen in
