
PATTERN Cited by 1 source

Async-refresh cache loader

Problem

A plain TTL cache has exactly two states per key: fresh or expired. At the TTL boundary, the next request synchronously pays the origin round-trip latency — a guaranteed tail-latency spike. Under concurrent load, multiple readers all miss and all invoke the origin (cache stampede).

For a single-digit-millisecond latency budget on a high-QPS read API, both the expiry spike and the stampede matter.

Pattern

Use a cache primitive that separates read TTL from refresh trigger:

  • The cache is authoritative (served synchronously) for the full TTL.
  • Reads that arrive within a trailing stale window (e.g. last 15 s of a 60 s TTL) serve the cached value immediately and enqueue a background refresh against the origin.
  • Refreshes are deduplicated — at most one in-flight background refresh per key.
  • When the refresh completes, the cache entry is replaced with a fresh TTL.
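The four properties above can be hand-rolled in a few dozen lines. This is an illustrative sketch only (the `StaleWindowCache` name and `origin` function are assumptions, and it omits eviction and size bounds); Caffeine's `refreshAfterWrite`, shown below, is the production-grade equivalent:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

class StaleWindowCache<K, V> {
    private record Entry<T>(T value, long writeNanos) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> origin;
    private final long ttlNanos;
    private final long refreshAfterNanos;
    private final Executor executor;

    StaleWindowCache(Function<K, V> origin, long ttlNanos, long refreshAfterNanos, Executor executor) {
        this.origin = origin;
        this.ttlNanos = ttlNanos;
        this.refreshAfterNanos = refreshAfterNanos;
        this.executor = executor;
    }

    V get(K key) {
        long now = System.nanoTime();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.writeNanos() >= ttlNanos) {
            // Hard miss: the caller pays the origin round-trip synchronously.
            V v = origin.apply(key);
            entries.put(key, new Entry<>(v, System.nanoTime()));
            return v;
        }
        if (now - e.writeNanos() >= refreshAfterNanos) {
            // Stale window: serve the cached value now, and enqueue at most one
            // background refresh per key (deduplicated via computeIfAbsent).
            inFlight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> origin.apply(k), executor)
                    .whenComplete((v, t) -> {
                        if (t == null) {
                            entries.put(k, new Entry<>(v, System.nanoTime()));
                        }
                        inFlight.remove(k); // allow a later refresh either way
                    }));
        }
        return e.value(); // the cache is authoritative for the full TTL
    }
}
```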

The canonical Java implementation is Caffeine's AsyncLoadingCache.refreshAfterWrite:

import static java.util.concurrent.TimeUnit.SECONDS;

AsyncLoadingCache<Key, Value> cache = Caffeine.newBuilder()
  .expireAfterWrite(60, SECONDS)  // hard TTL: entry unusable after 60 s
  .refreshAfterWrite(45, SECONDS) // reads after 45 s trigger a refresh (15 s stale window)
  .buildAsync(loader);            // loader: CacheLoader<Key, Value> against the origin

Zalando's PRAPI runs exactly this on top of DynamoDB. (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)

See concepts/async-loading-cache-stale-window for the fuller concept treatment and concepts/stale-while-revalidate-cache for the broader SWR semantic.

When it's the right choice

  • Latency budget is tighter than origin round-trip. You can't afford to pay the origin on a TTL boundary.
  • Data is bounded-stale tolerant. Serving a 30-second-old value is acceptable.
  • High-traffic keys keep the refresh fed. Background refresh depends on reads in the stale window triggering it; cold keys won't refresh automatically.

When it's wrong

  • Strict freshness required. If the consumer must see the latest write within milliseconds, SWR-style caching is incompatible; use write-through or invalidation instead.
  • Low-traffic keys. A rarely-read key never gets a stale-window hit — every access after TTL elapses is a hard miss and still pays origin latency. For sparse keys, add a scheduled refresh-ahead job or accept the miss.
  • Origin can't absorb the extra refresh traffic. Background refreshes are origin reads beyond what synchronous misses alone would generate; a fragile origin may not tolerate the added load.
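For the sparse-key case, the scheduled refresh-ahead job mentioned above can be a fixed-rate pass over a known key set. A minimal sketch, assuming the key list, `origin` function, and backing cache map are supplied by the caller (all names here are hypothetical):

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class RefreshAhead {
    // One refresh pass: reload every known key, keeping the old value on failure.
    static <K, V> void refreshAll(Iterable<K> keys, Function<K, V> origin, ConcurrentMap<K, V> cache) {
        for (K key : keys) {
            try {
                cache.put(key, origin.apply(key));
            } catch (RuntimeException e) {
                // A failed refresh must not evict: the stale value stays serveable.
            }
        }
    }

    // Schedule the pass independently of read traffic, so cold keys stay warm.
    static <K, V> ScheduledExecutorService start(
            Iterable<K> keys, Function<K, V> origin, ConcurrentMap<K, V> cache, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> refreshAll(keys, origin, cache), 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Separating `refreshAll` from the scheduling makes the refresh pass directly testable and reusable from a warm-up path.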

Failure modes to monitor

  • Silent staleness on refresh failure. If the background refresh throws, the stale value keeps being served until the entry hits full TTL and then takes the synchronous miss anyway. Emit a refresh-failure counter.
  • Loader latency exceeding stale window. If the origin is slower than your stale-window budget, entries regularly expire before the refresh completes. Alarm on refresh-completion time relative to the stale window.
  • Stale window too tight. The window must be wider than the typical read interval for a key, so at least one read lands in it and triggers the refresh; if it's tighter, most entries still expire and miss synchronously.
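A minimal way to surface the first two signals is to wrap the origin loader. This sketch assumes the metrics sink is just two atomics (swap in your metrics library's counter and histogram; `InstrumentedLoader` is a hypothetical name):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Wraps an origin loader so refresh failures and refresh latency are observable.
class InstrumentedLoader<K, V> implements Function<K, V> {
    final AtomicLong refreshFailures = new AtomicLong();
    final AtomicLong lastRefreshNanos = new AtomicLong();
    private final Function<K, V> delegate;

    InstrumentedLoader(Function<K, V> delegate) { this.delegate = delegate; }

    @Override
    public V apply(K key) {
        long start = System.nanoTime();
        try {
            V v = delegate.apply(key);
            lastRefreshNanos.set(System.nanoTime() - start); // alarm if this nears the stale window
            return v;
        } catch (RuntimeException e) {
            refreshFailures.incrementAndGet(); // emit as a counter metric
            throw e;
        }
    }
}
```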

Seen in
