PATTERN
# Async-refresh cache loader
## Problem
A plain TTL cache has exactly two states per key: fresh or expired. At the TTL boundary, the next request synchronously pays the origin round-trip latency — a guaranteed tail-latency spike. Under concurrent load, multiple readers all miss and all invoke the origin (cache stampede).
For a read API with a single-digit-millisecond latency budget at high QPS, both the expiry spike and the stampede matter.
## Pattern
Use a cache primitive that separates read TTL from refresh trigger:
- The cache is authoritative (served synchronously) for the full TTL.
- Reads that arrive within a trailing stale window (e.g. last 15 s of a 60 s TTL) serve the cached value immediately and enqueue a background refresh against the origin.
- Refreshes are deduplicated — at most one in-flight background refresh per key.
- When the refresh completes, the cache entry is replaced with a fresh TTL.
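The four rules above can be sketched on plain JDK types. This is an illustrative toy, not any library's API: the `SwrCache` name, the injected clock, and the injected executor are all assumptions made so the behavior is easy to follow (and to test deterministically).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Stale-while-revalidate sketch: cached values are authoritative for the
 *  full TTL; reads landing in the trailing stale window serve the cached
 *  value and trigger at most one background refresh per key. */
final class SwrCache<K, V> {
    private record Entry<V>(V value, long writtenAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Map<K, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Function<K, V> loader;    // origin round-trip
    private final long ttlMillis;           // hard expiry
    private final long refreshAfterMillis;  // start of the stale window
    private final Executor executor;        // runs background refreshes
    private final LongSupplier clock;       // injectable for tests

    SwrCache(Function<K, V> loader, long ttlMillis, long refreshAfterMillis,
             Executor executor, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.refreshAfterMillis = refreshAfterMillis;
        this.executor = executor;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.writtenAtMillis() >= ttlMillis) {
            // Hard miss: pay the origin round-trip synchronously.
            V v = loader.apply(key);
            entries.put(key, new Entry<>(v, now));
            return v;
        }
        if (now - e.writtenAtMillis() >= refreshAfterMillis) {
            // Stale window: serve the cached value, refresh in the background.
            // putIfAbsent dedupes to at most one in-flight refresh per key.
            if (refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
                executor.execute(() -> {
                    try {
                        entries.put(key, new Entry<>(loader.apply(key),
                                                     clock.getAsLong()));
                    } finally {
                        refreshing.remove(key);
                    }
                });
            }
        }
        return e.value();  // authoritative for the full TTL
    }
}
```

With a 60 s TTL and a 45 s refresh trigger, a read at t = 50 s returns the old value immediately while the replacement loads off the request path.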
The canonical Java implementation is Caffeine's `AsyncLoadingCache` built with `refreshAfterWrite`:

```java
Caffeine.newBuilder()
    .expireAfterWrite(60, SECONDS)
    .refreshAfterWrite(45, SECONDS)  // refresh triggers in the last 15 s
    .buildAsync(loader);
```
Zalando's PRAPI runs exactly this on top of DynamoDB. (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)
See concepts/async-loading-cache-stale-window for the fuller concept treatment and concepts/stale-while-revalidate-cache for the broader SWR semantic.
## When it's the right choice
- Latency budget is tighter than origin round-trip. You can't afford to pay the origin on a TTL boundary.
- Data is bounded-stale tolerant. Serving a 30-second-old value is acceptable.
- High-traffic keys keep the refresh fed. Background refresh depends on reads in the stale window triggering it; cold keys won't refresh automatically.
## When it's wrong
- Strict freshness required. If the consumer must see the latest write within milliseconds, SWR-style caching is incompatible; use write-through or invalidation instead.
- Low-traffic keys. A rarely-read key never gets a stale-window hit — every access after TTL elapses is a hard miss and still pays origin latency. For sparse keys, add a scheduled refresh-ahead job or accept the miss.
- Origin can't absorb unsolicited refresh traffic. Background refreshes are origin reads no client is waiting on; if the origin is fragile, that extra load may hurt it.
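For the sparse-key case, the scheduled refresh-ahead job mentioned above can be as simple as a timer that reloads a curated key list before TTL. A minimal sketch on plain JDK types; the `RefreshAheadJob` name and its shape are assumptions, not a library API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Refresh-ahead for sparse keys: a timer reloads a curated key list so
 *  entries never reach TTL, independent of read traffic. */
final class RefreshAheadJob<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader;
    private final List<K> keys;

    RefreshAheadJob(Map<K, V> cache, Function<K, V> loader, List<K> keys) {
        this.cache = cache;
        this.loader = loader;
        this.keys = keys;
    }

    /** One refresh pass; a failed key keeps its last good value. */
    void refreshAll() {
        for (K key : keys) {
            try {
                cache.put(key, loader.apply(key));
            } catch (RuntimeException e) {
                // Keep serving the previous value; surface the failure via metrics.
            }
        }
    }

    /** Run a pass every `period`, chosen comfortably inside the TTL. */
    ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(this::refreshAll, 0, period, unit);
    }
}
```

Note the last bullet still applies: a job like this sends origin reads for keys nobody may be requesting, so it trades the hard miss for unsolicited origin load.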
## Failure modes to monitor
- Silent staleness on refresh failure. If the background refresh throws, the entry survives to full TTL and then takes the synchronous miss anyway. Emit a refresh-failure counter.
- Loader latency exceeding the stale window. If the origin is slower than the stale-window budget, entries regularly expire before their refresh completes. Alarm on refresh-completion time relative to the stale window.
- Stale window too tight. The stale window must cover at least one expected hit-period per key; too tight and most keys still expire synchronously.
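One way to make the first failure mode observable is to wrap the origin loader so every refresh failure increments a counter before rethrowing. A sketch under assumed names (`CountingLoader`, a bare `AtomicLong` standing in for a real metrics registry):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Wraps an origin loader so refresh failures increment a counter that
 *  can be alarmed on, instead of disappearing silently. */
final class CountingLoader<K, V> implements Function<K, V> {
    final AtomicLong failures = new AtomicLong();
    private final Function<K, V> delegate;

    CountingLoader(Function<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public V apply(K key) {
        try {
            return delegate.apply(key);
        } catch (RuntimeException e) {
            failures.incrementAndGet();  // report to your metrics backend here
            throw e;                     // let the cache keep the stale entry
        }
    }
}
```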
## Seen in
- sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api — Zalando PRAPI's 60s / 15s configuration on DynamoDB-backed product data. Load-bearing for their sub-10ms P99 on single GETs.