Async-loading cache stale window¶
Definition¶
An async-loading cache stale window is a trailing segment of an entry's TTL during which reads are served from the cached value immediately while a background refresh against the origin is triggered asynchronously. The cache behaves as fresh for as long as the value is within TTL, but any read inside the stale window is also a signal to refresh — so by the time the TTL would expire, a newer value has (usually) already replaced the existing entry.
This is the application-cache-layer realisation of the general stale-while-revalidate semantic (RFC 5861 framed it for HTTP caches).
Canonical configuration¶
Caffeine exposes this directly:
```java
import java.util.concurrent.TimeUnit;
import com.github.benmanes.caffeine.cache.Caffeine;

Caffeine.newBuilder()
    .expireAfterWrite(60, TimeUnit.SECONDS)
    .refreshAfterWrite(45, TimeUnit.SECONDS) // 15s stale window
    .buildAsync(loader); // loader: a CacheLoader<K, V> supplied by the caller
```
Zalando's PRAPI runs exactly this shape — "a 60 second cache time with the final 15 seconds as the stale window. In the last 15 seconds, retrieving a cache entry triggers a background refresh from DynamoDB." (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)
Why it exists¶
Three problems with a naive cache (hit or miss, TTL expiry = hard miss):
- Tail-latency spike on expiry. The first request after TTL expiry pays the full origin-fetch latency — a guaranteed spike in the high percentiles (P99 and up).
- Stampede risk. N concurrent reads on an expired entry all miss and all invoke the origin. Async-loading caches dedup concurrent loads behind a shared CompletableFuture, but naive TTL caches don't.
- No "warm-ish" state. The cache is either absolutely fresh or absolutely gone — there's no way to say "fresh enough to serve, stale enough to refresh proactively."
The stale window solves all three: reads stay fast, the refresh happens out-of-band, and the origin sees one fetch per entry per window rather than N concurrent misses.
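The read path described above can be sketched in plain Java. This is a hypothetical `StaleWindowCache`, not Caffeine's implementation; the names and structure are illustrative, and miss-path dedup is omitted for brevity:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch of the stale-window read path: serve within TTL,
// but any read past refreshAfterNanos also kicks off one background
// reload so the entry is replaced before it expires.
final class StaleWindowCache<K, V> {
    private record Entry<V>(V value, long writeNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Map<K, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlNanos;
    private final long refreshAfterNanos;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    StaleWindowCache(Function<K, V> loader, long ttlNanos, long refreshAfterNanos) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.refreshAfterNanos = refreshAfterNanos;
    }

    V get(K key) {
        long now = System.nanoTime();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.writeNanos >= ttlNanos) {
            // hard miss: pay the full origin fetch synchronously
            V v = loader.apply(key);
            entries.put(key, new Entry<>(v, System.nanoTime()));
            return v;
        }
        if (now - e.writeNanos >= refreshAfterNanos
                && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            // stale window: serve the cached value immediately and refresh
            // out of band; putIfAbsent ensures one fetch per key per window
            executor.execute(() -> {
                try {
                    entries.put(key, new Entry<>(loader.apply(key), System.nanoTime()));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return e.value;
    }
}
```

The `refreshing` map is what turns N near-expiry reads into a single origin fetch per window: only the first read inside the window wins the `putIfAbsent` race and schedules the reload.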
Interaction with hit rate and consistency¶
- Hit rate — stays high as long as the window catches most reads before expiry. Tuning: make the window large enough that near-expiry traffic reliably triggers refresh before TTL elapses.
- Consistency — the cache is bounded-stale by at most the TTL. A value enters the stale window at age (TTL - window) seconds, so stale-window reads serve data between (TTL - window) and TTL seconds old. If the origin data mutates more often than once per window, the stale window can emit multiple generations of stale values before the refresh completes.
- Thundering herd — async-loading implementations (e.g. Caffeine's AsyncLoadingCache) coalesce concurrent misses into a single load future, so stampedes are already suppressed at the library level; the stale window extends that guarantee to near-expiry reads, not just miss reads.
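The coalescing behaviour can be illustrated without Caffeine. This is a hypothetical `CoalescingLoader` (names are illustrative) that shares one in-flight `CompletableFuture` per key, so racing callers trigger a single origin fetch:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the dedup idea behind async-loading caches:
// concurrent misses for the same key share one in-flight future, so the
// origin is asked once no matter how many callers race.
final class CoalescingLoader<K, V> {
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> origin;

    CoalescingLoader(Function<K, V> origin) {
        this.origin = origin;
    }

    CompletableFuture<V> load(K key) {
        // computeIfAbsent guarantees a single future per key; all racing
        // callers receive the same instance and await the same origin fetch.
        // The future removes itself on completion so later loads re-fetch.
        return inFlight.computeIfAbsent(key,
            k -> CompletableFuture.supplyAsync(() -> origin.apply(k))
                    .whenComplete((v, t) -> inFlight.remove(k)));
    }
}
```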
Failure modes¶
- Background refresh silently failing — the cached value can hit full TTL and then take the full origin fetch anyway, re-introducing the tail-latency spike. Good implementations surface refresh-failure metrics.
- Window too small — low-traffic keys never get read during the stale window, so every expiry still takes a synchronous miss. Solution: shift to a refresh-ahead scheduled policy or accept the miss cost.
- Window too large — inflates the bounded-stale window, so downstream consumers may see stale data longer than expected.
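One way to surface the refresh-failure signal mentioned above is to wrap the background load so failures increment a counter instead of vanishing. This is a hypothetical `MeteredRefresher` sketch; the counter name and shape are illustrative, not a Caffeine API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch: count silent background-refresh failures so a
// monitoring system can alert before entries hit full TTL and re-introduce
// the synchronous-miss latency spike.
final class MeteredRefresher<K, V> {
    final AtomicLong refreshFailures = new AtomicLong();
    private final Function<K, V> origin;

    MeteredRefresher(Function<K, V> origin) {
        this.origin = origin;
    }

    CompletableFuture<V> refresh(K key) {
        return CompletableFuture.supplyAsync(() -> origin.apply(key))
            .whenComplete((v, t) -> {
                // t is non-null when the origin fetch threw
                if (t != null) refreshFailures.incrementAndGet();
            });
    }
}
```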
Seen in¶
- sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api — PRAPI's 60s / 15s configuration. Load-bearing for the sub-10ms P99 — without the stale window, near-expiry reads on hot products would regularly pay DynamoDB round-trip latency (~single-digit-ms but with occasional spikes).
Related¶
- systems/caffeine — canonical Java implementation
- concepts/stale-while-revalidate-cache — the general semantic
- concepts/cache-ttl-staleness-dilemma — the design tension
- concepts/thundering-herd — what async-loading caches also prevent
- patterns/async-refresh-cache-loader
- systems/zalando-prapi — production consumer