Staleness-vs-load trade-off¶
Definition¶
The staleness-vs-load trade-off is the explicit architectural choice to accept a bounded staleness window on cached data in exchange for bounded load on the backing system. The operator declares: "I am willing to serve answers that are up to Δ seconds out of date, and in return, my backend handles the load of one refresh per Δ rather than one refresh per request."
This framing is distinct from the cache TTL staleness dilemma, which presents the choice as a forced either/or (uniform TTL: either long-and-stale or short-and-loaded). The staleness-vs-load trade-off is the explicitly budgeted version: the operator quantifies how much staleness they can tolerate, states the load reduction they get in return, and ships.
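The load arithmetic behind "one refresh per Δ rather than one refresh per request" can be made concrete. A minimal sketch; the QPS and TTL values below are illustrative assumptions, not figures from the Netflix post:

```python
def backend_refresh_rate(request_qps: float, ttl_seconds: float) -> float:
    """Refreshes per second the backend sees for one cached query shape.

    Without caching (ttl_seconds <= 0), every request hits the backend.
    With a TTL of delta seconds, the backend serves at most one refresh
    per delta, regardless of request volume.
    """
    if ttl_seconds <= 0:
        return request_qps  # no caching: every request is a backend hit
    return min(request_qps, 1.0 / ttl_seconds)

# Illustrative numbers: 200 identical dashboard queries/s, 5 s TTL.
uncached = backend_refresh_rate(200.0, 0.0)  # 200 refreshes/s
cached = backend_refresh_rate(200.0, 5.0)    # 0.2 refreshes/s (one per 5 s)
```

With those assumed numbers the backend load drops by a factor of 1000, in exchange for a staleness bound of 5 seconds.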
Netflix's explicit quantification¶
Netflix's Druid cache post is unusually explicit about the number (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid):
"Caching query results introduces some staleness, specifically, up to 5 seconds for the newest data. This is acceptable for most of our operational dashboards, which refresh every 10 to 30 seconds. In practice, many of our queries already set an end time of now-1m or now-5s to avoid the 'flappy tail' that can occur with currently-arriving data."
And the key load-vs-staleness comparison:
"Since our end-to-end data pipeline latency is typically under 5 seconds at P90, a 5-second cache TTL on the freshest data introduces negligible additional staleness on top of what's already inherent in the system."
The comparison against pipeline latency is the important move. Netflix isn't claiming zero staleness; it's claiming the cache adds negligibly to staleness that was already in the system due to ingestion lag. The operator's budget for staleness is defined by the weakest-link latency of the upstream pipeline.
The general framing¶
For any cached dataset:
- If `pipeline_latency` is small and `cache_TTL` is small, both contribute visibly.
- If `pipeline_latency` dominates (e.g. 5 s at P90), a small `cache_TTL` (also 5 s) is roughly a no-op on end-to-end user-visible staleness.
- If `cache_TTL` dominates (e.g. 1 hour), the cache is the staleness source.
A well-designed system makes `cache_TTL ≪ pipeline_latency` on fresh data (so caching isn't the staleness bottleneck) and `cache_TTL ≫ pipeline_latency` on settled data (so caching carries the load).
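The three regimes above can be sketched numerically. This assumes worst-case end-to-end staleness is roughly additive in pipeline latency and cache TTL, which is a simplification (the real interaction depends on refresh timing):

```python
def worst_case_staleness(pipeline_latency_s: float, cache_ttl_s: float) -> float:
    """Upper bound on user-visible staleness: data can be up to
    pipeline_latency_s old when it lands in the store, plus up to
    cache_ttl_s older by the time a cached result is served."""
    return pipeline_latency_s + cache_ttl_s

# Pipeline dominates: 5 s P90 lag + 5 s TTL -> 10 s worst case,
# at most 2x what the pipeline alone already imposes.
fresh = worst_case_staleness(5.0, 5.0)

# Cache dominates: 5 s lag + 1 h TTL -> the cache is the staleness source.
settled = worst_case_staleness(5.0, 3600.0)
```

In the first case the cache adds staleness comparable to the inherent pipeline lag; in the second the pipeline lag is noise next to the TTL.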
Exponential TTL is the mechanism that lets both inequalities hold at once — short TTLs on fresh buckets where pipeline latency dominates, long TTLs on old buckets where they don't.
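A minimal sketch of an exponential-TTL schedule. The base TTL, cap, and doubling cadence here are illustrative assumptions, not values from the Netflix post or the exponential-ttl concept page:

```python
def ttl_for_bucket(bucket_age_s: float,
                   base_ttl_s: float = 5.0,
                   max_ttl_s: float = 86_400.0) -> float:
    """TTL grows exponentially with bucket age: fresh buckets expire
    quickly (pipeline latency dominates staleness there), settled
    buckets are cached for a long time (the cache carries the load)."""
    if bucket_age_s <= 0:
        return base_ttl_s
    # One doubling per minute of age (assumed cadence); clamp the
    # exponent so very old buckets don't overflow the float.
    doublings = min(bucket_age_s / 60.0, 60.0)
    return min(max_ttl_s, base_ttl_s * (2.0 ** doublings))
```

Under this schedule a bucket covering the current minute gets the 5-second base TTL, while a bucket a day old sits at the 24-hour cap, so both inequalities hold simultaneously across the dataset.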
When the trade-off is acceptable¶
- Dashboards + monitoring — humans read at seconds-to-minutes cadence; seconds of staleness are invisible.
- Automated alerting — alert rules typically run on windows measured in minutes; seconds of cache staleness are within alert-rule grace.
- Derived aggregates — P99s, counts, histograms — small staleness per-bucket averages out.
When it isn't¶
- Trading / financial order matching — any staleness is a correctness bug.
- Authorization decisions on revoked credentials — stale "allow" is a security incident (see concepts/cosmetic-logout).
- Feature-flag changes that must propagate immediately — cache TTL staleness dilemma in multi-tenant config.
The Netflix post is careful to establish that the operational-dashboard workload is in the "staleness OK" regime — refreshes at 10-30 s, pipeline lag already >5 s, many queries pre-trim the flappy tail.
Contrast with concepts/cache-ttl-staleness-dilemma¶
- Dilemma (uniform TTL): a single TTL must satisfy all data; either long-and-stale (wrong tenant context) or short-and-loaded (metadata service saturation).
- Trade-off (this page): explicit, quantified, per-bucket. Exponential-TTL-per-bucket resolves the uniform-TTL dilemma by allowing different staleness budgets on different slices of the data.
Seen in¶
- sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — canonical explicit quantification on the wiki.
Related¶
- concepts/cache-ttl-staleness-dilemma — the uniform-TTL version of the problem.
- concepts/exponential-ttl — the main mechanism for navigating the trade-off.
- concepts/late-arriving-data — the pipeline-latency source on the data side.
- concepts/rolling-window-query — the workload that most benefits.
- systems/netflix-druid-interval-cache
- patterns/interval-aware-query-cache