PATTERN
Partial cache hit with tail fetch¶
Pattern¶
When a cache holds a decomposable time-series response as independently-keyed fragments (e.g. granularity-aligned buckets), a request over an interval may find some fragments cached and some missing. The canonical handling rule:
- Scan cached fragments in contiguous prefix from the interval start.
- On the first gap, stop.
- Issue one narrowed backend query covering from the gap to the interval end — even if there are later cached fragments inside that range.
- Concatenate the cached prefix with the backend tail response.
- Sort by the decomposition axis (timestamp) and return.
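The five steps above can be sketched as a single function; a minimal sketch assuming a dict-backed cache keyed by bucket index (all names here are hypothetical, not Netflix's implementation):

```python
from typing import Callable

def serve_interval(cache: dict, start: int, end: int,
                   backend_fetch: Callable[[int, int], dict]) -> list:
    """Serve buckets [start, end] with at most one backend query."""
    result = []
    # 1-2. Scan cached fragments in a contiguous prefix from the interval
    # start; stop at the first gap.
    bucket = start
    while bucket <= end and bucket in cache:
        result.append((bucket, cache[bucket]))
        bucket += 1
    # 3. One narrowed backend query from the gap to the interval end,
    # even if later buckets inside that range are cached.
    if bucket <= end:
        tail = backend_fetch(bucket, end)  # the single backend call
        # 4. Concatenate the cached prefix with the backend tail.
        result.extend(tail.items())
    # 5. Sort by the decomposition axis (bucket index / timestamp).
    return sorted(result)
```

Note that any cached buckets after the first gap are simply ignored on the read path; the tail-fetch response supersedes them.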
The rule is counter-intuitive: why not fill only the gaps?
Why contiguous-prefix + single tail-fetch (not gap-filling)¶
If the cache has scattered buckets — say, buckets 1–50 hit, 51 miss, 52 hit, 53 miss, 54–55 hit — a "gap-filling" strategy would issue two narrowed backend queries (for 51 and 53). A "fetch all missing tail" strategy issues one narrowed query (for 51–55) at the cost of re-fetching the 52 / 54 / 55 buckets that were already cached.
The one-query-max rule is what Netflix uses (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid):
"When we hit a gap, we stop and fetch all newer data from Druid. This guarantees a complete, unbroken result set while sending at most one Druid query, rather than 'filling gaps' with multiple small, fragmented queries that would increase Druid load."
The reason: many small queries are more expensive than one slightly-larger query for most time-series backends. Query setup cost (parsing, planning, segment discovery, broker coordination, network round-trips) dominates the cost of scanning a few extra buckets. Gap-filling optimises the wrong axis.
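The trade-off can be made concrete with a toy cost model — fixed per-query setup plus linear per-bucket scan cost. The numbers are invented for illustration only:

```python
def query_cost(n_buckets: int, setup: float = 50.0, per_bucket: float = 1.0) -> float:
    """Toy model: fixed setup overhead (parse/plan/coordinate) + linear scan."""
    return setup + per_bucket * n_buckets

# Scattered-bucket example from the text: 51 miss, 52 hit, 53 miss, 54-55 hit.
gap_filling = query_cost(1) + query_cost(1)  # two narrow queries (51 and 53)
tail_fetch = query_cost(5)                   # one query covering 51-55
```

With setup at 50x the per-bucket cost, gap-filling costs 102 units against the tail-fetch's 55 — the re-scanned buckets 52, 54, and 55 are cheaper than a second query setup.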
When the pattern is a good fit¶
- Gaps in the cache are relatively rare and occur near the tail (fresh data) — typical of rolling-window queries with age-based exponential TTLs (old buckets live long, fresh ones churn).
- Backend per-query overhead (setup + network) is non-trivial relative to the per-bucket scan cost.
- The cache layer's write-back of the tail-fetch covers both the gap and the later cached buckets that were re-fetched — so those entries refresh for free.
When it's not a good fit¶
- Gaps can be large and scattered throughout the interval — then a single tail-fetch re-scans lots of already-cached data; narrow gap-filling would be cheaper (at the cost of more backend calls).
- Backend per-query overhead is negligible (e.g. in-memory store with cheap request multiplexing).
- The consistency requirement demands that gaps be served only from previously-cached old values, never from fresh backend values (a rare requirement).
Integration with concepts/negative-caching¶
A "gap" for the contiguous-prefix scan is defined as "no cache entry" — explicitly not "cache entry with empty sentinel value." Netflix distinguishes (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid):
- Interior empty-sentinel buckets (negative-cached) are treated as valid cached data → the contiguous-prefix scan continues past them.
- Missing cache entries → trigger the gap → tail-fetch.
- Trailing empty tail is deliberately never negative-cached, so buckets in the tail always trigger the gap + tail-fetch when queried after new fresh data could have arrived.
Without negative caching, naturally sparse metrics would gap on every empty bucket and constantly re-query the backend — the pattern requires negative-caching to work on sparse data.
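The missing-entry vs empty-sentinel distinction can be sketched as follows (the `EMPTY` sentinel and helper names are hypothetical):

```python
EMPTY = object()  # negative-cache sentinel: bucket known to contain no data

def is_gap(cache: dict, bucket: int) -> bool:
    """A gap is the *absence* of a cache entry; an empty sentinel is a hit."""
    return bucket not in cache

def prefix_end(cache: dict, start: int, end: int) -> int:
    """Last bucket (inclusive) of the contiguous cached prefix, or start - 1."""
    b = start
    while b <= end and not is_gap(cache, b):
        b += 1  # continues past interior empty-sentinel buckets
    return b - 1
```

Because the trailing empty tail is deliberately never written as `EMPTY`, those buckets always register as gaps on the next query and trigger a fresh tail-fetch.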
Asynchronous write-back completes the loop¶
After the tail-fetch returns, the cache asynchronously parses and stores the per-bucket results. Importantly, the tail-fetch covers both the gap and the subsequent cached-but-re-fetched buckets, so the write-back refreshes soon-to-be-stale entries for free. This keeps response-path latency untouched by cache writes (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
Example¶
A 3-hour rolling window at 1-minute granularity = 180 buckets. Suppose 175 are cached (hit), 5 are missing from the newest tail (the last 5 minutes expired / never cached).
- Contiguous prefix = buckets 1–175 → cache hit.
- First gap = bucket 176.
- Single tail-fetch = narrowed Druid query for buckets 176–180.
- Response = 175 cached buckets + 5 fresh buckets concatenated + sorted.
- Druid cost = 5-minute query instead of 3-hour query.
Netflix reports that on a partial hit of 2h 50m of a 3h window, the tail-fetch is ~10 minutes — "usually faster and cheaper than the original" (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
Seen in¶
- sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — canonical wiki instance at Netflix.
Related¶
- concepts/granularity-aligned-bucket — the fragment unit.
- concepts/rolling-window-query — the workload.
- concepts/negative-caching — the integration with empty-bucket handling.
- systems/netflix-druid-interval-cache
- systems/apache-druid
- patterns/interval-aware-query-cache — the parent pattern.