Netflix — Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale¶
Summary¶
Netflix TechBlog post (2026-04-06, Tier 1; Ben Sykes) on an experimental caching layer Netflix built in front of Apache Druid to handle the rolling-window dashboard query pattern — the case where dozens of engineers stare at the same live-show / canary / A-B-test dashboard during a high-profile launch and Druid drowns in near-duplicate queries whose time window shifts by a few seconds between refreshes. Druid's own full-result cache misses any time the window shifts and refuses to cache results involving realtime segments; its per-segment cache helps historical nodes but brokers still merge per-segment results with realtime data every refresh. The new layer — Netflix's Druid interval-aware cache — decomposes every time-series query into granularity-aligned per-bucket cache entries with exponentially growing age-based TTLs, intercepts requests at the Druid Router, serves whatever portion of the interval is already cached, rebuilds a narrowed query for the missing tail, and merges.
Netflix operates >10 trillion rows in Druid, ingesting up to 15M events/sec; one popular dashboard issues 64 queries per load, refreshes every 10 seconds, and at 30 concurrent viewers generates ~192 queries/second mostly for identical data (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
Headline production results: 82% of real user queries get at least a partial cache hit, 84% of result data served from cache, cache P90 ~5.5 ms; an A/B experiment showed a ~33% drop in queries to Druid and a ~66% improvement in overall P90 query times, with 14× result-bytes reduction on some shapes. Storage layer is Netflix's KVDAL (backed by Cassandra) — the two-level-map model with per-inner-key TTLs is a natural fit, and KVDAL handles eviction automatically.
Key takeaways¶
- The workload shape is "rolling-window dashboards at hyperscale under the same event," not ad-hoc queries. Druid's caches are well-suited to stable time ranges + settled historical data; dashboards request `[now-3h, now]` and the right boundary advances every refresh. The full-result cache misses every time the window shifts. Druid also deliberately refuses to cache results involving realtime segments (still-being-indexed data), because it values cache-result determinism + query correctness over a higher hit rate. The per-segment cache skips redundant historical scans, but brokers still merge realtime-node data on every refresh. At Netflix scale the near-duplicate flood becomes the dominant cost (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- The core insight: when a dashboard asks for "last 3 hours," the data from 2 hours ago isn't going to change. Only the last few minutes are genuinely fresh. The cache should remember the old portion and ask Druid only for the new tail — this is the patterns/partial-cache-hit-with-tail-fetch shape (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Deliberate staleness trade-off: up to 5 seconds on the newest data. Netflix's end-to-end pipeline latency is already under 5 seconds at P90; most operational dashboards refresh at 10–30 s and often set `end = now-1m` or `now-5s` to avoid "the flappy tail" of currently-arriving data. So a 5 s cache TTL on the fresh bucket adds negligibly to the staleness already present in the pipeline — in exchange for much lower Druid load. This is the canonical concepts/staleness-vs-load-tradeoff framing: the operator accepts a bounded staleness window as the price of bounded Druid load (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Exponential-TTL ladder keyed on data age is the late-arriving-data mitigation. Not all buckets are equally trustworthy. A data point from 30 seconds ago might still change as late-arriving events trickle in; a point from 30 minutes ago is almost certainly final. Published ladder: min 5 s TTL for data <2 min old; 10 s at 2 min, 20 s at 3 min, 40 s at 4 min, doubling per additional minute of age, capped at 1 hour. Fresh buckets cycle rapidly (late-arriving corrections picked up quickly); old buckets linger (confidence grows with time) (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
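The published ladder reduces to a small piecewise function. A minimal sketch in Python — the function name and shape are mine, the numbers are Netflix's published ladder:

```python
def bucket_ttl_seconds(age_seconds: float) -> int:
    """TTL for a cached bucket, scaled to the age of its data.

    Fresh buckets cycle fast so late-arriving corrections are picked up
    quickly; settled buckets linger because confidence grows with age.
    """
    age_minutes = int(age_seconds // 60)
    if age_minutes < 2:          # data under 2 minutes old
        return 5                 # minimum TTL: 5 seconds
    # 10 s at 2 min, 20 s at 3 min, 40 s at 4 min, doubling per minute
    return min(10 * 2 ** (age_minutes - 2), 3600)  # capped at 1 hour
```

A bucket 30 seconds old gets the 5 s floor; a bucket 30 minutes old hits the 1-hour cap.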
- Granularity-aligned time bucketing with a map-of-maps cache key is what makes shifted windows cheap. A single-level key `(query, interval)` — like Druid's own full-result cache — would miss on every shift. Netflix uses a map-of-maps: the outer key is the query hash with the time interval and volatile context removed, the inner keys are timestamps bucketed to the query granularity (or 1 minute, whichever is larger), encoded as big-endian bytes so lexicographic order matches chronological order — which enables efficient range scans. A 3-hour 1-minute-granularity query becomes 180 independent cached buckets, each with its own TTL. When the window shifts 30 s later, most buckets are reused and only the new tail hits Druid (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Intercepting proxy at the Druid Router keeps the cache a zero-change drop-in for clients. The cache runs as an external service. Requests are intercepted at the Druid Router and redirected to the cache. If the cache fully satisfies a query, it returns immediately. Otherwise it shrinks the interval to the uncached portion and calls back into the Router, bypassing the redirect, to query Druid normally. Non-cacheable queries (metadata queries, queries without time group-bys) pass straight through untouched. No dashboard-code changes were needed. Netflix explicitly frames the proxy shape as a temporary posture while they work on upstreaming the capability into Druid natively (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
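The map-of-maps key shape can be sketched as follows. This is an illustration, not Netflix's code: for simplicity it strips the entire query `context`, whereas the real key keeps those context properties that alter response structure.

```python
import hashlib
import json
import struct

MIN_BUCKET_MS = 60_000  # bucket to query granularity, with a 1-minute floor


def outer_key(query: dict) -> str:
    """Hash of the query with the time interval and volatile context
    removed: encodes *what* is being asked, not *when*."""
    normalized = {k: v for k, v in query.items()
                  if k not in ("intervals", "context")}
    blob = json.dumps(normalized, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def inner_keys(start_ms: int, end_ms: int, granularity_ms: int) -> list[bytes]:
    """Granularity-aligned bucket timestamps encoded as big-endian bytes,
    so lexicographic byte order matches chronological order and the store
    can answer range scans over inner keys."""
    step = max(granularity_ms, MIN_BUCKET_MS)
    first = (start_ms // step) * step
    return [struct.pack(">q", ts) for ts in range(first, end_ms, step)]
```

A 3-hour query at 1-minute granularity yields 180 inner keys, and two queries differing only in their interval share the same outer key — which is exactly why a shifted window reuses most buckets.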
- Cacheable query shape: time group-by (`timeseries`, `groupBy`). For these, the cache parses the query, extracts interval + granularity + structure, and computes a SHA-256 of the query with the time interval and volatile context removed — that hash encodes what is being asked (datasource, filters, aggregations, granularity) but not when. Certain Druid context properties that alter response structure/content are included in the key (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Gap handling: contiguous-prefix rule with single-tail fetch, not gap-filling. On a cache lookup, Netflix fetches cached points in the requested range but only if they're contiguous from the start. Bucket TTLs expire unevenly, so gaps appear; on the first gap, stop and fetch all newer data from Druid in one narrowed query. This guarantees a complete, unbroken result set with at most one Druid query per request — rather than filling holes with multiple small fragmented queries that would increase Druid load. On a partial hit (e.g. 2h 50m of a 3h window cached), Netflix rebuilds the query with a narrowed 10-minute interval. Druid scans only recent segments → usually faster and cheaper than the original (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
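A minimal sketch of the contiguous-prefix rule (all names are mine): walk the requested buckets in order, stop at the first gap, and plan exactly one narrowed fetch from the gap to the end of the window.

```python
def plan_lookup(requested: list[int], cached: dict[int, object]):
    """Contiguous-prefix rule: serve cached buckets from the start of the
    requested range; at the first gap, stop and plan ONE narrowed Druid
    query covering everything from the gap to the end of the window."""
    prefix = []
    for ts in requested:
        if ts not in cached:
            return prefix, (ts, requested[-1])  # single tail fetch
        prefix.append(cached[ts])
    return prefix, None  # full cache hit, no Druid query needed
```

With 170 of 180 one-minute buckets cached contiguously, the tail fetch covers only the last 10 minutes, so Druid touches only recent segments.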
- Negative caching for naturally sparse metrics — with a trailing-buckets exception. Some metrics legitimately have no data in some buckets; without special handling the gap-detection logic would re-query Druid for them forever. Netflix caches empty sentinel values for no-data buckets, and the gap detector treats them as valid cached data. Trailing empty buckets (after the last data point) are deliberately not negatively cached — an empty tail might just be data that hasn't arrived yet, and caching "no data" for it would exacerbate late-arrival chart delays (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
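The trailing-buckets exception can be sketched in the write-back step (hypothetical names; `EMPTY` is an assumed sentinel, not something the post names):

```python
EMPTY = object()  # sentinel: "Druid confirmed there is no data here"


def writeback(bucket_ts: list[int], results: dict[int, object]) -> dict:
    """Cache Druid results per bucket. Interior empty buckets get a
    negative-cache sentinel so the gap detector won't re-query them;
    trailing empty buckets are left uncached, since their data may
    simply not have arrived yet."""
    with_data = [ts for ts in bucket_ts if ts in results]
    last = with_data[-1] if with_data else None
    out = {}
    for ts in bucket_ts:
        if ts in results:
            out[ts] = results[ts]
        elif last is not None and ts < last:
            out[ts] = EMPTY  # negative-cache an interior gap
        # else: trailing empty bucket, deliberately not cached
    return out
```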
- Storage layer = Netflix KVDAL on Cassandra. KVDAL's two-level-map abstraction is a natural fit — outer key = query hash, inner keys = bucketed timestamps. Crucially, KVDAL supports independent TTLs on each inner key-value pair, which is exactly what the age-based TTL ladder needs and eliminates manual eviction. The two-level structure also gives efficient range queries over inner keys — "give me all cached buckets between A and B for query hash X" — which is exactly the partial-hit lookup shape. This is the first wiki-documented KVDAL consumer use case beyond the KVDAL launch post itself (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
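A toy model of the two-level-map shape the cache relies on — emphatically not KVDAL's real API, just an in-memory illustration of per-inner-key TTLs plus ordered range scans:

```python
import time


class TwoLevelMap:
    """Toy two-level map: outer key -> inner keys, each inner entry
    carrying its own TTL, with range scans over inner keys in byte
    order (big-endian timestamp keys make that chronological)."""

    def __init__(self):
        self._data: dict[str, dict[bytes, tuple[object, float]]] = {}

    def put(self, outer: str, inner: bytes, value, ttl_s: float):
        expiry = time.monotonic() + ttl_s
        self._data.setdefault(outer, {})[inner] = (value, expiry)

    def scan(self, outer: str, lo: bytes, hi: bytes):
        """All live inner entries with lo <= key <= hi, in key order:
        'give me all cached buckets between A and B for query hash X'."""
        now = time.monotonic()
        rows = self._data.get(outer, {})
        return [(k, v) for k, (v, exp) in sorted(rows.items())
                if lo <= k <= hi and exp > now]
```

Expired entries simply drop out of scans; in KVDAL the store handles that eviction automatically.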
- Asynchronous write-back on cache-miss path. Fresh data returned from Druid is parsed into per-granularity buckets and written back to the cache asynchronously, so the cache write doesn't add latency to the response path. From the client's perspective, the merged (cached-prefix + Druid-tail) response is byte-identical in JSON shape to what a direct Druid query would have returned (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Production results (typical day). 82% of real user queries get at least a partial cache hit; 84% of result data is served from cache; P90 cache response time ~5.5 ms. The queries that reach Druid scan much narrower time ranges → fewer segments touched → less data processed. Druid's scaling bottleneck moves from query capacity (expensive to scale) to cache capacity (much cheaper to scale) (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Experiment results vs Druid-only baseline. An A/B experiment measured a ~33% drop in queries to Druid, a ~66% improvement in overall P90 query times, and reduced result bytes + segments queried; on some shapes enabling the cache reduced result bytes by >14×. Caveat from the post: gains depend heavily on how similar + repetitive the query workload is — the rolling-window-dashboard pattern is the sweet spot, not ad-hoc analytics (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- Forward direction: upstream into Druid natively. Netflix explicitly frames the external-proxy architecture as a workaround: the extra network hops, SQL-templating workarounds, and operational surface area are a tax relative to building the capability into Druid's Brokers. Their proposed shape is an opt-in, configurable, result-level cache in the Brokers with metrics to tune TTLs and measure effectiveness — broader applicability beyond Netflix. Partial SQL-templating support is already landed so dashboard tools can benefit without writing native Druid queries (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- The generalisation: the shape isn't Druid-specific. Netflix flags that "splitting time-series results into independently cached, granularity-aligned buckets with age-based exponential TTLs" could apply to any time-series database with frequent overlapping-window queries — Prometheus, InfluxDB, M3, VictoriaMetrics, Pinot, TimescaleDB. This is why the wiki files the bucket + exponential-TTL combination as a named pattern rather than a Druid-specific hack (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
Systems introduced / extended¶
- New: systems/apache-druid — Apache Druid, the real-time OLAP time-series database powering Netflix's dashboards / alerts / canary analysis / A-B monitoring. First wiki mention.
- New: systems/netflix-druid-interval-cache — Netflix's interval-aware external caching service in front of Druid. Intercepts at the Druid Router; uses bucketed time-series cache entries with age-based exponential TTLs on KVDAL / Cassandra.
- Extended: systems/netflix-kv-dal — first documented consumer use case on the wiki beyond the 2024-09-19 launch post. Uses the independent-per-inner-key TTL feature and the range-scan inner-key retrieval feature.
- Extended: systems/apache-cassandra — tagged as the substrate under KVDAL for this caching use case.
Concepts introduced / extended¶
- New: concepts/rolling-window-query — the query pattern where the right boundary advances every refresh (`[now-Δ, now]`). Canonical instance: dashboard refresh loops.
- New: concepts/exponential-ttl — age-based TTL that doubles (or otherwise increases monotonically) with data age; Netflix's Druid cache is the canonical wiki instance.
- New: concepts/granularity-aligned-bucket — caching time-series data by decomposing the interval into fixed-size granularity-aligned buckets (1 min, query-granularity, etc.) so shifted windows reuse most buckets.
- New: concepts/negative-caching — caching empty / sentinel responses for absent data to prevent re-querying the backend; Netflix's trailing-bucket exception is the "don't cache the unresolved-yet tail" refinement.
- New: concepts/late-arriving-data — events that arrive out of order / delayed in the ingestion pipeline; the forcing function behind exponential TTLs for the freshest data and the reason trailing empty buckets aren't negatively cached.
- New: concepts/query-structure-aware-caching — a cache that parses the query, understands its structure, and decomposes the response by structural axis (here: time interval). Contrasts with opaque key-value caches.
- New: concepts/time-series-bucketing — the general practice of carving continuous time into discrete aligned buckets; Netflix uses this as the cache-key strategy, but it's also the indexing strategy in Druid itself, in TSDBs generally, and in the Distributed Counter sliding-window rollup.
- New: concepts/staleness-vs-load-tradeoff — the explicit architectural choice to accept a bounded staleness window in exchange for bounded load on the backing system. Netflix's 5-second TTL + pipeline-latency comparison is the canonical framing on the wiki.
Patterns introduced / extended¶
- New: patterns/interval-aware-query-cache — the headline pattern. Decompose a time-windowed query by granularity-aligned buckets; cache each bucket independently with TTL scaled to bucket age; assemble cached prefix + one narrowed backend fetch. Named explicitly by Netflix as the generalisable shape.
- New: patterns/age-based-exponential-ttl — sub-pattern; TTLs scale (exponentially / piecewise) with data age. Fresh data cycles fast; old data lingers. Used inside interval-aware-query-cache but applicable in other time-series caching contexts.
- New: patterns/intercepting-proxy-for-transparent-cache — deploy the cache as an external service that intercepts requests at the existing router/LB/broker layer and falls through to the original path for non-cacheable requests. Zero client changes.
- New: patterns/partial-cache-hit-with-tail-fetch — on partial cache hit, return the cached prefix and issue a single narrowed query for the missing tail, rather than filling gaps with multiple fragmented queries (which would amplify backend load).
Operational numbers disclosed¶
| Metric | Value | Source / context |
|---|---|---|
| Total rows in Druid | >10 trillion | Scale framing |
| Peak ingestion rate | up to 15M events/sec | Scale framing |
| Popular dashboard | 26 charts, 64 queries per load | Example |
| Dashboard refresh | every 10 s | Example |
| Concurrent viewers | 30 | Example |
| Peak query rate (1 dashboard) | ~192 queries/sec | 64 × 30 / 10 |
| Cache hit rate | 82% of queries get ≥partial hit | Production, typical day |
| Data served from cache | 84% of result data | Production, typical day |
| Cache P90 response time | ~5.5 ms | Production |
| Query drop to Druid | ~33% | A/B experiment |
| P90 query time improvement | ~66% | A/B experiment |
| Result-bytes reduction | up to 14× | A/B experiment, some shapes |
| Min TTL (fresh data) | 5 s | Ladder base |
| Max TTL (old data) | 1 hour | Ladder cap |
| TTL doubling increment | per additional minute of age | Ladder rule |
| "Young" data threshold | <2 min old → min TTL | Ladder rule |
| Pipeline P90 latency | <5 s end-to-end | Justifies 5s min TTL |
| Inner-key encoding | big-endian bytes (lex order = chrono order) | Range-scan enabler |
| Bucket granularity | query granularity, min 1 min | Cache-key rule |
| Example 3h query | 180 buckets @ 1-min granularity | Geometry |
Caveats¶
- Post is an architecture write-up, not a case study with numbers over time. All percentages are reported as typical-day snapshots; no time-series of cache-hit rate vs traffic events, no hardware footprint, no cost / $/query comparison, no per-region or per-datasource breakdown.
- "Experimental" is the declared status. Netflix flags the cache as still experimental; the long-term posture is upstreaming into Druid proper. Production usage is meaningful but narrower than a GA-claimed system.
- Gains are workload-shape-dependent. Netflix explicitly cautions that the measured gains depend on how repetitive + similar the query workload is. Ad-hoc analytics workloads would see a very different result. The sweet spot is the rolling-window-dashboard pattern at high viewer concurrency.
- Cache stampede / hot-key behaviour not discussed. The post doesn't describe what happens when a bucket TTL expires for a very hot query with many concurrent viewers — whether reads coalesce, whether there's request dedup at the cache layer, or whether many parallel misses stampede the narrowed Druid fetch.
- SQL-templating support described as "partial." The cache is native-Druid-query-shaped; dashboards using SQL get partial support via templating, but the completeness of that coverage and its failure modes aren't detailed.
- Eviction / capacity planning not discussed. KVDAL / Cassandra capacity sizing, hot-bucket skew, how many query hashes the cache holds at steady state — not disclosed.
- Consistency during Druid schema / segment changes not discussed. What happens when a Druid datasource's schema changes mid-TTL — whether old cached buckets are invalidated or served stale — isn't described.
- No discussion of monitoring / on-call posture for the cache itself. Given it's on the hot path for dashboards that monitor live shows, the cache is meta-observability infrastructure; how Netflix monitors the monitor isn't covered.
- Non-time group-by queries pass through unchanged. This is called out as a design decision — metadata queries, queries without time group-by — but the fraction of production traffic these represent and the resulting ceiling on cache effectiveness aren't quantified.
Cross-source continuity¶
- Joins KVDAL as first documented downstream consumer on the wiki beyond the 2024-09-19 KVDAL launch post. The 2024 post described KVDAL's capabilities (two-level-map, idempotency tokens, chunking, independent inner TTLs, range scans) in the abstract. This post shows a concrete KVDAL customer that exercises two of those capabilities (per-inner-key TTLs + range scans) as load-bearing architecture.
- Extends Netflix's observability / analytics-platform axis. Prior Netflix corpus covers ingestion, workflow orchestration, ML platforms, content engineering, cloud efficiency, and container platforms; Apache Druid as Netflix's analytics substrate is introduced for the first time here.
- Thematic kinship with Netflix's Distributed Counter Abstraction (2024-11-13) — both are time-series data systems built on KVDAL using bucketed time as the storage shape, both deliberately trade strict accuracy for scale (Counter is eventually consistent with a sliding-window rollup; Druid cache is eventually consistent with age-based TTL). Different layers of the stack but same architectural style.
- Contrasts with invalidation-based caches (Figma LiveGraph 2026-04-21). LiveGraph pushes invalidations when source data changes; Netflix's Druid cache uses age-based TTL expiry because Druid's data model — append-mostly, late-arriving, immutable-after-settled — makes per-value invalidation impractical and makes age-based confidence a natural proxy for freshness.
- Complements cache TTL staleness dilemma (2026-04-08 AWS multi-tenant config) — Netflix's exponential-TTL ladder is one concrete escape from the forced either/or: fresh data gets short TTL, settled data gets long TTL, so the trade-off is no longer uniform across the cached data set.
- Parallels at a different layer: caching proxy tier — the intercepting-proxy-at-Router architecture is the same shape as CDN-level caching proxies, Hyperdrive in front of Postgres, etc. Netflix's twist is that the cacheable unit is a granularity-aligned time bucket, not a full response.
Source¶
- Original: https://netflixtechblog.com/stop-answering-the-same-question-twice-interval-aware-caching-for-druid-at-netflix-scale-22fadc9b840e?source=rss----2615bd06b42e---4
- Raw markdown: raw/netflix/2026-04-06-stop-answering-the-same-question-twice-interval-aware-cachin-7b73944b.md
Related¶
- companies/netflix — Netflix company page
- systems/apache-druid
- systems/netflix-druid-interval-cache
- systems/netflix-kv-dal
- systems/apache-cassandra
- concepts/rolling-window-query
- concepts/exponential-ttl
- concepts/granularity-aligned-bucket
- concepts/negative-caching
- concepts/late-arriving-data
- concepts/query-structure-aware-caching
- concepts/time-series-bucketing
- concepts/staleness-vs-load-tradeoff
- patterns/interval-aware-query-cache
- patterns/age-based-exponential-ttl
- patterns/intercepting-proxy-for-transparent-cache
- patterns/partial-cache-hit-with-tail-fetch