SYSTEM Cited by 1 source

Apache Druid¶

Apache Druid is an open-source real-time OLAP / time-series database designed for high-ingest, low-latency slice-and-dice analytics over event data (dashboards, monitoring, A/B analysis, ad-hoc exploration). Druid is Netflix's substrate for live-show monitoring, dashboards, automated alerting, canary analysis, and A/B test monitoring (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Scale (Netflix production)¶

>10 trillion rows in the database
Up to 15M events/sec ingested
Trillions of rows queried regularly (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid)

Architecture primitives referenced in the Netflix post¶

Segments — Druid's immutable, time-partitioned data chunks. Historical segments (already-indexed settled data) live on historical nodes; realtime segments are still being indexed on realtime nodes.
Router — query-routing tier in front of Brokers. Netflix's interval-aware cache intercepts requests at the Router via an intercepting proxy, redirecting cacheable queries to the cache and falling back to the Router for the uncached portion.
Broker — merges per-segment results across historical + realtime nodes to assemble a query response.
Built-in caches:
- Full-result cache — keyed on the whole query including the time interval. Misses whenever the time window shifts, even by one second. Druid also deliberately refuses to cache results that involve realtime segments, prioritising deterministic/stable cache results and query correctness over higher hit rate.
- Per-segment cache — avoids redundant scans on historical nodes but the Broker still collects segment results from each data node and merges them with data from realtime nodes on every query. Neither is designed to handle the continuous, overlapping time-window shifts inherent to rolling-window dashboards (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Query shapes relevant to caching¶

The Netflix cache handles time group-by queries — timeseries and groupBy over a time dimension — because those are the shapes where the interval decomposes naturally into granularity-aligned buckets. Non-time-group-by queries (metadata queries, general select shapes) pass through unchanged (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Late-arriving data¶

Druid in real-time analytics faces the well-known late-arriving data problem — events can arrive out of order or be delayed in the ingestion pipeline, so the value of a recent time bucket is unstable until enough settling time has passed. This is why Netflix's cache uses exponential-TTL keyed on data age rather than a uniform TTL.

Netflix's cache posture and upstream direction¶

Netflix built the interval- aware cache as an external service at the Druid Router layer as an experimental capability. Their stated long-term direction is upstreaming the capability into Druid proper — likely as an opt-in, configurable, result-level cache in the Brokers with metrics to tune TTLs and measure effectiveness — eliminating the extra infra + network hops + SQL-templating workarounds of the external-proxy shape. "We'd likely ship it as an opt-in, configurable, result-level cache in the Brokers, with metrics to tune TTLs and measure effectiveness" (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Seen in¶

sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — Netflix's rolling-window-dashboard caching layer introduces Druid to the wiki.

systems/netflix-druid-interval-cache — Netflix's external interval-aware cache in front of Druid.
concepts/time-series-bucketing — Druid's internal segment model; Netflix's cache-key strategy.
concepts/rolling-window-query — the workload shape that motivates the Netflix cache.
concepts/late-arriving-data — the real-time-analytics problem Druid exposes and the cache's exponential TTL mitigates.
companies/netflix