Skip to content

SYSTEM Cited by 1 source

Apache Druid

Apache Druid is an open-source real-time OLAP / time-series database designed for high-ingest, low-latency slice-and-dice analytics over event data (dashboards, monitoring, A/B analysis, ad-hoc exploration). Druid is Netflix's substrate for live-show monitoring, dashboards, automated alerting, canary analysis, and A/B test monitoring (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Scale (Netflix production)

Architecture primitives referenced in the Netflix post

  • Segments — Druid's immutable, time-partitioned data chunks. Historical segments (already-indexed settled data) live on historical nodes; realtime segments are still being indexed on realtime nodes.
  • Router — query-routing tier in front of Brokers. Netflix's interval-aware cache intercepts requests at the Router via an intercepting proxy, redirecting cacheable queries to the cache and falling back to the Router for the uncached portion.
  • Broker — merges per-segment results across historical + realtime nodes to assemble a query response.
  • Built-in caches:
    • Full-result cache — keyed on the whole query including the time interval. Misses whenever the time window shifts, even by one second. Druid also deliberately refuses to cache results that involve realtime segments, prioritising deterministic/stable cache results and query correctness over higher hit rate.
    • Per-segment cache — avoids redundant scans on historical nodes but the Broker still collects segment results from each data node and merges them with data from realtime nodes on every query. Neither is designed to handle the continuous, overlapping time-window shifts inherent to rolling-window dashboards (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Query shapes relevant to caching

The Netflix cache handles time group-by queriestimeseries and groupBy over a time dimension — because those are the shapes where the interval decomposes naturally into granularity-aligned buckets. Non-time-group-by queries (metadata queries, general select shapes) pass through unchanged (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Late-arriving data

Druid in real-time analytics faces the well-known late-arriving data problem — events can arrive out of order or be delayed in the ingestion pipeline, so the value of a recent time bucket is unstable until enough settling time has passed. This is why Netflix's cache uses exponential-TTL keyed on data age rather than a uniform TTL.

Netflix's cache posture and upstream direction

Netflix built the interval- aware cache as an external service at the Druid Router layer as an experimental capability. Their stated long-term direction is upstreaming the capability into Druid proper — likely as an opt-in, configurable, result-level cache in the Brokers with metrics to tune TTLs and measure effectiveness — eliminating the extra infra + network hops + SQL-templating workarounds of the external-proxy shape. "We'd likely ship it as an opt-in, configurable, result-level cache in the Brokers, with metrics to tune TTLs and measure effectiveness" (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).

Seen in

Last updated · 319 distilled / 1,201 read