Granularity-aligned bucket¶
Definition¶
A granularity-aligned bucket is a fixed-size time bucket whose boundaries are aligned to a query's declared granularity (1 minute, 5 minutes, 1 hour). Cached time-series data is decomposed into these buckets and keyed on the bucket coordinate — so queries with overlapping time intervals reuse the buckets they share, rather than treating each interval as an opaque new cache key.
Why it matters: shift invariance¶
A single-level cache key (query, interval) — e.g. Druid's own full-result cache — changes whenever the interval shifts, so a rolling-window query that shifts by 30 seconds misses even though 99% of the underlying data is identical.

A granularity-aligned bucket cache key (query_hash, bucket_start_timestamp) is shift-invariant on the bucketed dimension: if the old interval covered buckets [b_0, b_1, ..., b_N] and the new interval covers [b_1, b_2, ..., b_{N+1}], then N of the N+1 buckets are already cached; only b_{N+1} needs a backend fetch.
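The decomposition can be sketched in a few lines. This is an illustrative model, not Netflix's code; the `bucket_keys` helper and the in-memory `cache` set are assumptions made for the example.

```python
# Sketch: decompose a query interval into granularity-aligned buckets
# and see how a shifted window reuses the buckets it shares.
GRANULARITY_MS = 60_000  # 1-minute buckets

def bucket_keys(start_ms: int, end_ms: int, step_ms: int = GRANULARITY_MS):
    """Bucket-start timestamps covering [start_ms, end_ms), aligned down to step_ms."""
    first = (start_ms // step_ms) * step_ms
    return list(range(first, end_ms, step_ms))

cache = set()

# A 3-minute window is fetched once and cached per bucket.
old = bucket_keys(0, 180_000)
cache.update(old)

# The window shifts by 30 s: alignment means only the new tail bucket misses.
new = bucket_keys(30_000, 210_000)
hits = [b for b in new if b in cache]
misses = [b for b in new if b not in cache]
print(len(new), len(hits), misses)  # 4 3 [180000]
```

With an opaque (query, interval) key, the shifted window would have been a total miss; here only the single tail bucket goes to the backend.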
Netflix's implementation¶
Netflix's Druid cache uses (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid):
- Bucket size = query granularity, with a 1-minute minimum (sub-minute granularities collapse to 1-minute buckets to keep the bucket count sane for typical dashboard windows).
- Inner-key encoding = big-endian bytes of the bucket-start timestamp. Big-endian means lexicographic byte order matches chronological order, so the native range-scan primitive of a two-level-map KV store (like KVDAL) directly answers "give me all buckets between A and B for this query hash".
- Outer key = SHA-256 of the query with time interval + volatile context removed — query-structure-aware hashing.
- A 3-hour 1-minute-granularity query decomposes into 180 independent cached buckets, each with its own TTL.
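The key scheme above can be sketched as follows. This is a minimal illustration of the described shapes, not Netflix's implementation; the field names stripped in `outer_key` (`interval`, `context`) and the dict-based query representation are assumptions.

```python
# Sketch of the two-level key: SHA-256 outer key over the query with
# interval and volatile context removed, big-endian timestamp inner key.
import hashlib
import struct

def outer_key(query: dict) -> str:
    # Drop the fields that vary between otherwise-identical queries
    # (assumed field names for illustration).
    stable = sorted((k, v) for k, v in query.items() if k not in ("interval", "context"))
    return hashlib.sha256(repr(stable).encode()).hexdigest()

def inner_key(bucket_start_ms: int) -> bytes:
    # 8-byte big-endian encoding: for non-negative timestamps, byte-wise
    # (lexicographic) order equals numeric order, so a KV range scan over
    # inner keys returns buckets in time order.
    return struct.pack(">q", bucket_start_ms)

# Same query, shifted interval -> same outer key (shift invariance).
a = outer_key({"metric": "rps", "interval": "T0/T3"})
b = outer_key({"metric": "rps", "interval": "T1/T4"})
print(a == b)  # True

# Inner keys sort chronologically as raw bytes.
ts = [0, 60_000, 120_000, 3_600_000]
keys = [inner_key(t) for t in ts]
print(keys == sorted(keys))  # True
```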
Why the bucket size matters¶
Bucket size is the cache's atomic reusable unit. Smaller buckets (1 s) = more reuse on small shifts but more keys + more cache entries + more TTL slots. Larger buckets (1 min, 5 min) = fewer keys + less granular reuse. Aligning to the query's declared granularity is the natural choice — the query has already committed to aggregation at that grain, so a per-bucket cache entry is the semantic unit the consumer asked for.
Netflix's 1-minute floor is a pragmatic bound: dashboards almost never ask for sub-minute granularity for multi-hour windows, and 1-minute keeps 180 entries for a 3-hour query rather than 10,800.
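The bucket-count arithmetic behind the 1-minute floor is easy to verify. A small sketch, with assumed helper names (`effective_bucket_ms`, `bucket_count`), not code from the source:

```python
# Sketch: bucket count for a query window, with sub-minute
# granularities collapsed to 1-minute buckets.
FLOOR_MS = 60_000

def effective_bucket_ms(query_granularity_ms: int) -> int:
    # Sub-minute granularities are floored to 1 minute.
    return max(query_granularity_ms, FLOOR_MS)

def bucket_count(window_ms: int, query_granularity_ms: int) -> int:
    return window_ms // effective_bucket_ms(query_granularity_ms)

three_hours = 3 * 60 * 60 * 1000
print(bucket_count(three_hours, 1_000))    # 1-second query -> 180 (floored), not 10,800
print(bucket_count(three_hours, 60_000))   # 1-minute query -> 180
print(bucket_count(three_hours, 300_000))  # 5-minute query -> 36
```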
Relationship to other bucket models¶
- Druid segments — Druid itself internally stores data in time-partitioned immutable segments. Conceptually these are storage-layer buckets. The cache's granularity-aligned buckets are cache-layer buckets on top — different shape, same time-series-bucketing idea.
- Netflix Distributed Counter rollup buckets — time-bucketed counter rollups with fixed bucket sizes are the storage-layer bucket analog in another Netflix system. Same architectural move at different layers.
- Prometheus / TSDB retention buckets — age-based retention tiers in TSDBs are conceptually related (different rules for different ages), though they're typically coarser (hours / days).
Seen in¶
- sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — Netflix's Druid cache is the canonical wiki instance at the cache-layer.
Related¶
- concepts/time-series-bucketing — the general framing.
- concepts/rolling-window-query — the workload that makes granularity-aligned buckets load-bearing.
- concepts/exponential-ttl — the TTL model that's applied per bucket.
- concepts/two-level-map-kv-model — the storage model that makes outer-key + inner-key-range retrieval efficient.
- systems/netflix-druid-interval-cache
- systems/netflix-kv-dal
- patterns/interval-aware-query-cache
- patterns/partial-cache-hit-with-tail-fetch