
Granularity-aligned bucket

Definition

A granularity-aligned bucket is a fixed-size time bucket whose boundaries are aligned to a query's declared granularity (1 minute, 5 minutes, 1 hour). Cached time-series data is decomposed into these buckets and keyed on the bucket coordinate — so queries with overlapping time intervals reuse the buckets they share, rather than treating each interval as an opaque new cache key.
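A minimal sketch of the decomposition, with hypothetical helper names: bucket boundaries are found by floor-aligning the interval start to the granularity, then stepping one grain at a time.

```python
def bucket_starts(start_ms: int, end_ms: int, granularity_ms: int) -> list[int]:
    """Aligned start timestamps of every bucket overlapping [start_ms, end_ms)."""
    first = (start_ms // granularity_ms) * granularity_ms  # floor-align to the grain
    return list(range(first, end_ms, granularity_ms))

# A 10-minute window at 5-minute granularity decomposes into two buckets,
# each independently cacheable under (query_hash, bucket_start).
buckets = bucket_starts(600_000, 1_200_000, 300_000)  # → [600000, 900000]
```

Because the boundaries depend only on the granularity, any two queries at the same grain that overlap in time land on identical bucket coordinates.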

Why it matters: shift invariance

A single-level cache key (query, interval) — e.g. Druid's own full-result cache — changes whenever the interval shifts, so a rolling-window query that shifts 30 seconds misses even though 99% of the underlying data is identical.

A granularity-aligned bucket cache key (query_hash, bucket_start_timestamp) is shift-invariant on the bucketed dimension: if the old interval covered buckets [b_0, b_1, ..., b_N] and the new interval covers [b_1, b_2, ..., b_{N+1}], then N of the N+1 buckets are already cached; only b_{N+1} needs a backend fetch.
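The reuse arithmetic can be checked directly. This sketch (hypothetical names, not Netflix's code) builds the set of bucket keys for a 1-hour window at 1-minute granularity, shifts the window by one bucket, and counts what is shared:

```python
GRAIN_MS = 60_000  # 1-minute buckets

def bucket_keys(query_hash: str, start_ms: int, end_ms: int) -> set[tuple[str, int]]:
    first = (start_ms // GRAIN_MS) * GRAIN_MS  # floor-align to the grain
    return {(query_hash, t) for t in range(first, end_ms, GRAIN_MS)}

old = bucket_keys("q1", 0, 3_600_000)        # 60 buckets
new = bucket_keys("q1", 60_000, 3_660_000)   # same window shifted by one minute
shared = old & new    # 59 buckets: already cached, no backend fetch
missing = new - old   # 1 bucket: the only backend fetch needed
```

Under a single-level (query, interval) key, `old` and `new` would be two unrelated entries and the shift would miss entirely.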

Netflix's implementation

Netflix's Druid cache uses (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid):

  • Bucket size = query granularity, with a 1-minute minimum (sub-minute granularities collapse to 1-minute buckets to keep the bucket count sane for typical dashboard windows).
  • Inner-key encoding = big-endian bytes of the bucket-start timestamp. Big-endian means lexicographic order matches chronological order, so the native range-scan primitive of a two-level-map KV store like KVDAL directly answers "give me all buckets between A and B for this query hash".
  • Outer key = SHA-256 of the query with time interval + volatile context removed — query-structure-aware hashing.
  • A 3-hour 1-minute-granularity query decomposes into 180 independent cached buckets, each with its own TTL.
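The two-level key scheme can be sketched as follows. This is an illustrative reconstruction, not Netflix's implementation: it assumes the interval and volatile context have already been stripped from the query text, and uses an 8-byte big-endian inner key.

```python
import hashlib
import struct

def outer_key(normalized_query: str) -> bytes:
    # SHA-256 of the query with time interval + volatile context removed.
    return hashlib.sha256(normalized_query.encode("utf-8")).digest()

def inner_key(bucket_start_ms: int) -> bytes:
    # Big-endian 8-byte timestamp: byte-wise (lexicographic) order equals
    # chronological order, so a KV range scan returns buckets in time order.
    return struct.pack(">q", bucket_start_ms)

keys = [inner_key(t) for t in (0, 60_000, 120_000)]
assert keys == sorted(keys)  # lexicographic == chronological
```

A little-endian encoding would break this property: sorting the raw bytes would interleave timestamps, and the store's range scan could no longer answer "all buckets between A and B" in one pass.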

Why the bucket size matters

Bucket size is the cache's atomic reusable unit. Smaller buckets (1 s) = more reuse on small shifts but more keys + more cache entries + more TTL slots. Larger buckets (1 min, 5 min) = fewer keys + less granular reuse. Aligning to the query's declared granularity is the natural choice — the query has already committed to aggregation at that grain, so a per-bucket cache entry is the semantic unit the consumer asked for.

Netflix's 1-minute floor is a pragmatic bound: dashboards almost never ask for sub-minute granularity over multi-hour windows, and the floor keeps a 3-hour query at 180 cache entries rather than 10,800.
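The entry-count arithmetic behind the floor:

```python
window_s = 3 * 3600             # 3-hour dashboard window, in seconds

entries_at_1min = window_s // 60  # 180 bucket entries with the 1-minute floor
entries_at_1s = window_s // 1     # 10,800 entries at 1-second buckets
```

Sixty times fewer keys, TTL slots, and cache writes per query, at the cost of losing reuse on sub-minute shifts the dashboards don't make anyway.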

Relationship to other bucket models

  • Druid segments — Druid itself internally stores data in time-partitioned immutable segments. Conceptually these are storage-layer buckets. The cache's granularity-aligned buckets are cache-layer buckets on top — different shape, same time-series-bucketing idea.
  • Netflix Distributed Counter rollup buckets — time-bucketed counter rollups with fixed bucket sizes are the storage-layer bucket analog in another Netflix system. Same architectural move at different layers.
  • Prometheus / TSDB retention buckets — age-based retention tiers in TSDBs are conceptually related (different rules for different ages), though they're typically coarser (hours / days).
