CONCEPT Cited by 2 sources

Time-series bucketing¶

Definition¶

Time-series bucketing is the practice of decomposing a continuous stream of time-stamped events or values into discrete time windows (buckets) aligned to fixed boundaries, such as minute, hour, or day boundaries, and treating each bucket as a unit for storage, retrieval, caching, rollup, or retention.

It is one of the most pervasive primitives in time-series systems: different layers of the stack bucket for different reasons, but they all use the same shape.
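
As a minimal sketch of the definition, the snippet below floors a timestamp to the start of its bucket. The helper name `bucket_start` and the choice to express bucket width in seconds are assumptions made for illustration; they do not come from any of the cited systems.

```python
import math
from datetime import datetime, timezone

def bucket_start(ts: datetime, bucket_seconds: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fixed-width bucket.

    Buckets are aligned to the Unix epoch in UTC, so every caller that uses
    the same width gets the same boundaries.
    """
    epoch = math.floor(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)

event = datetime(2026, 4, 6, 13, 42, 37, tzinfo=timezone.utc)
minute_bucket = bucket_start(event, 60)     # minute-aligned
hour_bucket = bucket_start(event, 3600)     # hour-aligned
day_bucket = bucket_start(event, 86400)     # day-aligned (UTC midnight)
```

Because the boundaries depend only on the bucket width and not on when the first event arrived, two independent writers or readers agree on which bucket a timestamp belongs to.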

Where time-series bucketing shows up¶

(1) Storage-layer bucketing¶

Apache Druid segments — Druid stores data in time-partitioned immutable segments. Segments are the atomic unit of storage, replication, retention, and query-planning (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
MongoDB bucket pattern — grouping multiple time-ordered documents into one parent bucket document for IoT and metrics workloads.
Partition-key = bucketed-timestamp — a common Cassandra / DynamoDB pattern to avoid wide partitions on high-write time-series.
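
The partition-key pattern can be sketched without any database client. In the snippet below, `partition_key`, the one-hour `BUCKET_SECONDS`, and the sensor readings are all hypothetical; the point is only that the key combines a series identifier with the bucketed timestamp, so no single partition grows without bound.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # assumption: one partition per series per hour

def partition_key(series_id: str, ts: datetime) -> tuple[str, int]:
    """Combine the series id with the epoch second at which the bucket starts."""
    epoch = int(ts.timestamp())
    return (series_id, epoch - epoch % BUCKET_SECONDS)

# Hypothetical readings: (series id, event time, value).
readings = [
    ("sensor-1", datetime(2026, 4, 6, 13, 5, tzinfo=timezone.utc), 20.5),
    ("sensor-1", datetime(2026, 4, 6, 13, 55, tzinfo=timezone.utc), 21.0),
    ("sensor-1", datetime(2026, 4, 6, 14, 1, tzinfo=timezone.utc), 21.3),
]

# Stand-in for a partitioned table: partition key -> rows in that partition.
partitions = defaultdict(list)
for series_id, ts, value in readings:
    partitions[partition_key(series_id, ts)].append((ts, value))
```

Here the two 13:xx readings land in one partition and the 14:01 reading starts a new one. A real schema would also choose a clustering or sort key inside each partition, which this sketch leaves out.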

(2) Cache-layer bucketing¶

Netflix Druid interval cache — per-granularity-aligned-bucket cache entries with independent age-based TTLs, enabling rolling-window query reuse (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
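
The following is an illustrative sketch of per-bucket caching with age-based TTLs, not Netflix's implementation. The names `aligned_buckets`, `ttl_seconds`, and `query`, the in-memory dictionary, and the specific TTL values are all assumptions; the sketch assumes that recently closed buckets get short TTLs and older buckets get long ones.

```python
STEP = 60  # bucket width in seconds (assumption: minute granularity)

def aligned_buckets(start: int, end: int, step: int = STEP) -> list[int]:
    """Start times of every step-aligned bucket overlapping [start, end)."""
    first = start - start % step
    return list(range(first, end, step))

def ttl_seconds(bucket_start: int, now: int, step: int = STEP) -> int:
    """Made-up TTL policy: the older the bucket, the longer it may be cached."""
    age = now - (bucket_start + step)  # seconds since the bucket closed
    if age < 0:
        return 5       # bucket still open, data is changing
    if age < 300:
        return 120     # recently closed, late data is still likely
    return 3600        # old bucket, assumed stable

cache: dict[int, tuple[int, int]] = {}  # bucket start -> (value, expires_at)

def query(start: int, end: int, now: int, compute):
    """Answer [start, end) bucket by bucket, reusing unexpired cache entries."""
    values, hits, misses = [], 0, 0
    for b in aligned_buckets(start, end):
        entry = cache.get(b)
        if entry is not None and entry[1] > now:
            hits += 1
            value = entry[0]
        else:
            misses += 1
            value = compute(b)
            cache[b] = (value, now + ttl_seconds(b, now))
        values.append(value)
    return values, hits, misses

def fake_backend(bucket_start: int) -> int:
    """Stand-in for the expensive per-bucket backend query."""
    return bucket_start // STEP

# A ten-minute rolling window, evaluated twice, one minute apart.
_, hits_a, misses_a = query(5400, 6000, now=6000, compute=fake_backend)
_, hits_b, misses_b = query(5460, 6060, now=6060, compute=fake_backend)
```

The first call misses on all ten buckets. The second call, shifted by one minute, reuses nine cached buckets and computes only the newest one, which is the rolling-window reuse described above.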

(3) Aggregation / rollup-layer bucketing¶

Netflix Distributed Counter — time-bucketed rollup of counter events into immutable aggregation windows (Source: sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction).
Prometheus 2-hour blocks — the Prometheus TSDB persists samples as immutable blocks that each cover two hours by default; older blocks are later compacted into larger ones.
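
A rollup of counter events into fixed windows can be sketched in a few lines. This is a generic illustration under assumed inputs (integer epoch-second timestamps and signed deltas); the function name `rollup` is hypothetical and the code does not reflect how Netflix's Distributed Counter or Prometheus are implemented.

```python
from collections import Counter

def rollup(events: list[tuple[int, int]], window: int) -> dict[int, int]:
    """Sum (timestamp, delta) events into windows keyed by window start."""
    totals: Counter = Counter()
    for ts, delta in events:
        totals[ts - ts % window] += delta
    return dict(totals)

# Hypothetical counter events: (epoch second, delta).
events = [(120, 1), (130, 2), (179, 1), (180, 5), (245, -1)]
per_minute = rollup(events, window=60)
```

With these inputs the three events in [120, 180) collapse into one total, so a reader needs one value per window rather than every raw event.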

(4) Retention / TTL-layer bucketing¶

Hot / warm / cold time-bucketed tiers — different retention policies per time bucket age (e.g. 7-day full-resolution, 90-day 1-hour-resolution, 1-year daily-resolution).
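
The tiering example above can be written as a small lookup. The thresholds are the ones from the example (7 days, 90 days, 1 year); the tier labels, the inclusive boundaries, and the function name `resolution_for` are assumptions for illustration.

```python
from datetime import timedelta
from typing import Optional

# (maximum bucket age, resolution kept at that age), ordered youngest first.
TIERS = [
    (timedelta(days=7), "full"),
    (timedelta(days=90), "1h"),
    (timedelta(days=365), "1d"),
]

def resolution_for(age: timedelta) -> Optional[str]:
    """Return the resolution retained for a bucket of this age, or None if expired."""
    for max_age, resolution in TIERS:
        if age <= max_age:
            return resolution
    return None  # older than every tier: eligible for deletion
```

For example, `resolution_for(timedelta(days=30))` returns `"1h"`, and a bucket 400 days old returns `None`. Because the decision depends only on bucket age, it can be applied to whole buckets at a time.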

Why bucketing is so pervasive¶

Alignment enables reuse. Two overlapping queries that both include "minute 42" read the same bucket. Alignment is what makes caches and rollups share work across queries.
Bucketing bounds the unit of change. An immutable bucket can be replicated / tiered / archived / compressed as one unit.
Bucketing exposes the late-arrival problem. Once a bucket is declared closed, late arrivals must either update it (if the layer allows) or be discarded. This is where late-arriving-data policy lives.
Bucketing matches dashboards. Dashboard charts almost always aggregate by time with a fixed bar width, so bucketing at the storage or cache layer aligns the natural storage granularity with the natural display granularity.
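
The late-arrival choice can be made concrete with a small sketch. The class `BucketStore`, the `mutable_closed` flag, and the use of a watermark to decide when a bucket is closed are hypothetical; real systems decide closure in different ways.

```python
class BucketStore:
    """Per-bucket sums with an explicit policy for events that arrive late."""

    def __init__(self, width: int, mutable_closed: bool):
        self.width = width
        self.mutable_closed = mutable_closed  # may closed buckets be updated?
        self.sums: dict[int, int] = {}
        self.dropped = 0

    def add(self, event_ts: int, value: int, watermark: int) -> bool:
        """Apply one event; return False if it was discarded as too late."""
        start = event_ts - event_ts % self.width
        closed = start + self.width <= watermark
        if closed and not self.mutable_closed:
            self.dropped += 1
            return False
        self.sums[start] = self.sums.get(start, 0) + value
        return True

strict = BucketStore(width=60, mutable_closed=False)
on_time = strict.add(event_ts=65, value=1, watermark=70)   # bucket [60, 120) is open
too_late = strict.add(event_ts=50, value=1, watermark=70)  # bucket [0, 60) is closed

lenient = BucketStore(width=60, mutable_closed=True)
accepted = lenient.add(event_ts=50, value=1, watermark=70)  # closed bucket is updated
```

The strict store keeps closed buckets immutable at the cost of dropping data, while the lenient store keeps the data at the cost of invalidating anything derived from the old bucket contents.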

Key design choices¶

Choice	Implication
Bucket size	Smaller buckets → more reuse across small window shifts, but more entries; larger buckets → coarser reuse, but fewer entries
Alignment	Always align to a wall-clock boundary (minute 00, hour 00) rather than "first event + N seconds" — alignment is what makes reuse work across queries
Timezone	UTC is the safe default; local timezones create bucket-boundary bugs around DST
Min granularity	Sub-second buckets explode in count; most dashboards never need them
Max granularity	Day/hour buckets reduce reuse for zoomed-in views
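
The timezone row can be checked directly with the standard library. The sketch below measures how long a local calendar day actually is, using the 2026 US daylight-saving transitions as the example; the helper name `local_day_length` is hypothetical, and the code assumes the system time zone database is available to `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def local_day_length(year: int, month: int, day: int, tz_name: str) -> timedelta:
    """Elapsed time between local midnight and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)  # wall-clock arithmetic: next local midnight
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)

spring_forward = local_day_length(2026, 3, 8, "America/New_York")
fall_back = local_day_length(2026, 11, 1, "America/New_York")
utc_day = local_day_length(2026, 3, 8, "UTC")
```

The local "day" bucket is 23 hours long on the spring transition and 25 hours long on the autumn one, while the UTC day is always 24 hours. Any code that assumes day buckets hold a fixed number of hour buckets breaks on those two dates when bucketing in local time.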

Seen in¶

sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — cache-layer + Druid-storage-layer.
sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — rollup-layer with immutable aggregation windows.

Related¶

concepts/granularity-aligned-bucket — the specific cache-layer application.
concepts/rolling-window-query — the query shape that benefits most.
concepts/late-arriving-data — the forcing function for bucket-closing policy.
concepts/bucket-pattern — MongoDB's storage-layer instance.
concepts/immutable-aggregation-window — the rollup-layer sibling at Netflix's Distributed Counter.
systems/apache-druid
systems/netflix-druid-interval-cache
systems/netflix-distributed-counter
patterns/bucketed-event-time-partitioning