CONCEPT Cited by 2 sources
Time-series bucketing¶
Definition¶
Time-series bucketing is the practice of decomposing a continuous stream of time-stamped events or values into discrete time windows (buckets) aligned to fixed boundaries (minute, hour, or day) and treating each bucket as a unit for storage, retrieval, caching, rollup, or retention.
It is one of the most pervasive primitives in time-series systems: different layers of the stack bucket for different reasons, but they all use the same shape.
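
As a minimal sketch of the definition, the snippet below floors a timestamp to the start of its aligned bucket. The helper name `bucket_start` and the sample event are hypothetical and chosen only for illustration; alignment to UTC epoch boundaries is an assumption.

```python
from datetime import datetime, timezone


def bucket_start(ts: datetime, bucket_seconds: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its UTC-aligned bucket."""
    # Floor division on epoch seconds snaps the timestamp to a fixed boundary.
    floored = int(ts.timestamp() // bucket_seconds) * bucket_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc)


# Hypothetical event, bucketed at three granularities.
event = datetime(2024, 11, 13, 10, 42, 37, tzinfo=timezone.utc)
minute_bucket = bucket_start(event, 60)      # 2024-11-13 10:42:00 UTC
hour_bucket = bucket_start(event, 3600)      # 2024-11-13 10:00:00 UTC
day_bucket = bucket_start(event, 86400)      # 2024-11-13 00:00:00 UTC
```

Every event inside the same minute maps to the same `minute_bucket`, which is what lets a bucket act as a shared unit across layers.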
Where time-series bucketing shows up¶
(1) Storage-layer bucketing¶
- Apache Druid segments — Druid stores data in time-partitioned immutable segments. Segments are the atomic unit of storage, replication, retention, and query-planning (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
- MongoDB bucket pattern — grouping multiple time-ordered documents into one parent bucket doc for IoT / metrics workloads.
- Partition-key = bucketed-timestamp — a common Cassandra / DynamoDB pattern to avoid wide partitions on high-write time-series.
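
The partition-key pattern in the last bullet can be sketched as follows. This is an illustration only: the function `partition_key`, the hourly bucket size, and the sample sensor readings are assumptions, and an in-memory dict stands in for a real Cassandra or DynamoDB table.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(series_id: str, ts: datetime) -> tuple[str, str]:
    """Compose a key from the series id and the UTC hour bucket of the timestamp."""
    hour = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
    return (series_id, hour)


# Hypothetical readings: (series id, timestamp, value).
events = [
    ("sensor-1", datetime(2024, 1, 1, 9, 59, 58, tzinfo=timezone.utc), 20.5),
    ("sensor-1", datetime(2024, 1, 1, 10, 0, 1, tzinfo=timezone.utc), 20.7),
    ("sensor-1", datetime(2024, 1, 1, 10, 30, 0, tzinfo=timezone.utc), 21.0),
    ("sensor-2", datetime(2024, 1, 1, 10, 15, 0, tzinfo=timezone.utc), 18.2),
]

# Each partition holds at most one hour of one series, so no partition grows without bound.
partitions = defaultdict(list)
for series_id, ts, value in events:
    partitions[partition_key(series_id, ts)].append((ts, value))
```

The bucket size bounds how wide a partition can become: a busier series would call for a smaller bucket, at the cost of more partitions to read for a long range query.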
(2) Cache-layer bucketing¶
- Netflix Druid interval cache — per-granularity-aligned-bucket cache entries with independent age-based TTLs, enabling rolling-window query reuse (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
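
To make the idea of per-bucket cache entries with age-based TTLs concrete, here is a small sketch. It is not Netflix's implementation: the class `BucketCache`, the helpers `aligned_buckets` and `ttl_for`, the TTL thresholds, and the `fetch` callback are all hypothetical, and times are plain epoch seconds.

```python
def aligned_buckets(start: int, end: int, size: int) -> list[int]:
    """Start times of the aligned buckets that cover [start, end)."""
    first = start - (start % size)
    return list(range(first, end, size))


def ttl_for(bucket_start: int, now: int, size: int) -> int:
    """Assumed policy: recently closed buckets may still change, so they expire fast."""
    age = now - (bucket_start + size)
    return 5 if age < 120 else 3600


class BucketCache:
    def __init__(self, fetch, size: int):
        self.fetch = fetch      # fetch(bucket_start) -> value, e.g. a backend query
        self.size = size
        self.entries = {}       # bucket_start -> (value, expires_at)
        self.misses = 0

    def query(self, start: int, end: int, now: int) -> list:
        results = []
        for b in aligned_buckets(start, end, self.size):
            entry = self.entries.get(b)
            if entry is None or entry[1] <= now:
                # Missing or expired: recompute only this one bucket.
                self.misses += 1
                entry = (self.fetch(b), now + ttl_for(b, now, self.size))
                self.entries[b] = entry
            results.append(entry[0])
        return results


# A 30-minute rolling window queried twice, one minute apart.
cache = BucketCache(fetch=lambda b: b // 60, size=60)
first = cache.query(34200, 36000, now=36000)
misses_after_first = cache.misses          # every bucket is a miss the first time
second = cache.query(34260, 36060, now=36060)
```

In this run the second query recomputes only the newest bucket and the two young buckets whose short TTL has lapsed; the remaining buckets are served from the cache. A real cache would also need to trim partially covered buckets at the window edges, which this sketch omits.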
(3) Aggregation / rollup-layer bucketing¶
- Netflix Distributed Counter — time-bucketed rollup of counter events into immutable aggregation windows (Source: sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction).
- Prometheus 2-hour blocks: the Prometheus TSDB persists samples to disk as immutable blocks, each covering two hours by default.
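
A rollup into immutable windows can be sketched as below. This is a simplified illustration, not the Netflix Distributed Counter or Prometheus code: the function `rollup`, the 60-second window, and the sample events are assumptions. The window that contains `now` is skipped because it is still open.

```python
from collections import Counter


def rollup(events, window_seconds: int, now: int) -> dict:
    """Sum (epoch_seconds, delta) events per aligned window, closed windows only."""
    totals = Counter()
    open_window = now - (now % window_seconds)   # window still receiving events
    for ts, delta in events:
        window = ts - (ts % window_seconds)
        if window < open_window:
            totals[window] += delta
    return dict(totals)


# Hypothetical counter events: (epoch seconds, delta).
events = [(0, 1), (15, 2), (59, 1), (60, 5), (119, -1), (125, 3)]
closed = rollup(events, 60, now=130)
# closed == {0: 4, 60: 4}; the event at t=125 belongs to the open window starting at 120
```

Once a window is closed its total never changes in this sketch, so it can be stored, replicated, or cached as a single immutable value.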
(4) Retention / TTL-layer bucketing¶
- Hot / warm / cold time-bucketed tiers — different retention policies per time bucket age (e.g. 7-day full-resolution, 90-day 1-hour-resolution, 1-year daily-resolution).
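
The tiering idea can be expressed as a lookup from bucket age to stored resolution. The sketch below reuses the example thresholds from the bullet above; the names `TIERS` and `resolution_for` and the resolution labels are hypothetical.

```python
from datetime import timedelta

# (maximum bucket age, resolution kept at that age), mirroring the example tiers.
TIERS = [
    (timedelta(days=7), "full"),
    (timedelta(days=90), "1h"),
    (timedelta(days=365), "1d"),
]


def resolution_for(age: timedelta):
    """Return the resolution retained for a bucket of this age, or None if expired."""
    for max_age, resolution in TIERS:
        if age <= max_age:
            return resolution
    return None   # older than every tier: eligible for deletion


hot = resolution_for(timedelta(days=3))       # "full"
warm = resolution_for(timedelta(days=30))     # "1h"
cold = resolution_for(timedelta(days=200))    # "1d"
expired = resolution_for(timedelta(days=400)) # None
```

Because the policy is keyed on bucket age, a background job can downsample or delete whole buckets as they cross a threshold instead of inspecting individual events.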
Why bucketing is so pervasive¶
- Alignment enables reuse. Two overlapping queries that both include "minute 42" read the same bucket. Alignment is what makes caches and rollups share work across queries.
- Bucketing bounds the unit of change. An immutable bucket can be replicated / tiered / archived / compressed as one unit.
- Bucketing exposes the late-arrival problem. Once a bucket is declared closed, late arrivals must either update it (if the layer allows) or be discarded. This is where late-arriving-data policy lives.
- Bucketing matches dashboards. Dashboard charts almost always aggregate by time with a fixed bar width, so bucketing at the storage or cache layer lines up the natural storage granularity with the natural display granularity.
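
The late-arrival point can be illustrated with a store that closes each bucket after a grace period. This is a sketch under assumptions: the class `BucketStore`, the `allowed_lateness` parameter, and the `mutable_after_close` flag are hypothetical names for the two policies described above (update the closed bucket, or discard the event).

```python
class BucketStore:
    def __init__(self, size: int, allowed_lateness: int, mutable_after_close: bool = False):
        self.size = size
        self.allowed_lateness = allowed_lateness
        self.mutable_after_close = mutable_after_close
        self.sums = {}      # bucket_start -> running sum
        self.dropped = 0    # late events discarded by the policy

    def add(self, ts: int, value: int, now: int) -> bool:
        bucket = ts - (ts % self.size)
        # A bucket is closed once its end plus the grace period has passed.
        closed = now >= bucket + self.size + self.allowed_lateness
        if closed and not self.mutable_after_close:
            self.dropped += 1
            return False
        self.sums[bucket] = self.sums.get(bucket, 0) + value
        return True


store = BucketStore(size=60, allowed_lateness=30)
on_time = store.add(ts=10, value=1, now=20)    # bucket [0, 60) is still open
in_grace = store.add(ts=50, value=2, now=80)   # bucket ended, but within the grace period
too_late = store.add(ts=55, value=5, now=95)   # bucket closed at t=90, event discarded
```

Setting `mutable_after_close=True` switches the sketch to the other policy, where a late event updates the closed bucket and any downstream cache or rollup built from it has to be invalidated.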
Key design choices¶
| Choice | Implication |
|---|---|
| Bucket size | Smaller: more reuse when a query window shifts slightly, but more entries to store; larger: coarser reuse, but fewer entries |
| Alignment | Always align to a wall-clock boundary (minute 00, hour 00) rather than "first event + N seconds" — alignment is what makes reuse work across queries |
| Timezone | UTC is the safe default; local timezones create bucket-boundary bugs around DST |
| Minimum bucket size | Sub-second buckets explode in count; most dashboards never need them |
| Maximum bucket size | Buckets as coarse as an hour or a day reduce reuse for zoomed-in views |
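
The bucket-size trade-off in the table can be checked with a little arithmetic. The sketch below assumes that only buckets lying entirely inside a query window are reusable, and it measures how much of a 3-hour window is still covered by previously seen buckets after the window slides forward by 5 minutes. The helper names `whole_buckets` and `reuse_fraction` are hypothetical.

```python
def whole_buckets(start: int, end: int, size: int) -> set:
    """Starts of aligned buckets that lie entirely inside [start, end)."""
    first = -(-start // size) * size   # round start up to the next boundary
    return {b for b in range(first, end, size) if b + size <= end}


def reuse_fraction(window: int, shift: int, size: int) -> float:
    """Share of the shifted window covered by whole buckets the first query also used."""
    before = whole_buckets(0, window, size)
    after = whole_buckets(shift, shift + window, size)
    return len(before & after) * size / window


WINDOW = 3 * 3600   # 3-hour query window, in seconds
SHIFT = 5 * 60      # the window slides forward by 5 minutes

minute_entries = len(whole_buckets(0, WINDOW, 60))    # 180 entries
hour_entries = len(whole_buckets(0, WINDOW, 3600))    # 3 entries
minute_reuse = reuse_fraction(WINDOW, SHIFT, 60)      # 175/180, about 0.97
hour_reuse = reuse_fraction(WINDOW, SHIFT, 3600)      # 2/3, about 0.67
```

Under these assumptions, one-minute buckets reuse about 97% of the shifted window but need 180 entries, while one-hour buckets need only 3 entries and reuse about 67%.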
Seen in¶
- sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — cache-layer + Druid-storage-layer.
- sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — rollup-layer with immutable aggregation windows.
Related¶
- concepts/granularity-aligned-bucket — the specific cache-layer application.
- concepts/rolling-window-query — the query shape that benefits most.
- concepts/late-arriving-data: the forcing function for bucket-closing policy.
- concepts/bucket-pattern — MongoDB's storage-layer instance.
- concepts/immutable-aggregation-window — the rollup-layer sibling at Netflix's Distributed Counter.
- systems/apache-druid
- systems/netflix-druid-interval-cache
- systems/netflix-distributed-counter
- patterns/bucketed-event-time-partitioning