
CONCEPT Cited by 2 sources

Time-series bucketing

Definition

Time-series bucketing is the practice of decomposing a continuous stream of time-stamped events or values into discrete time windows (buckets) aligned to fixed boundaries (minute-aligned, hour-aligned, day-aligned) and treating each bucket as a unit for storage, retrieval, caching, rollup, or retention.

It is one of the most pervasive primitives in time-series systems: different layers of the stack bucket for different reasons, but they all use the same shape.
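The alignment step in the definition can be sketched as a floor operation on the timestamp. This is a minimal illustration, assuming buckets are aligned to multiples of the bucket size since the Unix epoch in UTC; the function name `bucket_start` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timezone

def bucket_start(ts: datetime, size_seconds: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fixed-size bucket.

    Assumption: buckets are aligned to multiples of size_seconds since the
    Unix epoch, which coincides with UTC minute/hour/day boundaries.
    """
    epoch = int(ts.timestamp())
    floored = epoch - (epoch % size_seconds)
    return datetime.fromtimestamp(floored, tz=timezone.utc)

# Hypothetical event timestamp.
event = datetime(2024, 5, 1, 13, 42, 17, tzinfo=timezone.utc)

minute_bucket = bucket_start(event, 60)     # start of the minute containing the event
hour_bucket = bucket_start(event, 3600)     # start of the hour containing the event
day_bucket = bucket_start(event, 86400)     # start of the UTC day containing the event
```

Every event inside the same window maps to the same bucket start, so that value can serve as a storage key, cache key, or rollup key.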

Where time-series bucketing shows up

(1) Storage-layer bucketing

(2) Cache-layer bucketing

(3) Aggregation / rollup-layer bucketing

(4) Retention / TTL-layer bucketing

  • Hot / warm / cold time-bucketed tiers: the retention policy depends on the age of the bucket (e.g. 7-day full-resolution, 90-day 1-hour-resolution, 1-year daily-resolution).
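As a rough sketch, an age-based tier lookup might look like the following. It assumes the example figures are cumulative age thresholds (a bucket up to 7 days old keeps full resolution, up to 90 days keeps 1-hour rollups, up to 1 year keeps daily rollups, and anything older expires); the `TIERS` table and `resolution_for_age` are hypothetical names, not part of any particular system.

```python
from datetime import timedelta

# Hypothetical tier table: (maximum bucket age, resolution retained).
# Assumption: thresholds are cumulative ages, checked from youngest to oldest.
TIERS = [
    (timedelta(days=7), "full"),
    (timedelta(days=90), "1h"),
    (timedelta(days=365), "1d"),
]

def resolution_for_age(age: timedelta):
    """Return the resolution kept for a bucket of this age, or None if it has expired."""
    for max_age, resolution in TIERS:
        if age <= max_age:
            return resolution
    return None  # older than every tier: eligible for deletion
```

Because the policy is evaluated per bucket, a retention job can downsample or drop whole buckets rather than inspecting individual points.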

Why bucketing is so pervasive

  • Alignment enables reuse. Two overlapping queries that both include "minute 42" read the same bucket. Alignment is what makes caches and rollups share work across queries.
  • Bucketing bounds the unit of change. An immutable bucket can be replicated / tiered / archived / compressed as one unit.
  • Bucketing exposes the late-arrival problem. Once a bucket is declared closed, late arrivals must either update it (if the layer allows) or be discarded. This is where late-arriving-data policy lives.
  • Bucketing matches dashboards. Dashboard charts almost always aggregate by time with a fixed bar width, so bucketing at the storage or cache layer aligns the natural storage granularity with the natural display granularity.
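The reuse argument in the first bullet can be made concrete with a small comparison. The sketch below assumes bucket keys are epoch-second start times and uses two hypothetical overlapping 10-minute queries; `aligned_keys` and `anchored_keys` are illustrative helpers, not a real cache API.

```python
def aligned_keys(start: int, end: int, size: int) -> set:
    """Bucket start times overlapping [start, end), aligned to multiples of size."""
    first = start - (start % size)
    return set(range(first, end, size))

def anchored_keys(start: int, end: int, size: int) -> set:
    """Bucket start times anchored at the query's own start ("first event + N seconds")."""
    return set(range(start, end, size))

# Two overlapping 10-minute queries; the second is shifted by 90 seconds.
q1 = (1_700_000_000, 1_700_000_600)
q2 = (1_700_000_090, 1_700_000_690)

# Keys both queries would read under each scheme, with 60-second buckets.
shared_aligned = aligned_keys(*q1, 60) & aligned_keys(*q2, 60)
shared_anchored = anchored_keys(*q1, 60) & anchored_keys(*q2, 60)
```

With aligned buckets the two queries share most of their keys, so a cache or rollup populated by the first query serves the second. With query-anchored buckets the 90-second shift leaves no key in common, even though the time ranges overlap almost entirely.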

Key design choices

| Choice | Implication |
| --- | --- |
| Bucket size | Smaller → more reuse on small shifts + more entries; larger → coarser reuse + fewer entries |
| Alignment | Always align to a wall-clock boundary (minute 00, hour 00) rather than "first event + N seconds"; alignment is what makes reuse work across queries |
| Timezone | UTC is the safe default; local timezones create bucket-boundary bugs around DST |
| Min granularity | Sub-second buckets explode in count; most dashboards never need them |
| Max granularity | Day/hour buckets reduce reuse for zoomed-in views |
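The bucket-size trade-off can be illustrated by counting, for one query, how many buckets must be read and how much data outside the requested range comes along with them. This is a sketch under assumed values: the query range and the helper names `covering_buckets` and `overfetch_seconds` are hypothetical.

```python
def covering_buckets(start: int, end: int, size: int) -> list:
    """Aligned bucket start times (epoch seconds) needed to cover [start, end)."""
    first = start - (start % size)
    return list(range(first, end, size))

def overfetch_seconds(start: int, end: int, size: int) -> int:
    """Seconds of bucket data read that fall outside the requested range."""
    n = len(covering_buckets(start, end, size))
    return n * size - (end - start)

# Hypothetical 1-hour query that starts 5 minutes past an hour boundary.
start = 7200 + 300
end = start + 3600

# Map each bucket size to (number of entries read, seconds read outside the range).
tradeoff = {
    size: (len(covering_buckets(start, end, size)), overfetch_seconds(start, end, size))
    for size in (60, 300, 3600)
}
```

Under these assumptions the 60-second buckets need the most entries but read nothing outside the range, while the 1-hour buckets need only two entries but read a full extra hour of data, which is the "more entries" versus "coarser reuse" tension in the first row.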

Seen in
