
PATTERN

Bucketed event-time partitioning

Problem

Persisting high-throughput time-series events keyed by an identifier (counter, user, device, sensor) in Cassandra (or any partitioned wide-column store) naturally concentrates all of a given identifier's events in one partition. At high event rates that partition becomes a wide partition — slow reads, read-repair pain, compaction pressure, memory footprint blowup on scans.

Pattern

Split each identifier's event partition into explicit buckets via schema columns derived from event time + a hash. Events for the same identifier spread across (identifier, time_bucket, event_bucket) partitions; reads issue parallel range scans across the buckets they need.

Canonical schema shape (Netflix TimeSeries Abstraction):

PRIMARY KEY (
  (identifier, time_bucket, event_bucket),  -- partition key
  event_time,                                -- clustering
  event_id,
  event_item_key
)

time_bucket slots events into a coarse time window (e.g. 10 minutes). event_bucket hash-spreads events within a time window across a configurable number of sub-partitions per identifier.
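The key derivation can be sketched in a few lines. This is a minimal illustration, not Netflix's actual implementation: the function name, the CRC32 hash choice, and hashing by event_id are all assumptions.

```python
import zlib

SECONDS_PER_BUCKET = 600  # 10-minute time windows (example value)
BUCKETS_PER_ID = 4        # event-bucket fan-out per identifier (example value)

def partition_key(identifier: str, event_time_s: int, event_id: str):
    """Derive the (identifier, time_bucket, event_bucket) partition key."""
    # Coarse time window the event falls into.
    time_bucket = event_time_s // SECONDS_PER_BUCKET
    # Hash a per-event value (here event_id) so events of the same
    # identifier spread evenly across the sub-partitions of a window.
    event_bucket = zlib.crc32(event_id.encode()) % BUCKETS_PER_ID
    return (identifier, time_bucket, event_bucket)
```

Hashing a per-event value (rather than the identifier) is what prevents a hot identifier from still landing all its events in one sub-partition.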

Control-plane tuning

Three dials, typically per-namespace:

  • seconds_per_bucket — width of a time bucket. Smaller for low cardinality (keeps partitions small); larger for high cardinality (fewer partitions to scan on aggregation).
  • buckets_per_id — number of event-bucket hash partitions per identifier per time bucket. Higher for high-throughput single identifiers; lower when throughput is spread across identifiers.
  • seconds_per_slice — width of a time-slice table (e.g. 1 day). Slices govern table-level TTL / compaction and the boundary beyond which an older table can be dropped wholesale.

Netflix's low-cardinality example config: buckets_per_id: 4, seconds_per_bucket: 600 (10 min), seconds_per_slice: 86400 (1 day).
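The fan-out implied by that config is worth working out: one day slice divided into 10-minute buckets, times 4 event buckets, gives the per-identifier partition count a full-day scan would touch.

```python
# Worked arithmetic for the example config above.
seconds_per_slice = 86400   # 1-day slice
seconds_per_bucket = 600    # 10-minute time buckets
buckets_per_id = 4          # event-bucket fan-out

time_buckets_per_slice = seconds_per_slice // seconds_per_bucket   # 144
partitions_per_id_per_day = time_buckets_per_slice * buckets_per_id  # 576
```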

Trade-offs

Pros:

  • Prevents per-identifier wide partitions. Each partition stays small + efficiently compactable.
  • Enables parallel range-scan aggregation — a rollup batch can query all buckets for an identifier concurrently.
  • Per-namespace tuning lets hot + cold workloads coexist on the same physical cluster.

Cons:

  • Reads must scan N partitions per identifier instead of 1. Good aggregation query planners amortise this via parallelism + adaptive batch sizing, but the worst-case concurrent partition read count can overwhelm the store under bursty load.
  • Schema coupling — the right buckets_per_id depends on throughput + cardinality. Miscalibration shows up as either still-wide partitions (too few buckets) or unnecessary read fan-out (too many buckets).
  • Range-query cost scales with bucket count for full-range scans (e.g. historical backfill).
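The read fan-out in the first con is easy to enumerate: a range read must visit every (identifier, time_bucket, event_bucket) partition overlapping the requested window. A minimal sketch, with illustrative names and the example-config defaults:

```python
def partitions_for_range(identifier: str, start_s: int, end_s: int,
                         seconds_per_bucket: int = 600,
                         buckets_per_id: int = 4):
    """All partitions a read over [start_s, end_s) must scan."""
    first_tb = start_s // seconds_per_bucket
    last_tb = (end_s - 1) // seconds_per_bucket
    return [(identifier, tb, eb)
            for tb in range(first_tb, last_tb + 1)
            for eb in range(buckets_per_id)]
```

A one-hour read under the example config touches 6 time buckets x 4 event buckets = 24 partitions for a single identifier; an aggregation planner would issue those scans in parallel.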

Netflix's Counter-Rollup tier uses dynamic batching to match the bucket count the batch will scan to the cardinality of counters in the batch, preventing the underlying store from seeing a thundering herd of parallel range reads.
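One way to sketch that dynamic batching (illustrative only, not Netflix's actual Counter-Rollup code): greedily pack counters into batches so that no batch's total bucket fan-out exceeds a concurrency cap the store can absorb.

```python
def batch_counters(counter_bucket_counts, max_parallel_scans: int = 512):
    """Greedily pack (counter_id, bucket_count) pairs into batches whose
    combined bucket fan-out stays under max_parallel_scans."""
    batches, current, load = [], [], 0
    for counter_id, buckets in counter_bucket_counts:
        # Flush the current batch if adding this counter would exceed the cap.
        if current and load + buckets > max_parallel_scans:
            batches.append(current)
            current, load = [], 0
        current.append(counter_id)
        load += buckets
    if current:
        batches.append(current)
    return batches
```

High-cardinality batches thus end up with few counters each, while low-cardinality batches can safely hold many, keeping the concurrent partition-read count roughly constant.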

When to use

  • High-throughput time-series events partitioned by a natural identifier key.
  • Cassandra / wide-column / partitioned storage where per-key partition-size limits are a hard operational concern.
  • Aggregation workloads that can query ranges in parallel.

When NOT to use

  • Low-throughput per-identifier workloads — one partition per identifier is fine, the scheme adds needless complexity.
  • Stores that already shard-and-compact horizontally inside a partition (some LSM stores do; pure Cassandra doesn't).
