PATTERN Cited by 1 source
Auto-tuning control loop on storage histograms¶
Definition¶
A background worker that periodically polls per-table partition-size distribution histograms from the storage engine, compares the observed distribution against a configured density target window, and, when it detects drift, rewrites the partition strategy used for future time slices and future writes. The worker treats partition strategy as a tunable control variable, not a one-shot provisioning decision.
Canonical wiki instance: Netflix's DynamicTimeSliceConfigWorker against Cassandra 4.x's nodetool tablehistograms exposed through a Cassandra virtual table — Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads.
Why this pattern matters¶
Provisioning-time partition strategy is a bet under uncertainty. Three failure modes break it:
- Workload is unknown at provisioning time.
- Workload evolves — even a perfect day-one bet drifts.
- Heterogeneity within a fleet — the same table-design template is applied across thousands of namespaces with different traffic shapes.
Static bucketed event-time partitioning without a control loop produces:
- Over-partitioning: too many tiny partitions (e.g. < 10 KB) → high read amplification, thread queueing, planner overhead.
- Under-partitioning: too few too-large partitions → wide-partition latency, GC pauses.
The control loop detects both and corrects them on a per-table basis.
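The two failure modes can be read as a comparison of one observed percentile against the density window. A minimal sketch, assuming the p99 partition size is the signal and the 2–10 MiB window mentioned on this page is the target; `DensityWindow` and `classify` are hypothetical names, not part of any Netflix or Cassandra API:

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class DensityWindow:
    """Hypothetical container for the configured density target window."""
    low_bytes: int
    high_bytes: int


def classify(p99_bytes: float, window: DensityWindow) -> str:
    """Label a table by where its p99 partition size sits relative to the window."""
    if p99_bytes < window.low_bytes:
        return "over-partitioned"   # too many tiny partitions
    if p99_bytes > window.high_bytes:
        return "under-partitioned"  # too few, too-wide partitions
    return "healthy"


window = DensityWindow(low_bytes=2 * MIB, high_bytes=10 * MIB)
for p99 in (8 * 1024, 5 * MIB, 250 * MIB):
    print(p99, classify(p99, window))
```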
Mechanism (Netflix TimeSeries instantiation)¶
┌──────────────────────────────────────────────────────────────────┐
│ DynamicTimeSliceConfigWorker (per-namespace)                     │
│                                                                  │
│ Loop every N minutes:                                            │
│   1. Read Cassandra virtual table that mirrors                   │
│      `nodetool tablehistograms` for tables in this namespace     │
│   2. For each Time Slice's partition-size percentile dist:       │
│        observed = (p50, p99, max)                                │
│        target   = configured density window (typically 2-10 MiB) │
│   3. If observed.p99 outside target window:                      │
│        compute adjustment factor                                 │
│        emit Proposed config change for FUTURE slices             │
│   4. Operator (or auto-apply policy) accepts the proposal        │
│        → next Time Slice uses new partition strategy             │
│   5. Past slices are NOT rewritten                               │
└──────────────────────────────────────────────────────────────────┘
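Steps 2 and 3 of the loop can be sketched as a pure function. The source does not disclose the adjustment-factor formula, so everything below is an assumption: partition size is taken to scale linearly with the time-bucket width, the worker aims for the geometric midpoint of the window, and the bucket is clamped to a hypothetical range. `Histogram`, `Proposal`, and `propose` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class Histogram:
    """Observed partition-size percentiles, in bytes."""
    p50: float
    p99: float
    max: float


@dataclass(frozen=True)
class Proposal:
    namespace: str
    old_bucket_s: int
    new_bucket_s: int


def propose(namespace: str, hist: Histogram, bucket_s: int,
            low: int = 2 * MIB, high: int = 10 * MIB,
            min_bucket_s: int = 1,
            max_bucket_s: int = 7 * 24 * 3600) -> Optional[Proposal]:
    """Return a config proposal for FUTURE slices, or None if no change is needed."""
    if hist.p99 <= 0:
        return None                      # no data yet: do nothing
    if low <= hist.p99 <= high:
        return None                      # already inside the target window
    # Assumption: aim for the geometric midpoint of the window and assume
    # partition size grows linearly with the time-bucket width.
    target = (low * high) ** 0.5
    factor = target / hist.p99
    new_bucket = int(round(bucket_s * factor))
    new_bucket = max(min_bucket_s, min(max_bucket_s, new_bucket))
    if new_bucket == bucket_s:
        return None                      # clamped back to the current value
    return Proposal(namespace, bucket_s, new_bucket)


# Over-partitioned table: p99 far below the window, so the bucket is widened.
print(propose("my_dataset_1", Histogram(p50=200, p99=500, max=900), bucket_s=60))
```

Past slices never appear in this function: it only produces a value for slices that have not been created yet, which matches step 5 of the loop.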
The worker output looks like:
DynamicTimeSliceConfigWorker:
namespace: my_dataset_1
Observed: TimeSlices have p99 partitions below configured target of 10MB.
Proposed: time_bucket interval: 60s -> 604800s
In this example, a 60-second time bucket had produced sub-10-KB partitions, causing over-partitioning, read amplification, and thread queueing. Widening the time bucket to a full week (604,800 s) moves p99 partition size into the target window without touching past slices.
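A quick back-of-the-envelope check of the example, assuming (this is not stated in the source) that partition size scales linearly with bucket width and that the target window is 2–10 MiB. Under those assumptions, the p99 at a 60-second bucket would have had to sit between roughly 200 bytes and 1 KiB for the widened bucket to land inside the window, which is consistent with "sub-10-KB":

```python
MIB = 1024 * 1024

old_bucket_s, new_bucket_s = 60, 604_800
factor = new_bucket_s // old_bucket_s      # 10080: one week is 10,080 minutes

low, high = 2 * MIB, 10 * MIB              # assumed density target window

# Under linear scaling, p99_after = p99_before * factor, so the pre-change p99
# that maps onto each edge of the window is:
p99_before_low = low / factor
p99_before_high = high / factor

print(factor)
print(round(p99_before_low), round(p99_before_high))   # bytes
```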
Why it's safe to only tune future slices¶
Time Slices in TimeSeries are bounded by time and named accordingly (e.g. data_20260328). The slice's partitioning strategy is captured in slice-creation metadata. Past slices already have a partitioning shape baked in — rewriting them would require a full table rewrite. Future slices haven't been created yet — changing their config is free.
This works because TimeSeries' read API is slice-aware: a query that spans multiple slices issues per-slice reads with each slice's own partitioning shape. The application code does not need to know which slice has which partition strategy.
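A slice-aware read planner can be sketched as follows. The source does not describe the real TimeSeries read path at this level; `TimeSlice`, `plan_reads`, and the convention that bucket indexes are counted from the slice start are all hypothetical. The point is only that each slice carries its own bucket width and the caller never sees it:

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class TimeSlice:
    name: str
    start_s: int    # inclusive, epoch seconds
    end_s: int      # exclusive, epoch seconds
    bucket_s: int   # time-bucket width captured at slice creation


def plan_reads(slices: List[TimeSlice], t0: int, t1: int) -> Iterator[Tuple[str, int]]:
    """Yield (slice name, bucket index) for every bucket overlapping [t0, t1)."""
    for s in slices:
        lo, hi = max(t0, s.start_s), min(t1, s.end_s)
        if lo >= hi:
            continue                                   # query misses this slice
        first = (lo - s.start_s) // s.bucket_s
        last = (hi - 1 - s.start_s) // s.bucket_s
        for bucket in range(first, last + 1):
            yield (s.name, bucket)


slices = [
    TimeSlice("slice_a", start_s=0, end_s=1000, bucket_s=100),     # old config
    TimeSlice("slice_b", start_s=1000, end_s=2000, bucket_s=500),  # retuned config
]
print(list(plan_reads(slices, 850, 1200)))
# [('slice_a', 8), ('slice_a', 9), ('slice_b', 0)]
```

The query crosses the boundary between the two slices, and the planner fans out to two narrow buckets in the older slice and one wide bucket in the newer one.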
The pattern does not apply to storage layouts where the partition strategy is fixed at table creation and cannot vary across the table's lifetime, for example Hive-style partitioned tables on a lakehouse, where re-partitioning requires a full table rewrite (see concepts/over-partitioning).
Sibling: dynamic partition splitting¶
The auto-tuning control loop addresses table-wide drift — when most partitions on a table need different parameters. It does not address per-key outliers — when most partitions are healthy but a few specific IDs accumulate orders-of-magnitude more events.
Per-key outliers are addressed by dynamic partition splitting, which detects pathology on the read path at per-ID granularity and runs an async split-and-divert pipeline.
The two patterns compose:
| Pattern | Granularity | When it fires | Output |
|---|---|---|---|
| This pattern | Per table, future slices only | Histogram drifts outside the density target window | New partition strategy for the next slice |
| Dynamic partition splitting | Per ID, at runtime | A read on a partition exceeds a byte threshold | Partition split into a separate target table |
Trade-offs¶
| Pro | Con |
|---|---|
| Detects drift and proposes the correction itself, with no manual histogram analysis | Requires a storage engine that exposes partition histograms (Cassandra: yes, via virtual tables; many engines: no) |
| Works at table-level — fixes both over- and under-partitioning | Works only when partition strategy can vary across slice / sub-table boundaries |
| Past data is left alone — no rewrite cost | Past data keeps its broken shape (must wait for retention to age it out) |
| Composes with per-ID splitting for outliers | Doesn't help per-ID outliers within a healthy table |
| Density target is a single tunable | Density target itself is a configuration choice (Netflix uses 2–10 MiB depending on workload) |
| Supports either an auto-apply or a review-and-apply policy (the source implies this choice but does not confirm it) | Auto-apply with a buggy worker can change the partitioning shape across the fleet |
Caveats not disclosed in source¶
- Worker frequency, sensitivity, and hysteresis not specified. The post does not disclose the polling cadence, the adjustment-factor formula, or how the worker avoids oscillating between configurations on workloads near the edge of the target window.
- Auto-apply vs propose-only mode is implied but not stated. The example shows a Proposed config; whether the worker applies it automatically or only emits a recommendation is unclear.
- Failure mode if the histogram virtual table is unavailable. Cassandra virtual tables can be stale or unavailable during cluster events; the worker's behaviour in those windows is not described.
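Because the source is silent on hysteresis, the following is only one plausible way to keep a worker from oscillating near the edge of the window, not a description of Netflix's implementation. It combines a dead band around the window with a requirement that several consecutive polls agree before a proposal is emitted. `DriftGate` and its parameters are hypothetical, and stale or missing histograms are not handled here:

```python
from dataclasses import dataclass, field


@dataclass
class DriftGate:
    """Only signal drift after several consecutive polls agree on its direction."""
    low: float
    high: float
    margin: float = 0.25       # dead band, as a fraction of each window edge
    required_streak: int = 3   # consecutive out-of-band polls before firing
    _streak: int = field(default=0, init=False, repr=False)
    _direction: int = field(default=0, init=False, repr=False)

    def observe(self, p99: float) -> bool:
        if p99 < self.low * (1 - self.margin):
            direction = -1     # clearly too small
        elif p99 > self.high * (1 + self.margin):
            direction = 1      # clearly too large
        else:
            direction = 0      # inside the window or inside the dead band
        if direction == 0:
            self._streak, self._direction = 0, 0
            return False
        if direction == self._direction:
            self._streak += 1
        else:
            self._direction, self._streak = direction, 1
        return self._streak >= self.required_streak


gate = DriftGate(low=2.0, high=10.0)                    # units: MiB
print([gate.observe(x) for x in (1.8, 1.0, 1.0, 1.0)])  # [False, False, False, True]
```

The first reading (1.8) is below the window but inside the dead band, so it does not start a streak; the proposal fires only on the third consecutive reading that is clearly out of range.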
Sibling control-loop patterns¶
| Pattern | Domain | Same shape |
|---|---|---|
| Cluster health check (concept) | Cluster ops | Continuous monitor → corrective action |
| Predictive Optimization (Databricks) | Lakehouse | Auto-applies OPTIMIZE / VACUUM based on table stats — see systems/databricks-predictive-optimization |
| HPA / VPA (Kubernetes) | Pod scaling | Observed resource metrics → replica count (HPA) or resource requests (VPA) |
| JVM GC ergonomics | JVM heap | Observed pause times vs. pause-time goal → heap and generation resizing |
The shared discipline: observable distribution + tunable parameter + control loop. The pattern is generic; what matters is choosing the right histogram and the right parameter.
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads —
Canonical wiki home. Netflix TimeSeries Abstraction's
DynamicTimeSliceConfigWorker against Apache Cassandra 4.x's nodetool tablehistograms exposed via a virtual table. Density target 2–10 MiB. Example output: 60s → 604800s time-bucket adjustment to fix sub-10 KB partitions (over-partitioning). Past slices not rewritten; only future slices get the new config. "This strategy has yielded real results in reducing our read latencies, as well as reducing the number of timeouts caused by thread queueing."
Related¶
- concepts/over-partitioning — the failure this pattern detects and corrects on Cassandra.
- concepts/wide-partition-problem — the opposite failure (under-partitioning) this pattern also corrects.
- concepts/partition-strategy — the control variable.
- concepts/dynamic-partition-splitting — sibling per-ID remediation.
- systems/netflix-timeseries-abstraction — the canonical instance.
- systems/apache-cassandra — the substrate (with virtual-table histogram support).
- patterns/bucketed-event-time-partitioning — the partitioning shape this pattern tunes.
- patterns/dynamic-partition-split-async-pipeline — sibling per-ID pipeline that handles outliers.