PATTERN Cited by 1 source
Incremental clustering on write
Problem
Periodic full-table re-clustering produces unbounded
write amplification at scale. The
canonical instance is Z-Ordering: each
OPTIMIZE ZORDER BY rewrites entire files (or partitions' worth of
files) including data that was already correctly clustered before
the new ingest. As the table grows and the new-data fraction
shrinks against table size, the rewrite cost grows linearly with
table size on every maintenance run.
The 2026-06-01 Databricks "Debunking 8 data layout myths" post states the problem directly:
"Z-Order has to be rerun periodically as new data lands, and each rerun rewrites large amounts of old, possibly already-clustered data to restore clustering quality. With continuous ingestion, the cost of keeping data well-clustered with Z-Order grows along with the table."
Structurally, the periodic-rewrite approach decouples rewrite cost from new-data volume and couples it to table size. The failure mode is predictable: at sufficient scale, a rewrite cycle takes longer than the interval between writes, so maintenance never catches up and clustering quality degrades indefinitely.
Solution
Maintain clustering layout incrementally on the write path. New writes are placed into files that preserve locality on the clustering keys; periodic compaction operates on small, recent file sets rather than the whole table; the rewrite cost stays proportional to new-data volume, not table size.
The 2026-06-01 source's framing:
"Liquid clusters incrementally, including at write time, so the layout stays optimal without unnecessary rewrites."
Structural pieces
| Piece | What it does |
|---|---|
| Write-path layout | Newly written files honour the table's clustering keys at the moment of write, using locality-aware file placement and intra-file sort. |
| Incremental compaction | Background OPTIMIZE operates on recent or unbalanced subsets of files, not the whole table. |
| Layout-state tracking | Transaction log records which files are well clustered and which need rebalancing; both the planner and OPTIMIZE consume this state. |
| No periodic full-rewrite | The layout doesn't require periodic table-wide rebuilding to maintain quality. |
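
The pieces in the table can be sketched as a toy model. Everything named here is hypothetical: `FileEntry`, `write_batch`, and `plan_incremental_compaction` are illustrative and are not Delta Lake or Liquid Clustering APIs. The source does not describe how layout state is actually tracked, so the `well_clustered` flag and the size budget are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical per-file record standing in for layout state in a transaction log."""
    path: str
    size_mb: int
    well_clustered: bool


def write_batch(rows, keys, batch_id):
    """Write-path layout: sort incoming rows on the clustering keys before writing.

    Returns the sorted rows and the log entry for the new file. File size is
    faked as one MB per row to keep the example small.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys))
    entry = FileEntry(f"batch-{batch_id}.parquet", len(ordered), well_clustered=True)
    return ordered, entry


def plan_incremental_compaction(log, budget_mb):
    """Incremental compaction: select only files flagged as needing rebalancing,
    stopping once the (assumed) size budget would be exceeded.
    Well-clustered files are never selected, however large the table is."""
    selected, used = [], 0
    for f in log:
        if f.well_clustered:
            continue
        if used + f.size_mb > budget_mb:
            break
        selected.append(f)
        used += f.size_mb
    return selected


log = [
    FileEntry("old-0.parquet", 900, True),      # large, already clustered: skipped
    FileEntry("recent-0.parquet", 40, False),
    FileEntry("recent-1.parquet", 30, False),
]
plan = plan_incremental_compaction(log, budget_mb=100)
print([f.path for f in plan])  # ['recent-0.parquet', 'recent-1.parquet']
```

The point of the sketch is that the planner's work depends on how many files are flagged, not on how many files exist.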
In practice
- Day 1: Write 1 GB. New files are clustered on (date, customer_id).
- Day 2: Write 1 GB. New files are clustered on (date, customer_id). Background OPTIMIZE merges fragmented files within the recent batch. Cost: small, proportional to new data.
- Day 365: Table is 365 GB. Background OPTIMIZE still operates on recent batches and fragmentation hotspots. Cost: small, proportional to new data per cycle.
Compare with periodic full-rewrite:
- Day 365: Background OPTIMIZE ZORDER must re-cluster all 365 GB to reincorporate Day 365's writes into the global Z-order. Cost: O(table size).
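
The timeline above can be replayed with a few lines of Python. This is a deliberately simplified sketch: it assumes a steady 1 GB of ingest per day, that a periodic rewrite touches the whole table, and that incremental maintenance touches exactly one day's batch (real compaction may also revisit fragmentation hotspots). `daily_rewrite_gb` is a hypothetical helper name.

```python
def daily_rewrite_gb(days, daily_ingest_gb=1.0):
    """Return (periodic, incremental): GB rewritten on each day under each strategy."""
    periodic, incremental = [], []
    table_gb = 0.0
    for _ in range(days):
        table_gb += daily_ingest_gb
        periodic.append(table_gb)             # full re-cluster touches the whole table
        incremental.append(daily_ingest_gb)   # only the recent batch is touched
    return periodic, incremental


periodic, incremental = daily_rewrite_gb(365)
print(periodic[-1], incremental[-1])  # 365.0 1.0
```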
Cost-decoupling property
The pattern's load-bearing economic property: under steady ingest, maintenance cost per unit time is approximately constant as the table grows.
Periodic rewrite (Z-Order), cost over time:
- Per cycle: cost(n) = O(table_size(n)) ≈ O(n) for steady ingest.
- Over the table's lifetime: quadratic total cost.
Incremental on write (Liquid), cost over time:
- Per cycle: cost(n) = O(new_data(n)) ≈ O(1) for steady ingest.
- Over the table's lifetime: linear total cost.
The difference matters at PB scale: a 1 PB table maintained via periodic rewrite pays maintenance cost roughly proportional to 1 PB on every cycle. The same table with incremental clustering pays maintenance cost roughly proportional to the daily / hourly new-data volume — orders of magnitude less.
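
The quadratic-versus-linear claim follows from a short sum. The symbols are assumptions introduced for this derivation, not taken from the source: $d$ is the (constant) volume ingested per cycle, $c$ is the cost of rewriting one unit of data, and $n$ is the number of maintenance cycles so far.

```latex
\begin{aligned}
\text{table size after cycle } k &: \quad S_k = k\,d \\
\text{periodic rewrite} &: \quad C^{\mathrm{per}}_k = c\,S_k = c\,d\,k,
  \qquad \sum_{k=1}^{n} C^{\mathrm{per}}_k = c\,d\,\frac{n(n+1)}{2} = O(n^2) \\
\text{incremental on write} &: \quad C^{\mathrm{inc}}_k = c\,d,
  \qquad \sum_{k=1}^{n} C^{\mathrm{inc}}_k = c\,d\,n = O(n) \\
\text{ratio of lifetime totals} &: \quad
  \frac{c\,d\,n(n+1)/2}{c\,d\,n} = \frac{n+1}{2}
\end{aligned}
```

Under these assumptions, a table maintained daily for a year ($n = 365$) has paid 183 times more cumulative rewrite cost with periodic full rewrites than with incremental maintenance, and the gap keeps widening with $n$.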
Sibling failure modes the pattern avoids
| Failure mode | Periodic rewrite | Incremental on-write |
|---|---|---|
| Maintenance duration > inter-write interval | Layout quality degrades indefinitely | Maintenance keeps up with ingest |
| Storage cost spike during rewrite | 2× table size during rewrite (old + new files coexist) | Bounded by new-write volume |
| OPTIMIZE blocks readers | A long-running OPTIMIZE leaves a wide window in which it can affect concurrent work | Short, frequent operations |
| Fragmentation between maintenance cycles | Fragmentation accumulates until next cycle | Continuously bounded |
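
The first row of the table (maintenance outlasting the interval between writes) can be made concrete with a back-of-the-envelope model. The function name and all numbers below are hypothetical: the sketch assumes steady ingest, a fixed rewrite throughput, and a full-table rewrite on every cycle.

```python
def first_cycle_falling_behind(ingest_gb_per_cycle, rewrite_gb_per_hour,
                               cycle_hours, max_cycles=100_000):
    """Return the first cycle whose full-table rewrite takes longer than the
    interval between cycles, or None if that never happens within max_cycles."""
    for cycle in range(1, max_cycles + 1):
        table_gb = cycle * ingest_gb_per_cycle
        rewrite_hours = table_gb / rewrite_gb_per_hour
        if rewrite_hours > cycle_hours:
            return cycle
    return None


# Assumed workload: 100 GB ingested per daily cycle, 500 GB/hour rewrite throughput.
print(first_cycle_falling_behind(100, 500, cycle_hours=24))  # 121

# Incremental maintenance rewrites only the new batch, so its duration is flat.
incremental_rewrite_hours = 100 / 500
print(incremental_rewrite_hours)  # 0.2
```

With these assumed numbers the periodic rewrite stops fitting inside its own daily window after roughly four months, while the incremental run stays at a constant fraction of an hour.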
Composition with managed-table substrate
The pattern composes with automatic table optimization: the substrate (Predictive Optimization) decides when to run incremental compaction based on observed write patterns and clustering-state telemetry. The user declares clustering keys (patterns/clustering-keys-as-engine-input); the substrate owns the maintenance schedule.
When this doesn't apply
- Append-only logs with no clustering goals — the clustering layer is unnecessary; raw append plus periodic compaction is sufficient.
- Tables with infrequent writes — periodic rewrite cost is tolerable when writes are rare (monthly batch loads).
- Legacy tools that require periodic table-wide rebuilds — some downstream consumers may snapshot tables on a periodic cadence and re-read everything; incremental layout doesn't help if the consumer reads the whole table anyway.
Sibling patterns on the wiki
- Lazy compaction (LSM tier-merge) — same principle at the LSM-tree storage layer; merging happens in tiers as data ages, not as periodic full-table rewrites.
- Multi-strategy compaction (patterns/multi-strategy-compaction) — Magic Pocket's L1 / L2 / L3 sequence; each tier handles a different cost-vs-effectiveness trade-off in compaction.
- Background reconciler for read-path optimization (patterns/background-reconciler-for-read-path-optimization) — sibling shape on streaming brokers: write-path produces unoptimised files, reconciler produces read-optimised files in the background.
The shared principle: bound maintenance cost to incremental work, not table state.
Seen in
- sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — First wiki canonicalisation as a named pattern. The Z-Ordering critique ("unnecessary rewrites... the cost of keeping data well-clustered with Z-Order grows along with the table") and the Liquid Clustering contrast ("Liquid clusters incrementally, including at write time, so the layout stays optimal without unnecessary rewrites") make the pattern load-bearing for the post's economic case at PB scale. Reserved for future ingests: the precise telemetry that triggers incremental compaction, the algorithmic difference between Liquid's incremental layout and competing approaches, and the worst-case behaviour under high-velocity write bursts.
Related
- systems/liquid-clustering — canonical instance.
- systems/delta-lake — table format substrate.
- systems/databricks-predictive-optimization — runs the incremental compaction work.
- concepts/z-ordering — the periodic-rewrite predecessor this pattern supersedes.
- concepts/write-amplification — the cost dimension this pattern bounds.
- concepts/automatic-table-optimization — the substrate-side property that decides incremental compaction scheduling.
- concepts/over-partitioning — sibling failure mode that emerges when teams pre-commit to fixed layouts.
- patterns/clustering-keys-as-engine-input — the broader abstraction this maintenance discipline sits beneath.