CONCEPT Cited by 1 source
Co-clustered join¶
Definition¶
A co-clustered join is a join optimisation that eliminates data shuffle when both join sides are clustered on the join key. The query planner recognises that physical files on each side already contain co-locatable rows on the join column, so partitions can be joined locally without redistributing data across the cluster's network.
The 2026-06-01 Databricks "Debunking 8 data layout myths" post canonicalises the optimisation as a forward-looking primitive on Liquid Clustering (Private Preview):
"Today, joining Liquid tables on their clustering columns can require a full data shuffle, even when the data is already organized by those columns. Co-clustered joins (now in Private Preview) remove that shuffle automatically. On a real-world data warehousing benchmark, a Liquid-to-Liquid join ran ~51% faster (28 minutes → 14 minutes) and shuffled 87% less data (1.2 TiB → 150 GiB) than the same query without the optimization."
The mechanism¶
Without co-clustered join, a join on a clustered key still incurs the standard shuffle:
Table A (Liquid clustered on customer_id)
Table B (Liquid clustered on customer_id)
Standard execution:
1. Read Table A → repartition by hash(customer_id) → broadcast
2. Read Table B → repartition by hash(customer_id) → broadcast
3. Co-located join across cluster
4. Result: full shuffle of both A and B even though both were
already organised on customer_id
With co-clustered join, the planner recognises both sides are already organised on the join key:
Table A (Liquid clustered on customer_id)
Table B (Liquid clustered on customer_id)
Co-clustered execution:
1. Read Table A → preserve clustering layout
2. Read Table B → preserve clustering layout
3. Pair up files from A and B whose customer_id ranges overlap
4. Local join per-file-pair
5. Result: ~zero shuffle; only metadata-level coordination
The planner-level requirement: detect that both tables share the same clustering key and the layouts are compatible enough to pair files.
The disclosed performance envelope¶
| Metric | Without co-clustered join | With co-clustered join | Improvement |
|---|---|---|---|
| Wall clock | 28 minutes | 14 minutes | ~51% faster |
| Shuffle data | 1.2 TiB | 150 GiB | 87% reduction |
The shuffle reduction is the load-bearing number: shuffle-bound joins at PB scale dominate query cost, so eliminating the shuffle (or reducing it by 87%) eliminates the dominant cost component.
The 51% wall-clock speedup is less than the shuffle reduction because non-shuffle work (file IO, join compute, result materialisation) becomes the new bottleneck once shuffle is eliminated. Standard tail-bottleneck dynamic.
Why this matters¶
Joins are the dominant cost on fact-to-fact joins at scale — two large tables joined on a high-cardinality key (customer_id, order_id, etc.). The classical optimisations don't apply:
- Broadcast join (patterns/broadcast-join-for-small-reference-tables) needs one side to fit in memory; doesn't apply when both sides are large.
- Sort-merge join is the standard fallback but pays the full shuffle cost.
- Co-clustered join specifically targets the case where both sides are clustered on the join key — the case where the data layout already encodes the locality the join needs.
Composition with multi-dimensional clustering¶
The 2026-06-01 source frames co-clustered joins as a multi-dimensional clustering payoff: clustering decisions made to support filter pruning also support join shuffle elimination, provided the join key is among the clustering keys. The architect's column-choice decision compounds across both query patterns.
For a star-schema BI workload — fact table joined to several dimension tables on dimension-key columns — clustering the fact table on the dimension keys (one of the four allowed Liquid clustering keys) enables co-clustered joins to those dimensions when they are also clustered on the same keys.
What this is not¶
- Not a join algorithm. Co-clustered join is a planner-level optimisation that elides shuffle ahead of any join algorithm (broadcast, sort-merge, hash). The actual join can still be a hash join or a sort-merge join on the local file-pair.
- Not specific to Liquid Clustering. Other systems implement similar optimisations (Spark's bucketed join requires both sides bucketed identically; columnar query engines like Snowflake have micro-partition-aware joins). The 2026-06-01 source is the wiki's Liquid Clustering canonicalisation specifically.
- Not free. The optimisation requires the planner to verify layout compatibility and may fall back to standard shuffle if the two tables have incompatible clustering states (one stale, one fresh; different key ordering; etc.).
Failure modes¶
- Skewed clustering keys. If the join key has heavy hot keys (one customer with 10× the rows), file-pair joining concentrates load on the hot files. Standard skew mitigations apply.
- Stale clustering on one side. If Liquid Clustering hasn't fully laid out one side after recent ingest, the planner falls back to shuffle.
- Different clustering key shapes. Co-clustered join assumes
both sides cluster on identical keys with compatible layouts; if
one side is
CLUSTER BY (date, customer_id)and the other isCLUSTER BY (customer_id), the planner may or may not recognise the partial compatibility (source doesn't disclose).
Sibling shuffle-elimination optimisations¶
- Bucketed join (Spark) — shared-bucket-count join with pre-shuffled-and-stored layout.
- Co-located partitions (Postgres FDW, sharded SQL engines) — shard-key-aligned joins skip cross-shard shuffle.
- Z-Order multi-key alignment (limited Z-Order benefit on multi-column joins; usually less effective than dedicated co-clustered join optimisation).
Seen in¶
- sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — First wiki canonicalisation. Forward-looking disclosure as a Private Preview Liquid Clustering primitive. Names the disclosed performance envelope (~51% faster, 87% less shuffle data on a real-world data warehousing benchmark) and frames the mechanism as "removing shuffle that exists today even when data is already organized by the clustering columns". Reserved for future ingests: GA timeline, eligible workload classes, predicate- pushdown interaction, behaviour under skewed clustering keys, layout-compatibility detection algorithm.
Related¶
- systems/liquid-clustering — the substrate the optimisation operates on.
- systems/delta-lake — table format.
- concepts/multi-dimensional-clustering — the layout property that makes co-clustered joins possible.
- concepts/file-level-data-skipping — sibling pruning optimisation on the same per-file statistics; applies to filter predicates, not joins.
- patterns/broadcast-join-for-small-reference-tables — the alternative optimisation when one side is small enough to broadcast.