CONCEPT Cited by 1 source

Co-clustered join¶

Definition¶

A co-clustered join is a join optimisation that eliminates data shuffle when both join sides are clustered on the join key. The query planner recognises that physical files on each side already contain co-locatable rows on the join column, so partitions can be joined locally without redistributing data across the cluster's network.

The 2026-06-01 Databricks "Debunking 8 data layout myths" post canonicalises the optimisation as a forward-looking primitive on Liquid Clustering (Private Preview):

"Today, joining Liquid tables on their clustering columns can require a full data shuffle, even when the data is already organized by those columns. Co-clustered joins (now in Private Preview) remove that shuffle automatically. On a real-world data warehousing benchmark, a Liquid-to-Liquid join ran ~51% faster (28 minutes → 14 minutes) and shuffled 87% less data (1.2 TiB → 150 GiB) than the same query without the optimization."

The mechanism¶

Without co-clustered join, a join on a clustered key still incurs the standard shuffle:

Table A (Liquid clustered on customer_id)
Table B (Liquid clustered on customer_id)

Standard execution:
1. Read Table A → repartition by hash(customer_id) → broadcast
2. Read Table B → repartition by hash(customer_id) → broadcast
3. Co-located join across cluster
4. Result: full shuffle of both A and B even though both were
   already organised on customer_id

With co-clustered join, the planner recognises both sides are already organised on the join key:

Table A (Liquid clustered on customer_id)
Table B (Liquid clustered on customer_id)

Co-clustered execution:
1. Read Table A → preserve clustering layout
2. Read Table B → preserve clustering layout
3. Pair up files from A and B whose customer_id ranges overlap
4. Local join per-file-pair
5. Result: ~zero shuffle; only metadata-level coordination

The planner-level requirement: detect that both tables share the same clustering key and the layouts are compatible enough to pair files.

The disclosed performance envelope¶

Metric	Without co-clustered join	With co-clustered join	Improvement
Wall clock	28 minutes	14 minutes	~51% faster
Shuffle data	1.2 TiB	150 GiB	87% reduction

The shuffle reduction is the load-bearing number: shuffle-bound joins at PB scale dominate query cost, so eliminating the shuffle (or reducing it by 87%) eliminates the dominant cost component.

The 51% wall-clock speedup is less than the shuffle reduction because non-shuffle work (file IO, join compute, result materialisation) becomes the new bottleneck once shuffle is eliminated. Standard tail-bottleneck dynamic.

Why this matters¶

Joins are the dominant cost on fact-to-fact joins at scale — two large tables joined on a high-cardinality key (customer_id, order_id, etc.). The classical optimisations don't apply:

Broadcast join (patterns/broadcast-join-for-small-reference-tables) needs one side to fit in memory; doesn't apply when both sides are large.
Sort-merge join is the standard fallback but pays the full shuffle cost.
Co-clustered join specifically targets the case where both sides are clustered on the join key — the case where the data layout already encodes the locality the join needs.

Composition with multi-dimensional clustering¶

The 2026-06-01 source frames co-clustered joins as a multi-dimensional clustering payoff: clustering decisions made to support filter pruning also support join shuffle elimination, provided the join key is among the clustering keys. The architect's column-choice decision compounds across both query patterns.

For a star-schema BI workload — fact table joined to several dimension tables on dimension-key columns — clustering the fact table on the dimension keys (one of the four allowed Liquid clustering keys) enables co-clustered joins to those dimensions when they are also clustered on the same keys.

What this is not¶

Not a join algorithm. Co-clustered join is a planner-level optimisation that elides shuffle ahead of any join algorithm (broadcast, sort-merge, hash). The actual join can still be a hash join or a sort-merge join on the local file-pair.
Not specific to Liquid Clustering. Other systems implement similar optimisations (Spark's bucketed join requires both sides bucketed identically; columnar query engines like Snowflake have micro-partition-aware joins). The 2026-06-01 source is the wiki's Liquid Clustering canonicalisation specifically.
Not free. The optimisation requires the planner to verify layout compatibility and may fall back to standard shuffle if the two tables have incompatible clustering states (one stale, one fresh; different key ordering; etc.).

Failure modes¶

Skewed clustering keys. If the join key has heavy hot keys (one customer with 10× the rows), file-pair joining concentrates load on the hot files. Standard skew mitigations apply.
Stale clustering on one side. If Liquid Clustering hasn't fully laid out one side after recent ingest, the planner falls back to shuffle.
Different clustering key shapes. Co-clustered join assumes both sides cluster on identical keys with compatible layouts; if one side is CLUSTER BY (date, customer_id) and the other is CLUSTER BY (customer_id), the planner may or may not recognise the partial compatibility (source doesn't disclose).

Sibling shuffle-elimination optimisations¶

Bucketed join (Spark) — shared-bucket-count join with pre-shuffled-and-stored layout.
Co-located partitions (Postgres FDW, sharded SQL engines) — shard-key-aligned joins skip cross-shard shuffle.
Z-Order multi-key alignment (limited Z-Order benefit on multi-column joins; usually less effective than dedicated co-clustered join optimisation).

Seen in¶

sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — First wiki canonicalisation. Forward-looking disclosure as a Private Preview Liquid Clustering primitive. Names the disclosed performance envelope (~51% faster, 87% less shuffle data on a real-world data warehousing benchmark) and frames the mechanism as "removing shuffle that exists today even when data is already organized by the clustering columns". Reserved for future ingests: GA timeline, eligible workload classes, predicate- pushdown interaction, behaviour under skewed clustering keys, layout-compatibility detection algorithm.

systems/liquid-clustering — the substrate the optimisation operates on.
systems/delta-lake — table format.
concepts/multi-dimensional-clustering — the layout property that makes co-clustered joins possible.
concepts/file-level-data-skipping — sibling pruning optimisation on the same per-file statistics; applies to filter predicates, not joins.
patterns/broadcast-join-for-small-reference-tables — the alternative optimisation when one side is small enough to broadcast.