PATTERN Cited by 1 source
REPLACE USING and REPLACE ON for selective overwrite¶
Problem¶
Selective overwrites — "replace the rows matching this predicate with these new rows, atomically" — were historically tied to partitioned table layouts. Databricks' Dynamic Partition Overwrite (legacy) is the canonical instance:
SET spark.sql.sources.partitionOverwriteMode = dynamic;
INSERT OVERWRITE TABLE events
SELECT * FROM staging WHERE date = '2025-01-15'
-- Replaces only the date='2025-01-15' partition in events
-- Requires partitioning by date
-- Requires the Spark config flag
Three structural limitations: - Tied to partitioned tables. Overwrite granularity = partition granularity = whatever column the table is partitioned on. - Requires a Spark config. "requires a Spark config" — opt-in global session state; easy to mis-set; not portable across compute surfaces. - Can't run on all compute surfaces. SQL warehouses, Serverless compute, and other non-classic-Spark execution paths may not support the legacy mechanism uniformly.
The 2026-06-01 Databricks "Debunking 8 data layout myths" post addresses this directly as Myth #8:
Myth: "Being able to selectively overwrite data is only available through Dynamic Partition Overwrites."
Solution¶
Use REPLACE USING and REPLACE ON: layout-agnostic, compute- agnostic SQL primitives that selectively overwrite rows on any column predicate. The 2026-06-01 source:
"Selective overwrites work on Liquid tables natively. Databricks supports REPLACE USING and REPLACE ON, two SQL syntaxes for selectively overwriting data on any data layout: Liquid Clustered, partitioned, or plain unclustered tables. Unlike Dynamic Partition Overwrite which requires a Spark config, REPLACE USING and REPLACE ON can be used on any compute: classic clusters, SQL warehouses, and Serverless. The operation is atomic and matches on any column you choose."
Structural pieces¶
| Piece | What it does |
|---|---|
| REPLACE USING | Selectively overwrites rows matched by a USING (key) clause; semantically a MERGE-flavoured upsert with replacement semantics. |
| REPLACE ON | Selectively overwrites rows matched by an ON predicate clause; matches on any column expression. |
| Layout-agnostic | Works on partitioned, Liquid-clustered, or unclustered tables. |
| Compute-agnostic | Available on classic clusters, SQL warehouses, and Serverless. No Spark config required. |
| Atomic | Operation is committed via the table format's transaction log; partial-overwrite states aren't visible to readers. |
| Any-column matching | Matching predicate can use any column, not just the partition column. |
In practice (illustrative — exact SQL syntax not disclosed in source)¶
-- Replace rows for a specific date
REPLACE INTO events ON date = '2025-01-15'
SELECT * FROM staging WHERE date = '2025-01-15';
-- Or via key matching
REPLACE INTO events USING (event_id)
SELECT * FROM staging WHERE date = '2025-01-15';
-- Both work on any layout: partitioned, clustered, unclustered
-- Both work on any compute: classic, SQL warehouse, Serverless
-- Both atomic: readers see old or new state, not partial
Why this matters¶
Decouples overwrite semantic from layout¶
Pre-REPLACE, the question "can I selectively overwrite?" was answered by "are you partitioned on the right column?". This forced architects to make partitioning decisions based on overwrite ergonomics rather than workload performance — yet another reason teams over-partitioned their tables.
With REPLACE USING / ON, overwrite semantic is available on any layout. The partition vs cluster decision becomes purely about filter / join performance; overwrite ergonomics no longer push the choice toward partitioning.
Removes the Spark-config opt-in¶
spark.sql.sources.partitionOverwriteMode = dynamic is a
session-level config flag. Forgetting to set it produces "replace
the entire table" semantics instead of "replace the matching
partition" — a silently-destructive failure mode. REPLACE USING
/ ON is explicit in the SQL: the scope of the overwrite is in
the query, not in a session config.
Portable across compute surfaces¶
Modern Lakehouse workloads run on multiple compute substrates: classic Spark clusters for ETL, SQL warehouses for BI, Serverless for ad-hoc analysis. Dynamic Partition Overwrite was tied to the classic Spark execution model; REPLACE USING / ON works identically across all three.
Composition with patterns/clustering-keys-as-engine-input¶
REPLACE USING / ON composes with clustering-as-engine-input: the overwrite operation works on the table's layout regardless of what the engine has chosen. Architects can change clustering keys without breaking selective-overwrite pipelines; the SQL surface is stable across layout evolution.
This is one of several layout-agnostic operational primitives that make Liquid Clustering's "layout as implementation detail" thesis operationally viable:
| Primitive | Layout-dependence pre-Liquid | Layout-independence post-Liquid |
|---|---|---|
| Filter pruning | Required matching partition column | concepts/file-level-data-skipping on any column with stats |
| Selective overwrite | Required partition alignment | REPLACE USING / ON on any column |
| Metadata-only DELETE | Required partition alignment | concepts/metadata-only-operation on file-aligned predicates |
| Co-clustered join | Required matching partition columns | concepts/co-clustered-join on clustering keys |
Sibling overwrite patterns on the wiki¶
- MERGE INTO — the broader upsert primitive; supports complex match conditions and multi-action behaviour (UPDATE / DELETE / INSERT in one statement).
- DELETE + INSERT — the manual decomposition; loses atomicity unless wrapped in an explicit transaction.
- OVERWRITE + filter — "DELETE WHERE … then INSERT" expressed as INSERT OVERWRITE … WHERE; legacy pattern; depends on whether the table format supports atomic insert-overwrite-with-filter.
- Dynamic Partition Overwrite — the partition-tied mechanism REPLACE USING / ON supersedes.
REPLACE USING / ON sits between MERGE INTO (general) and Dynamic Partition Overwrite (partition-specific) — it has the layout- agnosticism of MERGE INTO with the simpler "replace these rows" semantic of Dynamic Partition Overwrite.
Seen in¶
- sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — First wiki canonicalisation as a named pattern. The Myth #8 debunking is the source's positioning: selective overwrite is no longer a partition-only capability. Names the cross-compute portability (classic, SQL warehouses, Serverless) and atomic property explicitly. Reserved for future ingests: exact SQL syntax (the post links to documentation but doesn't reproduce it inline), conflict semantics under concurrent REPLACE USING / ON operations, performance characteristics on large match sets, interaction with row-level concurrency.
Related¶
- systems/liquid-clustering — composes with the layout-agnostic shape.
- systems/delta-lake — table format substrate.
- systems/databricks-sql-warehouses — one of the compute surfaces where REPLACE USING / ON works (where Dynamic Partition Overwrite traditionally didn't).
- concepts/over-partitioning — the failure mode that partition-tied overwrite ergonomics encouraged.
- concepts/idempotency-token — sibling primitive for safe re-execution.
- patterns/clustering-keys-as-engine-input — the broader layout-as-implementation-detail thesis this primitive operationalises.