CONCEPT Cited by 1 source

Row-level concurrency

Definition

Row-level concurrency is the table-format property whereby two writers updating different rows of the same table do not conflict even if those rows live in the same data file. The concurrency unit is the row, not the file. Contrast with file-level concurrency, the older model where two writers touching any rows in the same file conflict and one must retry.
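A minimal sketch of the two models follows. It assumes each writer's footprint can be summarised as the set of (file ID, row ID) pairs it modified. `WriteSet`, `conflicts_file_level` and `conflicts_row_level` are hypothetical names used for illustration, not part of any table format's API:

```python
from typing import FrozenSet, Tuple

# Hypothetical representation: a writer's footprint is the set of
# (file_id, row_id) pairs it modified.
WriteSet = FrozenSet[Tuple[str, int]]


def conflicts_file_level(a: WriteSet, b: WriteSet) -> bool:
    """File-level model: conflict if both writers touched any common file."""
    files_a = {file_id for file_id, _ in a}
    files_b = {file_id for file_id, _ in b}
    return not files_a.isdisjoint(files_b)


def conflicts_row_level(a: WriteSet, b: WriteSet) -> bool:
    """Row-level model: conflict only if both writers touched the same row."""
    return not a.isdisjoint(b)


# Two writers updating different rows that live in the same file F.
writer_a: WriteSet = frozenset({("F", 1), ("F", 3)})
writer_b: WriteSet = frozenset({("F", 2), ("F", 4)})

print(conflicts_file_level(writer_a, writer_b))  # True: same file
print(conflicts_row_level(writer_a, writer_b))   # False: disjoint rows
```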

The 2026-06-01 Databricks "Debunking 8 data layout myths" post canonicalises the distinction verbatim:

"Operating at partition granularity was a workaround for an older concurrency model. Unlike partitioning which only has file-level concurrency, Liquid provides row-level concurrency. Two writers updating different rows no longer conflict, even if those rows live in the same file. This removes one of the main reasons teams partitioned tables: maintaining write boundaries to avoid serialization."

This is one of the structural reasons partitioning was favoured even when its data-layout consequences were undesirable: by giving each writer its own partition, teams could ensure two writers never touched the same files and therefore never serialised. With row-level concurrency, that workaround is no longer necessary.

The mechanism

The 2026-06-01 Databricks source describes the property at behavioural level ("two writers updating different rows no longer conflict") but does not disclose the implementation. The natural implementation on a Delta-style transaction-log substrate uses fine-grained conflict detection at commit time:

Writer A:   read file F → modify rows R1, R3 → write new file F'A (new R1, R3)
Writer B:   read file F → modify rows R2, R4 → write new file F'B (new R2, R4)

At commit time, the transaction log evaluates:
- Did A and B touch overlapping row sets within F?
- If no → both commits succeed: F's old copies of R1-R4 are marked removed
  (e.g. via a deletion vector) and F'A, F'B are added; neither writer
  rewrites the rest of F
- If yes → the second writer must retry against the new table state
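The commit-time check above can be sketched as optimistic concurrency control over a toy in-memory log. This model rests on several assumptions and is not Delta Lake's or Iceberg's actual commit protocol. It assumes stable integer row IDs, records only which rows each commit modified, and ignores how files are rewritten. `TransactionLog`, `try_commit` and `ConflictError` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


class ConflictError(Exception):
    """Raised when a commit overlaps rows changed after the writer's snapshot."""


@dataclass(frozen=True)
class Commit:
    version: int
    rows: FrozenSet[int]  # stable row IDs modified by this commit


@dataclass
class TransactionLog:
    commits: List[Commit] = field(default_factory=list)

    @property
    def version(self) -> int:
        # Version N means N commits have landed; version 0 is the initial state.
        return len(self.commits)

    def try_commit(self, read_version: int, rows: FrozenSet[int]) -> int:
        # Optimistic validation: only commits that landed after the writer's
        # snapshot can conflict, and only if they modified one of the same rows.
        for c in self.commits[read_version:]:
            overlap = c.rows & rows
            if overlap:
                raise ConflictError(f"overlap with version {c.version}: {sorted(overlap)}")
        self.commits.append(Commit(self.version + 1, rows))
        return self.version


log = TransactionLog()
snapshot = log.version  # writers A, B and C all read file F at version 0

v_a = log.try_commit(snapshot, frozenset({1, 3}))  # Writer A: R1, R3
v_b = log.try_commit(snapshot, frozenset({2, 4}))  # Writer B: R2, R4 (disjoint, so it lands)

try:
    log.try_commit(snapshot, frozenset({3}))  # Writer C also modified R3
except ConflictError as err:
    print("retry:", err)  # retry: overlap with version 1: [3]

# Writer C re-reads at the latest version, recomputes its change, and commits.
v_c = log.try_commit(log.version, frozenset({3}))
```

Validation only scans commits newer than the writer's snapshot, which is what lets writers proceed without locks. The cost is that a losing writer must redo its work.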

The key engineering work is detecting non-overlap at row granularity without forcing both writers to lock the underlying file during their write phase. This is typically achieved through a combination of:

Per-row identity tracking (row tracking in Iceberg v3 / Delta) — assigning stable row IDs so conflicting writes can be detected at row granularity.
Optimistic concurrency control — both writers proceed assuming no conflict; conflict detection happens only at commit.
Deletion vectors (concepts/deletion-vector) — rows are "removed" by adding entries to a deletion vector rather than rewriting the file, making it cheaper to apply non-overlapping modifications without rewriting shared data.
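The following toy model illustrates why deletion vectors make non-overlapping updates cheap. It makes three assumptions. Data files are immutable. An update marks superseded row positions in a per-file deletion vector and writes the new row versions to a new file. A commit merges deletion vectors when their positions are disjoint. `DataFile`, `Table`, `update_rows` and `commit` are hypothetical names, and real formats persist deletion vectors as compressed bitmaps rather than Python sets:

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]  # (stable row ID, payload)


@dataclass(frozen=True)
class DataFile:
    name: str
    rows: Tuple[Row, ...]  # immutable once written


@dataclass
class Table:
    files: Dict[str, DataFile]
    deletion_vectors: Dict[str, Set[int]]  # file name -> deleted row positions

    def scan(self) -> List[Row]:
        # A reader sees every row whose position is not in its file's deletion vector.
        live: List[Row] = []
        for name, data_file in self.files.items():
            deleted = self.deletion_vectors.get(name, set())
            live.extend(r for pos, r in enumerate(data_file.rows) if pos not in deleted)
        return sorted(live)


def update_rows(table: Table, file_name: str, updates: Dict[int, str],
                new_name: str) -> Tuple[Set[int], DataFile]:
    # Writer side: record which positions of the old file are superseded and
    # write only the new row versions to a new file; the old file is untouched.
    old = table.files[file_name]
    positions = {pos for pos, (rid, _) in enumerate(old.rows) if rid in updates}
    new_file = DataFile(new_name, tuple((rid, updates[rid]) for rid, _ in old.rows if rid in updates))
    return positions, new_file


def commit(table: Table, file_name: str, positions: Set[int], new_file: DataFile) -> None:
    # Commit side: merge this writer's deletion-vector entries with those already
    # committed for the same file; only overlapping positions are a conflict.
    committed = table.deletion_vectors.setdefault(file_name, set())
    if committed & positions:
        raise RuntimeError(f"row-level conflict on {file_name}: {sorted(committed & positions)}")
    committed |= positions
    table.files[new_file.name] = new_file


f = DataFile("F", ((1, "a"), (2, "b"), (3, "c"), (4, "d")))
table = Table(files={"F": f}, deletion_vectors={})

# Writers A and B both start from the same snapshot of F.
pos_a, file_a = update_rows(table, "F", {1: "a2", 3: "c2"}, "F_A")
pos_b, file_b = update_rows(table, "F", {2: "b2", 4: "d2"}, "F_B")
commit(table, "F", pos_a, file_a)
commit(table, "F", pos_b, file_b)  # disjoint positions, so both commits land
print(table.scan())  # [(1, 'a2'), (2, 'b2'), (3, 'c2'), (4, 'd2')]
```

Neither writer rewrites F. The only shared state they contend on is F's deletion vector, and merging two disjoint sets of positions is cheap.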

Liquid Clustering (and modern Delta in general) plausibly composes these primitives to deliver row-level concurrency as a default property, though the source confirms only the behaviour, not the primitives.

What this enables

Concurrent ETL on the same table

The most direct application: two ETL pipelines targeting the same table for different row subsets no longer need partition coordination.

Pipeline A:  upserts rows where region = 'us-east'
Pipeline B:  upserts rows where region = 'us-west'

File F contains rows from both regions.

File-level concurrency: A and B conflict; one retries; commits
                        effectively serialise on file F.

Row-level concurrency:  A and B both commit; throughput scales with
                        pipeline parallelism.
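The throughput difference can be illustrated with a deliberately simplified retry model. It assumes N pipelines that start from the same snapshot, all touch file F, and never touch the same rows. `commit_rounds` is a hypothetical helper, and it ignores commit latency, backoff and the cost of redoing work:

```python
from typing import Tuple


def commit_rounds(num_writers: int, model: str) -> Tuple[int, int]:
    """Return (rounds, attempts) for writers that share file F but not rows.

    Toy model: in each round every pending writer reads the latest table
    version and tries to commit. Under "file" concurrency only the first
    commit of a round succeeds, because every writer touched F, and the rest
    retry next round. Under "row" concurrency every commit succeeds.
    """
    if model not in ("file", "row"):
        raise ValueError(f"unknown model: {model!r}")
    rounds = attempts = 0
    pending = num_writers
    while pending > 0:
        rounds += 1
        attempts += pending
        pending = pending - 1 if model == "file" else 0
    return rounds, attempts


for n in (2, 4, 8):
    print(n, commit_rounds(n, "file"), commit_rounds(n, "row"))
# 2 (2, 3) (1, 2)
# 4 (4, 10) (1, 4)
# 8 (8, 36) (1, 8)
```

Under the file-level model, rounds grow linearly and total attempts grow quadratically with the number of pipelines, because each round admits one winner. Under the row-level model, all pipelines commit in a single round.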

The Databricks source frames this as the architectural release: "With Liquid Clustering, ETL can easily operate concurrently against the same table" — the partition-as-write-boundary discipline is no longer necessary.

Concurrent compaction during writes

Background compaction (OPTIMIZE) can run while writes are in flight. With file-level concurrency, compaction had to be scheduled around writers. With row-level concurrency, it can rewrite files as long as it does not conflict with concurrent row-level modifications.

CDC consumer parallelism

Multiple CDC consumers can apply changes to the same target table in parallel without serialising on the file boundary, provided each consumer touches a disjoint set of rows, even when those rows share underlying files.
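One way for parallel CDC consumers to keep their row sets disjoint is to route change events by a stable hash of the primary key, so that every row has exactly one owning consumer. The sketch below rests on assumptions: the event shape and `route_by_key` are hypothetical, and it covers only the routing step, not the writes into the target table:

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical change-event shape: (primary key, operation, payload).
ChangeEvent = Tuple[str, str, Dict[str, int]]


def route_by_key(events: Iterable[ChangeEvent], num_consumers: int) -> Dict[int, List[ChangeEvent]]:
    """Assign all changes for a given key to exactly one consumer, in source order.

    zlib.crc32 is used instead of hash() because str hashing is salted per
    process, and the key-to-consumer mapping must be stable across restarts.
    """
    buckets: Dict[int, List[ChangeEvent]] = defaultdict(list)
    for event in events:
        key = event[0]
        buckets[zlib.crc32(key.encode("utf-8")) % num_consumers].append(event)
    return dict(buckets)


events: List[ChangeEvent] = [
    ("user-1", "upsert", {"v": 1}),
    ("user-2", "upsert", {"v": 1}),
    ("user-1", "delete", {}),
    ("user-3", "upsert", {"v": 2}),
]
for consumer, assigned in sorted(route_by_key(events, 2).items()):
    print(consumer, [(key, op) for key, op, _ in assigned])
```

Because no two consumers ever write the same key, their write sets are disjoint by construction. Under row-level concurrency, that is exactly the condition under which their commits need not serialise.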

The myth defenders cite

Defenders of partitioning often cite the "two pipelines on the same unpartitioned table will conflict" concern as a reason to partition. The 2026-06-01 Databricks post addresses this directly (Myth #6):

Myth: "Concurrent ETL needs write boundaries. Without partitioning, two writers updating the same table risk colliding, and Delta/Iceberg concurrency control forces one of them to retry or fail. Partition and give each writer its own slice of the table, so two pipelines never touch the same files."

The reality, per the source, is that the partition-as-write-boundary pattern was a workaround for file-level concurrency — "Operating at partition granularity was a workaround for an older concurrency model" — and is no longer needed once the underlying concurrency model is row-level.

Composition with clustering

Row-level concurrency is decoupled from clustering layout — the property is a transaction-log / format-level capability, not a clustering-specific feature. However, the source ties it to Liquid Clustering because:

Liquid Clustering removes the original reason teams adopted partitioning (data-layout granularity for filter performance).
Row-level concurrency removes the secondary reason teams adopted partitioning (write-boundary coordination).
With both removed, partitioning's structural costs (concepts/over-partitioning) lose their counterweight.

When file-level concurrency would still apply

The source does not enumerate workloads where row-level concurrency breaks down, but architectural reasoning suggests:

Two writers modifying the same row simultaneously still conflict (the row is the conflict unit).
Writers and schema-change operations may still serialise at the table-format level, because a schema change alters table metadata that every concurrent writer depends on.
DDL operations (TRUNCATE, drop columns, change clustering keys) remain serialising.
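These cases can be summarised as a small conflict matrix. The sketch below is a hypothetical model, not any format's actual conflict-resolution rules. It assumes row writes are described by the stable row IDs they touch, and it treats schema changes and DDL as table-wide operations. `OpKind`, `Operation` and `conflicts` are illustrative names:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet


class OpKind(Enum):
    ROW_WRITE = auto()   # INSERT / UPDATE / DELETE / MERGE on specific rows
    TABLE_WIDE = auto()  # schema change, TRUNCATE, drop column, clustering-key change


@dataclass(frozen=True)
class Operation:
    kind: OpKind
    rows: FrozenSet[int] = frozenset()  # stable row IDs; empty for table-wide ops


def conflicts(a: Operation, b: Operation) -> bool:
    # Table-wide operations change what every concurrent writer assumed about
    # the table, so in this model they conflict with everything.
    if OpKind.TABLE_WIDE in (a.kind, b.kind):
        return True
    # Two row writes conflict only if they modified the same row.
    return not a.rows.isdisjoint(b.rows)


a = Operation(OpKind.ROW_WRITE, frozenset({1, 3}))
b = Operation(OpKind.ROW_WRITE, frozenset({2, 4}))
c = Operation(OpKind.ROW_WRITE, frozenset({3}))
ddl = Operation(OpKind.TABLE_WIDE)
print(conflicts(a, b), conflicts(a, c), conflicts(a, ddl))  # False True True
```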

These cases are out of scope for the common case the source addresses: "two writers updating different rows".

Sibling concepts

Sibling                                            Domain                                    Concurrency unit
File-level concurrency (older Delta / Iceberg)     OTF before row-level features             File
Row-level concurrency (modern Delta / Iceberg v3)  OTF with deletion vectors + row tracking  Row
Optimistic concurrency control                     OLTP                                      Transaction (logical)
MVCC (concepts/snapshot-isolation)                 OLTP                                      Transaction with snapshots

The trajectory is toward finer concurrency granularity: from table-level (one writer at a time) to file-level (one writer per file) to row-level (one writer per row). Each step expands the concurrent-write envelope.

Seen in

sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — First wiki canonicalisation as a named property. The verbatim framing (partition-as-write-boundary was a workaround for file-level concurrency, and Liquid Clustering's row-level concurrency removes the workaround) is the load-bearing architectural claim. The concrete mechanism (deletion vectors, row tracking) is inferred; this source does not disclose it.

systems/liquid-clustering — first system to canonicalise the property.
systems/delta-lake — transaction log substrate.
systems/apache-iceberg — sibling format with similar capabilities via deletion vectors + row tracking.
concepts/over-partitioning — the failure mode the partition-as-write-boundary discipline produced.
concepts/deletion-vector — likely implementation primitive.
concepts/row-tracking — likely implementation primitive.
patterns/clustering-keys-as-engine-input — the broader inversion: layout-as-implementation-detail at the format level.