Skip to content

CONCEPT Cited by 1 source

Merge-on-read (MOR)

Merge-on-Read (MOR) is the row-level update strategy in open table formats where mutations are persisted as separate delta files (append-only insert / update / delete records) that sit alongside the immutable base files, and the merge happens at query time — the reader combines base + delta to reconstruct current state. The paired strategy is copy-on-write (COW), which rewrites whole data files on every mutation.

MOR is the default-or-offered strategy on all three canonical open table formats: systems/apache-iceberg, systems/apache-hudi (explicit MERGE_ON_READ table type, as distinct from COPY_ON_WRITE), and systems/delta-lake.

The trade-off in one sentence

MOR trades fast writes for slower reads, then buys back the read performance with periodic compaction that collapses accumulated deltas into read-optimized base files — which is just copy-on-write merge run in the background rather than on the write path.

Why MOR exists

COW gives readers the simplest possible job — open a single set of immutable files, scan them — but pays for that read simplicity on the write path: every mutation rewrites the base files containing the affected rows, with full I/O amplification.

That write amplification is prohibitive for:

  • Change Data Capture (CDC) ingests — a continuous stream of small upserts whose data volume is dwarfed by the file-rewrite cost COW would impose. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
  • Slowly-changing dimensions (SCD) — classic dimensional-modeling workloads where a dimension table gets continuous row-level edits.
  • Targeted MERGE INTO workloads where the affected row count is a tiny fraction of the table.

MOR sidesteps the write amplification by writing only the delta and deferring the merge.

Mechanics

  • Base files — immutable columnar files (typically Parquet) representing table state as of the last compaction.
  • Delta files — smaller, append-only files recording row-level mutations since that snapshot. Two common shapes: position deletes (delete-by-row-position in a named base file) and equality deletes (delete-by-predicate over column values); inserts are just new data files.
  • Readers merge base + delta at scan time, applying deletes and overlaying updates per the merge key.
  • Compaction periodically rewrites base+delta into a new, smaller set of base files with zero deltas — this is exactly copy-on-write merge running in the background. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)

Performance properties

Write path (Expedia, 2025-09-30):

  • Faster writes — append-only delta files, no base-file rewrite.
  • Reduced I/O — only mutated rows touch disk; object-store operation count drops.
  • Scales with data volume — write cost is proportional to delta size, not table size.

Read path:

  • Baseline read cost rises with delta-file count per touched base file. One or two delta files: negligible overhead. Many: measurable per-scan merge cost.
  • Predicate pushdown still works on base files; deltas have to be consulted to correct the output. Min/max stats on base files stay honest between compactions.
  • Compaction cadence is the primary knob: too rare → query regressions as delta count grows; too frequent → write-side cost creeps back toward COW.

The compaction invariant

"Compaction becomes necessary as the number of delta files grows to maintain optimal performance." (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)

MOR is only a cost win averaged over the compaction window. A MOR system with broken or disabled compaction degrades into an unbounded delta log and loses all read-side properties. Every production MOR deployment therefore ships (or adopts) a compactor — Iceberg's rewrite_data_files action, Hudi's Compactor service, Delta's OPTIMIZE, or — at exabyte scale — something like Amazon BDT's systems/deltacat Flash Compactor on systems/ray (see concepts/copy-on-write-merge).

When to use which

Workload Strategy
Full-partition refresh INSERT OVERWRITE
Large-batch updates touching most rows of many files COW
CDC / SCD / incremental upsert MOR + MERGE INTO
Targeted MERGE INTO with natural key MOR
Read-heavy table, rare updates COW
Write-heavy table, frequent updates MOR (+ aggressive compaction)

(See also patterns/merge-into-over-insert-overwrite.)

Seen in

Last updated · 200 distilled / 1,178 read