CONCEPT Cited by 1 source

Visibility order vs. commit order

Definition

Visibility order vs. commit order names the architectural decision in an MVCC database about whether the sequence in which committed transactions become readable by new snapshots is the same as the sequence in which they became durable in the write-ahead log.

The two are not automatically equal. Durability means "the commit record has been fsync'd to WAL stable storage and the client has been acked." Visibility means "a new snapshot taken right now will include this transaction's effects." Making durability and visibility atomic with respect to each other requires cross-cutting synchronization that adds cost to the hot commit path. Postgres, for historical reasons, chose not to pay that cost.

When the two orders diverge, the database admits the Long Fork anomaly: two readers can observe the same pair of concurrent committed transactions in different orders, violating Snapshot Isolation's atomic-visibility requirement.
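The anomaly can be stated as a check over what readers observed. The following is a minimal sketch, not a Jepsen or Postgres tool: the function name `find_long_fork` and the reader and transaction labels are hypothetical. It relies on one observation: if visibility followed a single order, every snapshot would be a prefix of that order, so any two snapshots would be related by set inclusion. Two snapshots that each contain a transaction the other lacks cannot both be prefixes of one order.

```python
from itertools import combinations


def find_long_fork(snapshots):
    """Return a pair of reader names whose snapshots are incomparable, else None.

    `snapshots` maps a (hypothetical) reader name to the set of committed
    transaction ids that reader's snapshot included.
    """
    for (name_a, seen_a), (name_b, seen_b) in combinations(snapshots.items(), 2):
        # Prefixes of one order are always nested: one must contain the other.
        if not (seen_a <= seen_b or seen_b <= seen_a):
            return (name_a, name_b)
    return None


# reader_1 saw T2 without T1; reader_2 saw T1 without T2: a fork.
forked = {"reader_1": {"T2"}, "reader_2": {"T1"}}

# reader_1's view is contained in reader_2's view: no fork.
consistent = {"reader_1": {"T1"}, "reader_2": {"T1", "T2"}}

fork_pair = find_long_fork(forked)
no_fork = find_long_fork(consistent)
```

Here `fork_pair` names the two disagreeing readers and `no_fork` is `None`.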

How Postgres decouples them

Postgres's commit path (simplified):

  1. Transaction writes its WAL commit record → fsync → ack client (durable).
  2. Transaction asynchronously removes its xid from the in-memory ProcArray (visible).

Step 1 determines commit order. Step 2 determines visibility order. A new snapshot scans ProcArray to learn which xids are still in-flight; those xids and everything they produced stay excluded from that snapshot for its entire lifetime, even if they commit a moment later. So the window between steps 1 and 2 is where visibility and commit orders can skew:

  • T1 writes WAL commit record.
  • T2 writes WAL commit record.
  • T2 removes itself from ProcArray first.
  • A reader takes a snapshot here → sees T2, not T1.
  • T1 removes itself from ProcArray.
  • Another reader takes a snapshot → sees both.

A replica applying WAL in commit order sees T1 before T2. The primary's ProcArray can order them T2-then-T1. Two observers, two orders — a Long Fork.
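The interleaving above can be replayed in a toy model. This is a sketch under simplifying assumptions, not Postgres internals: `ToyPrimary` and its method names are invented for illustration, the WAL is a plain list, ProcArray is a plain set, and the replica is modelled as a reader that has replayed only a prefix of the WAL.

```python
class ToyPrimary:
    """Toy two-step commit path: WAL append first, ProcArray removal later."""

    def __init__(self):
        self.wal = []            # xids in the order their commit records were written
        self.proc_array = set()  # xids that new snapshots still treat as in-flight

    def begin(self, xid):
        self.proc_array.add(xid)

    def write_commit_record(self, xid):
        # Step 1: durable and acked, but still listed as in-flight.
        self.wal.append(xid)

    def remove_from_proc_array(self, xid):
        # Step 2: snapshots taken after this point include the xid.
        self.proc_array.discard(xid)

    def snapshot(self):
        # Committed xids that were not in-flight when the snapshot was taken.
        return {xid for xid in self.wal if xid not in self.proc_array}


primary = ToyPrimary()
primary.begin("T1")
primary.begin("T2")
primary.write_commit_record("T1")
primary.write_commit_record("T2")
primary.remove_from_proc_array("T2")   # T2 becomes visible first
first_reader = primary.snapshot()      # taken between the two removals
primary.remove_from_proc_array("T1")
second_reader = primary.snapshot()

# A replica that replays WAL in commit order and has applied one record so far.
replica_reader = set(primary.wal[:1])
```

In this run `first_reader` contains only T2 while `replica_reader` contains only T1, which is the pair of incomparable views that defines the Long Fork.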

Per AWS's 2025-05-03 response to Jepsen's RDS Multi-AZ analysis (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):

"On a PostgreSQL primary (in both standalone and replicated configurations), the order in which the effects of non-conflicting transactions become visible might deviate from the order in which they become durable."

This affects all isolation levels (Read Committed / Repeatable Read / Serializable) because all of them acquire snapshots via ProcArray.

Why Postgres made this choice

The decoupling is old — pgsql-hackers has discussed it since at least 2013. The tradeoff it takes:

Kept: a cheap commit hot path. Removing-from-ProcArray need not be synchronous with WAL fsync; commit latency is essentially the fsync cost.

Given up: atomic visibility. Multiple concurrent commits can interleave their ProcArray removals arbitrarily relative to WAL order.
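To make the tradeoff concrete, here is a deliberately naive sketch of what coupling the two steps could look like. It is an assumption for illustration only, not how Postgres or any proposed patch works: `CoupledCommit` is a hypothetical class, the list appends stand in for the WAL fsync and the ProcArray removal, and a single lock is held across both.

```python
import threading


class CoupledCommit:
    """Toy commit path where one lock covers both durability and visibility."""

    def __init__(self):
        self._lock = threading.Lock()
        self.wal = []            # commit order
        self.visible_order = []  # order in which xids became visible

    def commit(self, xid):
        with self._lock:
            self.wal.append(xid)            # stand-in for WAL write + fsync
            self.visible_order.append(xid)  # stand-in for ProcArray removal


store = CoupledCommit()
threads = [
    threading.Thread(target=store.commit, args=(f"T{i}",)) for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

orders_match = store.wal == store.visible_order
```

Whatever order the threads happen to run in, the two lists match, because no other commit can slip between the two steps. The price in this toy is that every commit waits behind the lock for the full duration of the slow step, which is the kind of hot-path cost the decoupled design avoids.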

For a standalone, single-snapshot-source Postgres node this rarely matters: every reader draws snapshots from the same ProcArray, so no observer has a second ordering to compare against. For a clustered Postgres or any query pattern that observes the primary from multiple angles (replica read, snapshot-then-replay, cross-node analytical query, PITR to LSN), the decoupling surfaces as the Long Fork.

Architectural answers

Three shapes of answer, in order of invasiveness:

| Answer | Cost | Where seen |
| --- | --- | --- |
| Application-level synchronization: explicit ordering via shared counters, timestamps, row-conflict constraints | App complexity | Recommended workaround until the fix ships (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas) |
| CSN: make visibility-by-CSN the snapshot mechanism | Multi-patch upstream Postgres change; some commit-path synchronization | Proposed upstream; PGConf.EU 2024 talk; AWS's PostgreSQL Contributors Team participating |
| Replace ProcArray with time-based MVCC: visibility by clock, not by in-flight-xid list | Full replacement of the visibility substrate via Postgres extension | systems/aurora-dsql, systems/aurora-limitless |
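The CSN (commit sequence number) idea can be sketched in a few lines. This is a toy under stated assumptions, not the upstream patch set: `ToyCsnStore` and its methods are hypothetical names, the store is single-threaded, and assigning the CSN is treated as the commit itself. The point it illustrates is that when a snapshot is just a CSN value, visibility order equals commit order by construction.

```python
class ToyCsnStore:
    """Toy snapshot mechanism where a snapshot is a single CSN value."""

    def __init__(self):
        self.last_csn = 0
        self.csn_of = {}  # xid -> CSN assigned at commit

    def commit(self, xid):
        # Assigning the next CSN defines the commit order.
        self.last_csn += 1
        self.csn_of[xid] = self.last_csn
        return self.last_csn

    def snapshot(self):
        # No scan of an in-flight list: the snapshot is one number.
        return self.last_csn

    def visible(self, xid, snapshot_csn):
        csn = self.csn_of.get(xid)
        return csn is not None and csn <= snapshot_csn


store = ToyCsnStore()
store.commit("T1")
early = store.snapshot()
store.commit("T2")
late = store.snapshot()

early_view = {x for x in ("T1", "T2") if store.visible(x, early)}
late_view = {x for x in ("T1", "T2") if store.visible(x, late)}
```

In this model no snapshot can include T2 without T1, because T2's CSN is larger: every view is a prefix of the commit order.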

The ProcArray scan also costs CPU: at thousands of connections and read-heavy workloads it's a "measurable fraction of CPU," so any fix that removes the scan also frees performance headroom.

