CONCEPT Cited by 1 source
Visibility order vs. commit order
Definition
Visibility order vs. commit order names the architectural decision in an MVCC database about whether the sequence in which committed transactions become readable by new snapshots is the same as the sequence in which they became durable in the write-ahead log.
The two are not automatically equal. Durability means "the commit record has been fsync'd to stable storage in the WAL, so the commit can be acknowledged to the client." Visibility means "a new snapshot taken right now will include this transaction's effects." Making durability and visibility atomic with respect to each other requires cross-cutting synchronization that adds cost to the hot commit path, and Postgres, for historical reasons, chose not to pay that cost.
When the two orders diverge, the database admits the Long Fork anomaly: two readers can observe the same pair of concurrent committed transactions in different orders, violating Snapshot Isolation's atomic-visibility requirement.
How Postgres decouples them
Postgres's commit path (simplified):
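1. Transaction writes its WAL commit record → fsync (durable).
2. Transaction removes its xid from the in-memory ProcArray (visible). This is a separate step, not atomic with step 1, so concurrent committers can complete it in a different order than their WAL records. Only after both steps does the backend report success to the client.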
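A toy Python model of the two steps may help. It is not Postgres code: `ToyPrimary`, `write_commit_record`, and `end_transaction` are hypothetical names, the WAL is a plain list, and the ProcArray is a set of in-flight xids. The only point it makes is that each step appends to its own ordered record, and nothing forces the two records to agree.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Hypothetical model of a primary; not Postgres's actual data structures."""

    wal: list = field(default_factory=list)            # commit order (step 1)
    in_flight: set = field(default_factory=set)        # stand-in for ProcArray
    visible_order: list = field(default_factory=list)  # visibility order (step 2)

    def begin(self, xid: int) -> None:
        self.in_flight.add(xid)

    def write_commit_record(self, xid: int) -> None:
        # Step 1: append the commit record (durable once flushed).
        self.wal.append(xid)

    def end_transaction(self, xid: int) -> None:
        # Step 2: leave the in-flight set, making the xid visible to new snapshots.
        self.in_flight.discard(xid)
        self.visible_order.append(xid)


primary = ToyPrimary()
primary.begin(1)
primary.begin(2)
primary.write_commit_record(1)
primary.write_commit_record(2)
primary.end_transaction(2)  # step 2 for xid 2 happens to run first
primary.end_transaction(1)
print(primary.wal, primary.visible_order)  # [1, 2] [2, 1]
```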
Step 1 determines commit order. Step 2 determines visibility order. A new snapshot scans ProcArray to learn which xids are still in-flight; those xids and everything they produced are permanently excluded from the snapshot even after they commit. So the window between steps 1 and 2 is where visibility and commit orders can skew:
- T1 writes its WAL commit record.
- T2 writes its WAL commit record.
- T2 removes itself from ProcArray first.
- A reader takes a snapshot here → sees T2, not T1.
- T1 removes itself from ProcArray.
- Another reader takes a snapshot → sees both.
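The same interleaving can be replayed against a toy snapshot model. The names are hypothetical and Postgres's real snapshot carries more state; here a snapshot simply copies the in-flight set when it is taken, which is why an xid listed there stays invisible to that snapshot even after it finishes committing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Toy Postgres-style snapshot; field names are illustrative.
    xmax: int             # xids >= xmax had not started when the snapshot was taken
    in_flight: frozenset  # copy of the ProcArray stand-in at snapshot time

    def sees(self, xid: int, committed: set) -> bool:
        # `committed` = xids whose WAL commit record exists (aborts ignored here).
        # `in_flight` never changes after the snapshot is taken, so an xid listed
        # there stays invisible to this snapshot even after it finishes committing.
        return xid < self.xmax and xid in committed and xid not in self.in_flight


T1, T2 = 1, 2
next_xid = 3
committed = set()
in_flight = {T1, T2}

committed.add(T1)      # T1 writes its WAL commit record
committed.add(T2)      # T2 writes its WAL commit record
in_flight.discard(T2)  # T2 leaves the ProcArray stand-in first
reader_a = Snapshot(next_xid, frozenset(in_flight))
in_flight.discard(T1)  # T1 leaves the ProcArray stand-in
reader_b = Snapshot(next_xid, frozenset(in_flight))

print([reader_a.sees(x, committed) for x in (T1, T2)])  # [False, True]
print([reader_b.sees(x, committed) for x in (T1, T2)])  # [True, True]
```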
A replica applying WAL in commit order sees T1 before T2. The primary's ProcArray can order them T2-then-T1. Two observers, two orders — a Long Fork.
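To make "two observers, two orders" concrete, here is a hypothetical check over what two snapshots saw. A primary snapshot taken after T2 left ProcArray but before T1 did sees only T2; a replica snapshot taken after replaying T1's commit record but before T2's sees only T1. No single order of T1 and T2 explains both observations, which is the Long Fork shape. The helper name is made up for illustration.

```python
def is_long_fork(seen_a: set, seen_b: set, t1: str, t2: str) -> bool:
    """Hypothetical helper: True if two observers each saw exactly one of t1, t2,
    and not the same one, so no single commit order is consistent with both."""

    def only(seen: set, x: str, y: str) -> bool:
        return x in seen and y not in seen

    return (only(seen_a, t2, t1) and only(seen_b, t1, t2)) or (
        only(seen_a, t1, t2) and only(seen_b, t2, t1)
    )


primary_reader = {"T2"}  # primary snapshot between T2's and T1's ProcArray removal
replica_reader = {"T1"}  # replica snapshot after replaying T1's commit record, before T2's

print(is_long_fork(primary_reader, replica_reader, "T1", "T2"))  # True
print(is_long_fork({"T1", "T2"}, replica_reader, "T1", "T2"))    # False
```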
Per AWS's 2025-05-03 response to Jepsen's RDS Multi-AZ analysis (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):
"On a PostgreSQL primary (in both standalone and replicated configurations), the order in which the effects of non-conflicting transactions become visible might deviate from the order in which they become durable."
This affects all isolation levels (Read Committed / Repeatable Read / Serializable) because all of them acquire snapshots via ProcArray.
Why Postgres made this choice
The decoupling is old — pgsql-hackers has discussed it since at least 2013. The tradeoff it takes:
Kept: a cheap commit hot path. Removing-from-ProcArray need not be synchronous with WAL fsync; commit latency is essentially the fsync cost.
Given up: atomic visibility. Multiple concurrent commits can interleave their ProcArray removals arbitrarily relative to WAL order.
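What paying that cost could look like, in its crudest form: a hypothetical commit routine that performs the WAL append and the visibility update inside one critical section, so the two orders cannot diverge. Every committer now serializes on a single lock, and in a real system the critical section would also have to cover the WAL flush; that is the hot-path cost the decoupled design avoids. This only illustrates the tradeoff; it is not a design any of the sources propose.

```python
import threading


class AtomicCommitToy:
    """Hypothetical: WAL append and visibility update under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.wal: list = []            # commit order
        self.visible_order: list = []  # visibility order

    def commit(self, xid: int) -> None:
        with self._lock:  # serializes all committers
            self.wal.append(xid)            # step 1 (a real system would also flush here)
            self.visible_order.append(xid)  # step 2, inside the same critical section


db = AtomicCommitToy()
workers = [threading.Thread(target=db.commit, args=(xid,)) for xid in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(db.wal == db.visible_order)  # True, whatever order the threads ran in
```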
For a standalone Postgres node that is the only source of snapshots, this rarely matters: nobody can tell the difference. For a clustered Postgres or any query pattern that observes the primary from multiple angles (replica read, snapshot-then-replay, cross-node analytical query, PITR to LSN), the decoupling surfaces as the Long Fork.
Architectural answers
Three shapes of answer, in order of invasiveness:
| Answer | Cost | Where seen |
|---|---|---|
| Application-level synchronization: explicit ordering via shared counters, timestamps, row-conflict constraints | App complexity | Recommended workaround until the fix ships (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas) |
| CSN (commit sequence number): make visibility-by-CSN the snapshot mechanism | Multi-patch upstream Postgres change; some commit-path synchronization | Proposed upstream; PGConf.EU 2024 talk; AWS's PostgreSQL Contributors Team participating |
| Replace ProcArray with time-based MVCC: visibility by clock, not by in-flight-xid list | Full replacement of the visibility substrate via Postgres extension | systems/aurora-dsql, systems/aurora-limitless |
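The CSN row rests on one idea: if each commit takes a number from a single counter at the moment it becomes visible, a snapshot can be a single number and visibility becomes a comparison, so every observer derives the same order. The sketch below is hypothetical; the class and method names are made up, and it simply assumes the CSN is assigned in the same step that fixes WAL order rather than modeling how an actual patch would tie the two together.

```python
import threading


class CsnVisibilityToy:
    """Hypothetical CSN-style visibility: a snapshot is one number, not an xid list."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest_csn = 0
        self._commit_csn: dict = {}  # xid -> CSN assigned at commit

    def commit(self, xid: int) -> int:
        with self._lock:
            # Assumption: this runs in the same step that fixes WAL order,
            # so CSN order matches commit order.
            self._latest_csn += 1
            self._commit_csn[xid] = self._latest_csn
            return self._latest_csn

    def take_snapshot(self) -> int:
        with self._lock:
            return self._latest_csn  # a single read, no per-connection scan

    def sees(self, snapshot_csn: int, xid: int) -> bool:
        with self._lock:
            csn = self._commit_csn.get(xid)
        return csn is not None and csn <= snapshot_csn


db = CsnVisibilityToy()
db.commit(101)
snap = db.take_snapshot()
db.commit(102)
print(db.sees(snap, 101), db.sees(snap, 102))  # True False
```

Time-based MVCC, the third row, has roughly the same shape with a clock timestamp in place of the counter.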
The ProcArray-scan cost also shows up as CPU waste — at thousands of connections and read-heavy workloads it's a "measurable fraction of CPU," so any fix also frees up a perf budget.
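Why the scan costs CPU, in a toy model (a hypothetical function, far simpler than Postgres's actual `GetSnapshotData`): building an xid-list snapshot visits every backend slot, so the work per snapshot grows with the number of connections even when almost none of them has a transaction open. A CSN- or clock-based snapshot, as in the answers above, is a single read instead.

```python
def procarray_snapshot(backend_xids: list) -> tuple:
    """Hypothetical model: copy in-flight xids by visiting every backend slot.

    Returns (snapshot, slots_visited); slots_visited grows with connection count.
    """
    slots_visited = 0
    in_flight = set()
    for xid in backend_xids:  # one slot per connection; None = no open transaction
        slots_visited += 1
        if xid is not None:
            in_flight.add(xid)
    return frozenset(in_flight), slots_visited


snapshot, visited = procarray_snapshot([None] * 4999 + [42])
print(sorted(snapshot), visited)  # [42] 5000
```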
Relation to other wiki concepts
- concepts/wal-write-ahead-logging: commit order = the order commit records land in WAL. Durable, append-only, fsync'd.
- concepts/postgres-mvcc-hot-updates: MVCC's tuple-version machinery is what snapshots read against. ProcArray tells a snapshot which xids are still in flight, and therefore which tuple versions it must ignore.
- concepts/snapshot-isolation: the isolation model that formally requires visibility order == commit order across observers.
- concepts/long-fork-anomaly: the anomaly when they diverge.
- concepts/commit-sequence-number: the proposed upstream mechanism to align them.
Seen in
- sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas — AWS's precise framing of Postgres's architectural choice: visibility order "might deviate" from durable commit order, affecting all isolation levels. Time-based MVCC in Aurora DSQL / Aurora Limitless as the replacement-substrate answer; CSN as the upstream-fix answer.