CONCEPT Cited by 1 source
Long Fork anomaly¶
Definition¶
The Long Fork anomaly is a violation of Snapshot Isolation's atomic-visibility property in which two readers on different nodes (or at different points in time) observe two committed, concurrent, non-conflicting transactions in different commit orders. It's called a "fork" because the commit timeline splits — one observer sees T1 happen before T2, the other sees T2 before T1 — and the two histories are internally consistent but jointly incompatible.
Minimal shape:
- Transactions T1 and T2 each modify distinct rows (no write-conflict).
- Reader R1 sees T1's effect but not T2's.
- Reader R2 sees T2's effect but not T1's.
Neither R1 nor R2 sees both effects, and neither snapshot contains an internal contradiction: each individual snapshot is "valid". The violation is that the pair of snapshots cannot be ordered into a single global commit history.
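The minimal shape can be checked mechanically. The sketch below is an illustration only: it assumes each snapshot is modelled as the set of committed transactions it can see, and the helper name is_long_fork is hypothetical. If every snapshot were a prefix of one global commit order, any two snapshots would be nested, so a pair where neither is a subset of the other signals a fork.

```python
from itertools import combinations


def is_long_fork(snapshots):
    """Return True if some pair of snapshots is incomparable under set inclusion.

    Each snapshot is the set of committed transactions it can see. Snapshots
    taken from prefixes of a single commit order are always nested, so an
    incomparable pair cannot come from one global history.
    """
    return any(
        not (a <= b or b <= a)
        for a, b in combinations(snapshots, 2)
    )


r1 = frozenset({"T1"})  # R1 sees T1's effect but not T2's
r2 = frozenset({"T2"})  # R2 sees T2's effect but not T1's

print(is_long_fork([r1, r2]))                        # True
print(is_long_fork([r1, frozenset({"T1", "T2"})]))   # False
```

The second call shows that a reader who sees T1 and a later reader who sees both T1 and T2 are compatible: the order T1, T2 explains both.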
How Postgres exhibits it¶
On a Postgres primary, the order in which transactions become visible (removal from the in-memory ProcArray) is decoupled from the order in which they become durable (WAL commit-record write). If T1 and T2 commit concurrently, the following interleaving is possible:
1. T1 writes its WAL commit record (durable).
2. T2 writes its WAL commit record (durable).
3. T2 removes itself from ProcArray (visible to new snapshots).
4. T1 removes itself from ProcArray (visible to new snapshots).
A snapshot acquired between steps 3 and 4 sees T2 but not T1. Meanwhile, a read replica applying WAL in commit order sees T1 before T2. Two observers, two different commit orders — a Long Fork. (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas.)
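The interleaving can be replayed in a toy model. This is a sketch under stated assumptions, not Postgres code: the Primary class, its method names, and replica_states are all hypothetical, and "visible" is simply a set that stands in for "no longer listed in ProcArray".

```python
class Primary:
    """Toy primary that tracks WAL order and visibility order separately."""

    def __init__(self):
        self.wal = []         # commit records, in the order they became durable
        self.visible = set()  # transactions already removed from the toy ProcArray

    def write_commit_record(self, txn):
        self.wal.append(txn)

    def remove_from_procarray(self, txn):
        self.visible.add(txn)

    def snapshot(self):
        return frozenset(self.visible)


def replica_states(wal):
    """States a replica passes through while replaying WAL in order."""
    states, applied = [], set()
    for txn in wal:
        applied.add(txn)
        states.append(frozenset(applied))
    return states


p = Primary()
p.write_commit_record("T1")        # step 1
p.write_commit_record("T2")        # step 2
p.remove_from_procarray("T2")      # step 3
primary_snapshot = p.snapshot()    # taken between steps 3 and 4
p.remove_from_procarray("T1")      # step 4

# Replica state after replaying only the first commit record.
replica_snapshot = replica_states(p.wal)[0]

print(sorted(primary_snapshot))  # ['T2']
print(sorted(replica_snapshot))  # ['T1']
```

Neither set contains the other, which is exactly the pair of incompatible observations described above.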
The generalization of this mechanism is concepts/visibility-order-vs-commit-order.
Per AWS's 2025-05-03 response to Jepsen's RDS Multi-AZ analysis, this affects all Postgres isolation levels (Read Committed, Repeatable Read, Serializable) because all of them take snapshots via ProcArray. It is present in community Postgres itself — discussed on pgsql-hackers since 2013 — not specific to RDS. It does not appear in Single-AZ deployments (no cross-node divergence path) or in Aurora Limitless / systems/aurora-dsql (which replace ProcArray with time-based MVCC).
The Alice-and-Bob illustration¶
AWS's intuition pump from the post:
Page-view counters for Hacker News posts are rows. Alice hits the primary; Bob hits the replica. Alice sees the Jepsen post reach #1 (and screenshots it). Bob sees it peak at #2. The commit log confirms that a concurrent click on another post briefly overtook it. Technically Bob is right (per commit log order) and Alice is right (she saw it). Alice witnessed a state on the primary that, per commit order, was never supposed to exist, yet her view is fully compliant with formal SI on a standalone node.
The structural point: formal SI says all snapshots compose into a single global commit order. Postgres's implementation does not guarantee this across a primary and its replicas, at any isolation level.
Why it rarely breaks real apps¶
Most applications naturally serialize operations through app-level constraints or direct row-conflicts that SI catches at commit (first-writer-wins). The Long Fork bites only when the app relies on implicit commit ordering of independent concurrent transactions across nodes. AWS's recommended workarounds until CSN ships (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):
- Never rely on implicit commit ordering of independent concurrent transactions.
- Introduce explicit synchronization — shared counters (ticket numbers, queue positions), timestamps (observed-at, execution-time), or database constraints (e.g. inventory >= 0).
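As one possible reading of the "ticket numbers" workaround, a reader can treat only the gap-free prefix of tickets as settled instead of trusting whatever happens to be visible. The sketch below is an assumption-laden illustration: it supposes each transaction stores a ticket drawn from a shared counter that starts at 1 and never skips values, and settled_prefix is a hypothetical helper, not something from the AWS post.

```python
def settled_prefix(visible_tickets):
    """Return tickets 1..k that are visible with no missing ticket before them.

    Assumes tickets come from a shared, gap-free counter starting at 1.
    A visible ticket with a missing predecessor is ignored until the gap fills.
    """
    seen = set(visible_tickets)
    settled = []
    ticket = 1
    while ticket in seen:
        settled.append(ticket)
        ticket += 1
    return settled


# A reader that sees ticket 2 (T2) but not ticket 1 (T1) treats nothing as settled.
print(settled_prefix([2]))      # []
# A reader that sees only ticket 1 (T1) treats it as settled.
print(settled_prefix([1]))      # [1]
print(settled_prefix([1, 2]))   # [1, 2]
```

With this rule the two forked readers derive nested views ([] and [1]), so the order is carried by the explicit ticket rather than by commit visibility.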
Why it blocks enterprise capabilities even if it doesn't break apps¶
The low-practical-impact framing is specifically about end-user application correctness. The Long Fork still stands in the way of five classes of distributed-Postgres feature (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):
- Distributed-SQL systems — impossible to obtain a consistent list of pending transactions across nodes. systems/aurora-limitless and systems/aurora-dsql sidestep by not using ProcArray-style visibility at all.
- Query routing / read-write splitting — offloading reads to caught-up replicas can expose non-repeatable reads to the application.
- Data synchronization — taking a snapshot on the primary and rolling forward via WAL on a replica/target system can land in a state that was never observable on the primary.
- Point-in-time restore to an LSN — can produce a state that was never observable, complicating application-level data-corruption analysis (the restored state didn't correspond to any query-observable reality).
- Storage-layout optimization — replacing tuple xids with logical/clock-based commit times at query execution time makes queries non-repeatable under Long Fork.
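The data-synchronization and point-in-time-restore items share one shape: replay follows WAL order, while queries on the primary followed visibility order. The sketch below enumerates both sets of states for the two-transaction case; the names prefix_states, wal_order, and visibility_order are hypothetical, and the model ignores everything except which commits have been applied.

```python
def prefix_states(order):
    """All states reachable by applying transactions one by one in the given order."""
    states = [frozenset()]
    applied = set()
    for txn in order:
        applied.add(txn)
        states.append(frozenset(applied))
    return states


wal_order = ["T1", "T2"]         # order of commit records, followed by replay or restore
visibility_order = ["T2", "T1"]  # order in which commits became visible on the primary

observable_on_primary = set(prefix_states(visibility_order))
reachable_by_replay = prefix_states(wal_order)

never_observable = [
    sorted(state)
    for state in reachable_by_replay
    if state not in observable_on_primary
]
print(never_observable)  # [['T1']]
```

Stopping replay right after T1's commit record yields a state containing T1 without T2, which no query on the primary could have seen in this interleaving.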
Plus a CPU-cost angle: on large Postgres servers with thousands of connections, scanning ProcArray at snapshot time is a measurable fraction of CPU in read-heavy workloads — the visibility mechanism is expensive and wrong.
Fixes and side-steps¶
- Commit Sequence Numbers (CSN) — proposed upstream fix; realign visibility with commit order via a monotonic sequence number. Multi-patch effort, discussed on pgsql-hackers, presented at PGConf.EU 2024.
- Replacing ProcArray with time-based MVCC — what systems/aurora-dsql and systems/aurora-limitless do. Only available to implementations with deep Postgres-extension surgery (see patterns/postgres-extension-over-fork).
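The idea behind sequence-number visibility can be sketched in a few lines. This is a toy model of the general technique only, not the design of the pgsql-hackers patches: the CsnStore class and its methods are hypothetical, and a real implementation has to deal with concurrency, in-progress transactions, and replicas.

```python
import itertools


class CsnStore:
    """Toy store where visibility is decided by comparing commit sequence numbers."""

    def __init__(self):
        self._next_csn = itertools.count(1)
        self.commit_csn = {}  # txn -> CSN, assigned in commit order
        self.last_csn = 0

    def commit(self, txn):
        # Commit order and visibility order are the same thing: the CSN.
        csn = next(self._next_csn)
        self.commit_csn[txn] = csn
        self.last_csn = csn
        return csn

    def snapshot(self):
        # A snapshot is just the latest CSN at the time it is taken.
        return self.last_csn

    def visible(self, snapshot_csn):
        return frozenset(
            txn for txn, csn in self.commit_csn.items() if csn <= snapshot_csn
        )


store = CsnStore()
snaps = [store.snapshot()]
for txn in ["T1", "T2"]:
    store.commit(txn)
    snaps.append(store.snapshot())

views = [store.visible(s) for s in snaps]
print([sorted(v) for v in views])  # [[], ['T1'], ['T1', 'T2']]
```

Because a snapshot is a single number, every view is a prefix of the commit order and any two views are nested, so the forked pair {T1} and {T2} cannot both arise.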
Relation to other wiki concepts¶
- concepts/snapshot-isolation — the formal property being violated.
- concepts/visibility-order-vs-commit-order — the mechanism.
- concepts/commit-sequence-number — the proposed fix.
Seen in¶
- sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas — AWS's Long Fork explainer in response to Jepsen's RDS-Postgres analysis; worked Alice-and-Bob example; five enterprise-capability impacts; reproducible on self-managed Postgres at all isolation levels; absent in Single-AZ, Aurora Limitless, and Aurora DSQL.