
Long Fork anomaly

Definition

The Long Fork anomaly is a violation of Snapshot Isolation's atomic-visibility property in which two readers on different nodes (or at different points in time) observe two committed, concurrent, non-conflicting transactions in different commit orders. It's called a "fork" because the commit timeline splits — one observer sees T1 happen before T2, the other sees T2 before T1 — and the two histories are internally consistent but jointly incompatible.

Minimal shape:

  • Transactions T1 and T2 each modify distinct rows (no write-conflict).
  • Reader R1 sees T1's effect but not T2's.
  • Reader R2 sees T2's effect but not T1's.

No single reader observes a contradiction: each individual snapshot is valid on its own. The violation is that the pair of snapshots cannot be ordered into a single global commit history.
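The no-single-order condition can be checked mechanically. A minimal Python sketch (the `explains` helper and transaction names are illustrative, not from the source): a snapshot is consistent with a commit order if it equals some prefix of that order, and a Long Fork means no order works for both snapshots.

```python
from itertools import permutations

# T1 and T2 write distinct rows, so SI sees no write-write conflict.
# Each reader's snapshot records which transactions it observed.
snapshot_r1 = {"T1"}  # R1 sees T1 but not T2
snapshot_r2 = {"T2"}  # R2 sees T2 but not T1

def explains(order, snapshot):
    """A snapshot is consistent with a commit order iff it equals some
    prefix of that order (it saw everything committed up to a point)."""
    prefixes = [set(order[:i]) for i in range(len(order) + 1)]
    return snapshot in prefixes

# Long Fork: no single global commit order explains both snapshots.
compatible = any(
    explains(order, snapshot_r1) and explains(order, snapshot_r2)
    for order in permutations(["T1", "T2"])
)
print(compatible)  # False
```

Replacing `snapshot_r2` with `{"T1", "T2"}` (R2 simply lagging behind R1) makes `compatible` true, which is why ordinary replication lag is not a Long Fork.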

How Postgres exhibits it

On a Postgres primary, the order transactions become visible (removal from the in-memory ProcArray) is decoupled from the order they become durable (WAL commit-record write). If T1 and T2 commit concurrently:

  1. T1 writes its WAL commit record (durable).
  2. T2 writes its WAL commit record (durable).
  3. T2 removes itself from ProcArray (visible to new snapshots).
  4. T1 removes itself from ProcArray (visible to new snapshots).

A snapshot acquired between steps 3 and 4 sees T2 but not T1. Meanwhile, a read replica applying WAL in commit order sees T1 before T2. Two observers, two different commit orders — a Long Fork. (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas.)
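The four steps can be sketched as a toy simulation (function and variable names like `proc_array_remove` are illustrative stand-ins, not actual Postgres internals): durability order and visibility order are tracked separately, and a snapshot taken between steps 3 and 4 diverges from WAL order.

```python
wal_order = []               # order commit records hit the WAL (durability)
proc_array = {"T1", "T2"}    # in-progress txns, invisible to new snapshots
visible_order = []           # order txns leave ProcArray (visibility)

def wal_commit(txn):
    wal_order.append(txn)

def proc_array_remove(txn):
    proc_array.discard(txn)
    visible_order.append(txn)

def take_snapshot():
    # A new snapshot sees every durable txn not still in ProcArray.
    return {t for t in wal_order if t not in proc_array}

wal_commit("T1")                    # step 1: T1 durable
wal_commit("T2")                    # step 2: T2 durable
proc_array_remove("T2")             # step 3: T2 visible
primary_snapshot = take_snapshot()  # sees T2 but not T1
proc_array_remove("T1")             # step 4: T1 visible

replica_view = list(wal_order)      # a replica applies WAL in commit order,
                                    # so it sees T1 strictly before T2
print(primary_snapshot)  # {'T2'}
print(replica_view)      # ['T1', 'T2']
```

The primary's snapshot ({T2} without T1) and the replica's history (T1 then T2) are exactly the two incompatible observers from the minimal shape above.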

The generalization of this mechanism is concepts/visibility-order-vs-commit-order.

Per AWS's 2025-05-03 response to Jepsen's RDS Multi-AZ analysis, this affects all Postgres isolation levels (Read Committed, Repeatable Read, Serializable) because all of them take snapshots via ProcArray. It is present in community Postgres itself — discussed on pgsql-hackers since 2013 — not specific to RDS. It does not appear in Single-AZ deployments (no cross-node divergence path) or in Aurora Limitless / systems/aurora-dsql (which replace ProcArray with time-based MVCC).

The Alice-and-Bob illustration

AWS's intuition pump from the post:

Page-view counters for Hacker News posts are rows in a table. Alice queries the primary; Bob queries the replica. Alice sees the Jepsen post reach #1 and screenshots it; Bob sees it peak at #2. The commit log confirms that a concurrent click on another post briefly overtook it. Bob is right per commit-log order, and Alice is right in that she genuinely saw it. Alice witnessed a state on the primary that, per commit order, was never supposed to exist, yet her view is fully compliant with formal SI on a standalone node.

The structural point: formal SI says all snapshots compose into a single global commit order. Postgres's implementation does not, at any isolation level.

Why it rarely breaks real apps

Most applications naturally serialize operations through app-level constraints or direct row-conflicts that SI catches at commit (first-writer-wins). The Long Fork bites only when the app relies on implicit commit ordering of independent concurrent transactions across nodes. AWS's recommended workarounds until CSN ships (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):

  1. Never rely on implicit commit ordering of independent concurrent transactions.
  2. Introduce explicit synchronization — shared counters (ticket numbers, queue positions), timestamps (observed-at, execution-time), or database constraints (e.g. inventory >= 0).
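Workaround 2 can be sketched with a shared ticket counter (an assumed pattern; `record_event` and the in-memory counter are illustrative stand-ins for a database sequence): instead of inferring order from when rows become visible, each transaction records an explicit ticket, and every observer orders events by ticket.

```python
import itertools

ticket_counter = itertools.count(1)  # stand-in for a database sequence
events = []

def record_event(payload):
    # Each write takes an explicit ticket inside its transaction, so
    # ordering no longer depends on visibility order across nodes.
    events.append({"ticket": next(ticket_counter), "payload": payload})

record_event("T1: click on post A")
record_event("T2: click on post B")

# Even if replicas expose these rows in different visibility orders,
# sorting by ticket reconstructs one global order on every observer.
ordered = sorted(events, key=lambda e: e["ticket"])
print([e["payload"] for e in ordered])
```

The same idea covers the other suggested forms of explicit synchronization: an observed-at timestamp or a CHECK constraint makes the intended ordering or invariant part of the data rather than an inference from commit visibility.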

Why it blocks enterprise capabilities even if it doesn't break apps

The low-practical-impact framing is specifically about end-user application correctness. The Long Fork is load-bearing against five classes of distributed-Postgres feature (Source: sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas):

  1. Distributed-SQL systems — impossible to obtain a consistent list of pending transactions across nodes. systems/aurora-limitless and systems/aurora-dsql sidestep by not using ProcArray-style visibility at all.
  2. Query routing / read-write splitting — offloading reads to caught-up replicas can expose non-repeatable reads to the application.
  3. Data synchronization — taking a snapshot on the primary and rolling forward via WAL on a replica/target system can land in a state that was never observable on the primary.
  4. Point-in-time restore to an LSN — can produce a state that was never observable, complicating application-level data-corruption analysis (the restored state didn't correspond to any query-observable reality).
  5. Storage-layout optimization — replacing tuple xids with logical/clock-based commit times at query execution time makes queries non-repeatable under Long Fork.

Plus a CPU-cost angle: on large Postgres servers with thousands of connections, scanning ProcArray at snapshot time is a measurable fraction of CPU in read-heavy workloads — the visibility mechanism is expensive and wrong.

Fixes and side-steps

Relation to other wiki concepts

Seen in
