CONCEPT Cited by 1 source

GetSnapshotData O(connections)¶

Definition¶

GetSnapshotData is the Postgres internal function that builds the MVCC snapshot a transaction uses to determine which tuples are visible to it. Historically (and in Postgres < 14), GetSnapshotData has O(N) complexity in the number of open connections — it scans the PGPROC array to collect active transaction IDs for building the visibility cutoff.
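
To make the linear cost concrete, here is a minimal sketch of the scan in Python. It is a toy model, not the real implementation: `BackendSlot`, `Snapshot`, and `build_snapshot` are hypothetical names, and the actual C function also deals with locking, subtransactions, and other details that are left out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackendSlot:
    """One slot per connection (hypothetical); xid is None when idle."""
    xid: Optional[int] = None


@dataclass
class Snapshot:
    xmin: int               # every xid below this has finished
    xmax: int               # every xid at or above this is invisible
    in_progress: frozenset  # xids that were still running at snapshot time


def build_snapshot(slots, next_xid):
    """Visit every slot, idle or not, so the work is O(len(slots))."""
    running = set()
    visited = 0
    for slot in slots:
        visited += 1
        if slot.xid is not None:
            running.add(slot.xid)
    xmin = min(running, default=next_xid)
    return Snapshot(xmin, next_xid, frozenset(running)), visited


# Three active transactions among 1000 connections: all 1000 slots are read.
slots = [BackendSlot() for _ in range(1000)]
slots[10].xid, slots[500].xid, slots[999].xid = 101, 105, 103
snap, visited = build_snapshot(slots, next_xid=106)
```

In this model the 997 idle slots contribute nothing to the snapshot, yet each one is still visited on every call.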

The implication: every open connection, even an idle one, makes every new transaction in the cluster slightly slower. Connection count becomes a first-class cost dimension, not just a memory / file-descriptor dimension.

Why this is a scaling cliff¶

Each idle connection is cheap on its own, but with 1000 of them every GetSnapshotData call scans roughly 1000× as many entries as it would with a single connection.
Short transactions are dominated by snapshot setup, so the relative impact on a busy workload is large.
The effect shows up as unexpectedly poor p99 latency at high connection counts: throughput holds, but tail latency degrades.

This is one of the strongest architectural arguments for a connection pooler (PgBouncer, Pgpool-II, etc.) sitting in front of Postgres: the pooler holds thousands of client connections while only a small, tuned number of connections reach Postgres, keeping GetSnapshotData cheap.
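
A back-of-the-envelope calculation illustrates the pooler argument under the O(connections) model. The connection counts and snapshot rate below are made-up illustrative numbers, not measurements, and `snapshot_scan_work` is a hypothetical helper.

```python
def snapshot_scan_work(backend_connections, snapshots_per_second):
    """Slots visited per second if each snapshot scans every backend slot."""
    return backend_connections * snapshots_per_second


# Assumed workload: 10,000 snapshots per second in both scenarios.
direct = snapshot_scan_work(5000, 10_000)  # clients connect straight to Postgres
pooled = snapshot_scan_work(50, 10_000)    # pooler keeps 50 server connections
reduction = direct // pooled               # how much less scanning the pooler causes
```

Under these assumptions the pooled setup does one hundredth of the scanning work, because the cost tracks the number of server-side connections rather than the number of clients.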

Upstream history¶

Postgres 14 (2021) shipped Andres Freund's snapshot-scalability patches, which made GetSnapshotData far cheaper in the common case: a previously built snapshot can be reused when no transaction has completed since it was taken, and in-progress transaction IDs are kept in a dense array that is much faster to scan. The commit message and associated mailing-list discussion framed this as one of the larger connection-scalability wins in recent Postgres history.
Older Postgres versions (12, 13) still carry the O(N) cost. Not every workload hits it at the same connection count, but beyond a few hundred active backends the snapshot cost starts to dominate.
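
The snapshot-caching idea can be sketched as follows. This is a simplified single-object model under stated assumptions, not the Postgres code: `SnapshotSource` and its fields are hypothetical names, and the real cache lives in each backend and is validated against a shared completion counter.

```python
class SnapshotSource:
    """Toy model: rebuild a snapshot only after some transaction completes."""

    def __init__(self, first_xid=100):
        self.next_xid = first_xid
        self.latest_completed = first_xid - 1
        self.running = set()        # xids of in-progress transactions
        self.completion_count = 0   # bumped on every commit
        self.full_builds = 0        # how many times the slow path ran
        self._cached = None
        self._cached_at = None

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid):
        self.running.remove(xid)
        self.latest_completed = max(self.latest_completed, xid)
        self.completion_count += 1

    def get_snapshot(self):
        # Fast path: nothing has completed since the cached snapshot was built.
        if self._cached_at == self.completion_count:
            return self._cached
        # Slow path: scan the running set and remember the result.
        self.full_builds += 1
        xmax = self.latest_completed + 1
        in_progress = frozenset(x for x in self.running if x < xmax)
        self._cached = (xmax, in_progress)
        self._cached_at = self.completion_count
        return self._cached


src = SnapshotSource()
a = src.begin()            # xid 100
b = src.begin()            # xid 101
s1 = src.get_snapshot()    # slow path: first build
s2 = src.get_snapshot()    # fast path: reused
src.begin()                # a new transaction alone does not invalidate the cache
s3 = src.get_snapshot()    # fast path: reused
src.commit(b)              # a completion invalidates the cache
s4 = src.get_snapshot()    # slow path: rebuilt
```

Because `xmax` is derived from the latest completed transaction, a transaction that merely starts is already invisible to the cached snapshot, which is why only completions force a rebuild in this sketch.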

Seen in¶

sources/2020-06-23-zalando-pgbouncer-on-kubernetes-minimal-latency: Zalando's Postgres Operator team cites GetSnapshotData's complexity as one of the two canonical reasons for needing a connection pooler, alongside the process-per-connection memory / context-switch overhead. Kukushkin links directly to Andres Freund's 2020 thread noting "GetSnapshotData has O(connections) complexity", the thread that led to the Postgres 14 fix.

Related¶

systems/postgresql · systems/pgbouncer: the pooler that sidesteps the cost.
concepts/process-per-connection-postgres: the sister connection-cost dimension.
concepts/snapshot-isolation: the isolation model GetSnapshotData implements.
