
CONCEPT Cited by 1 source

GetSnapshotData O(connections)

Definition

GetSnapshotData is the Postgres internal function that builds the MVCC snapshot a transaction (or, under READ COMMITTED, each statement) uses to decide which tuples are visible to it. In Postgres 13 and earlier, its cost is O(N) in the number of open connections: it scans every entry in the PGPROC array to collect the IDs of in-progress transactions and compute the snapshot's visibility cutoff (xmin).

The implication: every open connection, even an idle one, makes every new snapshot in the cluster slightly more expensive. Connection count therefore becomes a first-class cost dimension, not just a memory and file-descriptor one.
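The scan described above can be sketched as a toy model. Everything here is hypothetical and simplified: `Backend`, `Snapshot`, `get_snapshot_data`, and `xid_visible` are illustrative names (only the first mirrors a real Postgres function), and the model ignores subtransactions, xid wraparound, and the global xmin horizon. It shows the one property that matters for this page: the loop visits every backend slot, idle or not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Backend:
    """Toy stand-in for one PGPROC slot: the xid it is running, or None if idle."""
    xid: int | None = None


@dataclass(frozen=True)
class Snapshot:
    xmin: int            # xids below this had all finished when the snapshot was taken
    xmax: int            # xids at or above this are treated as still running
    xip: frozenset[int]  # xids in [xmin, xmax) that were still in progress


def get_snapshot_data(procs: list[Backend], latest_completed_xid: int) -> Snapshot:
    """Simplified pre-14 model: one pass over every backend slot, idle or not."""
    xmax = latest_completed_xid + 1
    xmin = xmax
    xip: set[int] = set()
    for proc in procs:                # O(len(procs)): this loop is the O(N) cost
        xid = proc.xid
        if xid is None or xid >= xmax:
            continue                  # idle (or too-new) slot: visited, then skipped
        xip.add(xid)
        xmin = min(xmin, xid)
    return Snapshot(xmin, xmax, frozenset(xip))


def xid_visible(snap: Snapshot, xid: int, committed: set[int]) -> bool:
    """Rows written by `xid` are visible if it committed and was not in progress."""
    if xid >= snap.xmax or xid in snap.xip:
        return False
    return xid in committed
```

With four slots, two of them idle, and `latest_completed_xid=104`, the snapshot is `xmin=100`, `xmax=105`, `xip={100, 103}`; a committed xid 101 is visible, while xid 103 is not, because it was still running when the snapshot was taken.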

Why this is a scaling cliff

  • 1000 idle connections are individually cheap, but they make the PGPROC scan in GetSnapshotData roughly 1000× longer than with a single connection, because idle slots are still visited.
  • Short transactions spend a large share of their time on snapshot setup, so on a busy workload of short transactions the relative impact is large.
  • The effect tends to show up as unexpectedly poor p99 latency at high connection counts: throughput may hold while the tail degrades.
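A back-of-the-envelope model makes the scaling concrete. This is a hypothetical cost model, not a measurement: it counts backend slots visited per transaction, assuming each snapshot scans every connected backend (the pre-14 behavior). READ COMMITTED takes a fresh snapshot per statement, while REPEATABLE READ and SERIALIZABLE take one per transaction; the model ignores any other snapshots Postgres takes internally. Because the count does not depend on how much real work the transaction does, the shorter the transaction, the larger the share this overhead represents.

```python
def snapshot_slot_visits(connections: int, statements: int, isolation: str) -> int:
    """Hypothetical pre-14 cost model: backend slots visited by one transaction's snapshots."""
    if isolation == "read committed":
        snapshots = statements       # a fresh snapshot for every statement
    elif isolation in ("repeatable read", "serializable"):
        snapshots = 1                # one snapshot reused for the whole transaction
    else:
        raise ValueError(f"unknown isolation level: {isolation!r}")
    return snapshots * connections   # each snapshot scans every connected backend


# Same 5-statement transaction, one connection vs. 1000 (mostly idle) connections.
small = snapshot_slot_visits(1, 5, "read committed")
large = snapshot_slot_visits(1000, 5, "read committed")
print(small, large, large // small)  # 5 5000 1000
```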

This is one of the strongest architectural arguments for a connection pooler (PgBouncer, Pgpool-II, etc.) sitting in front of Postgres: the pooler holds thousands of client connections while only a small, tuned number of connections reach Postgres, keeping GetSnapshotData cheap.
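The pooling idea can be sketched in a few lines. This is a hypothetical toy, not how PgBouncer or Pgpool-II are implemented: `ToyTransactionPool` hands one of a fixed set of server "backends" to each client for the duration of a transaction (roughly what transaction-mode pooling does), so Postgres would only ever see `server_conns` connections however many clients connect.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ToyTransactionPool:
    """Hypothetical transaction-mode pooler: many clients share a few server backends."""

    def __init__(self, server_conns: int) -> None:
        self._idle = asyncio.Queue()
        for backend_id in range(server_conns):
            self._idle.put_nowait(backend_id)
        self.in_use = 0
        self.peak_in_use = 0

    async def run_transaction(self, work: Callable[[int], Awaitable[T]]) -> T:
        backend = await self._idle.get()    # wait until a server connection is free
        self.in_use += 1
        self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            return await work(backend)      # the whole transaction runs on one backend
        finally:
            self.in_use -= 1
            self._idle.put_nowait(backend)  # hand the backend to the next client


async def demo(clients: int = 1000, server_conns: int = 20):
    pool = ToyTransactionPool(server_conns)  # created inside the running event loop

    async def fake_transaction(backend: int) -> int:
        await asyncio.sleep(0.001)           # stand-in for real query work
        return backend

    results = await asyncio.gather(
        *(pool.run_transaction(fake_transaction) for _ in range(clients))
    )
    return pool, results


pool, results = asyncio.run(demo())
print(len(results), pool.peak_in_use)  # 1000 clients served, at most 20 backends busy
```

Clients queue briefly for a free backend instead of each holding a server connection, which is the trade a pooler makes: a little client-side waiting in exchange for a small, fixed PGPROC array.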

Upstream history

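  • Postgres 14 (2021) shipped Andres Freund's snapshot-scalability patches, which sharply cut the per-connection cost of GetSnapshotData: hot per-backend fields such as the running xid moved into dense, cache-friendly arrays; building a snapshot no longer computes the global xmin horizon on every call; and a backend can reuse its previous snapshot outright when no transaction has completed since it was taken. When a new snapshot is needed the scan is still linear in the number of backends, but with a much smaller constant. The commit messages and associated mailing-list discussion framed this as one of the larger connection-scalability wins in recent Postgres history.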
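The snapshot-reuse idea from the Postgres 14 work can be modeled with a completion counter. This is a simplified sketch under stated assumptions: `ReusableSnapshot` and its methods are hypothetical names, and the instance attribute `completion_count` stands in for a cluster-wide counter bumped on every commit or abort. Starting a new transaction does not invalidate a snapshot, because its xid is at or above the snapshot's xmax and is already treated as in progress; only a completion can change what is visible.

```python
from typing import Callable, Generic, Optional, TypeVar

S = TypeVar("S")


class ReusableSnapshot(Generic[S]):
    """Toy model: reuse the last snapshot while no transaction has completed."""

    def __init__(self) -> None:
        self.completion_count = 0          # stands in for a cluster-wide commit/abort counter
        self._cached: Optional[S] = None
        self._cached_at = -1               # completion_count when _cached was built
        self.full_builds = 0

    def transaction_completed(self) -> None:
        self.completion_count += 1

    def get_snapshot(self, build: Callable[[], S]) -> S:
        if self._cached is not None and self._cached_at == self.completion_count:
            return self._cached            # nothing finished since: O(1), no scan
        self._cached = build()             # otherwise fall back to the full scan
        self._cached_at = self.completion_count
        self.full_builds += 1
        return self._cached


def build_snapshot() -> dict:
    return {"xip": frozenset({100, 103})}  # stand-in for the O(N) scan


cache: ReusableSnapshot[dict] = ReusableSnapshot()
first = cache.get_snapshot(build_snapshot)
again = cache.get_snapshot(build_snapshot)  # reused: no completions in between
cache.transaction_completed()               # e.g. xid 100 commits
fresh = cache.get_snapshot(build_snapshot)  # counter moved: rebuild
print(first is again, first is fresh, cache.full_builds)  # True False 2
```

The design is conservative: any completion anywhere forces a rebuild, so read-mostly workloads benefit most, while write-heavy workloads that commit constantly still pay for the (now cheaper) scan.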
  • Older Postgres versions (13 and earlier, e.g. 12 and 13) still carry the full O(N) cost. The connection count at which it bites varies by workload, but beyond a few hundred backends the latency curve bends.
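A quick way to tell whether a given server is on the pre-14 side of this history is its numeric version. The sketch below assumes the inputs come from `SHOW server_version_num` (an integer such as 130011 for 13.11) and from `SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend'`. The 300-backend threshold is a hypothetical stand-in for "a few hundred"; tune it from your own benchmarks. The function names are illustrative only.

```python
PG14_VERSION_NUM = 140000              # server_version_num of Postgres 14.0
HYPOTHETICAL_BACKEND_THRESHOLD = 300   # assumption standing in for "a few hundred"


def snapshot_cost_is_linear(server_version_num: int) -> bool:
    """True for Postgres 13 and earlier, which predate the 14 snapshot-scalability work."""
    return server_version_num < PG14_VERSION_NUM


def in_pre14_danger_zone(server_version_num: int, backends: int,
                         threshold: int = HYPOTHETICAL_BACKEND_THRESHOLD) -> bool:
    """Flag servers where the full O(N) scan meets a connection count past the threshold."""
    return snapshot_cost_is_linear(server_version_num) and backends >= threshold


print(in_pre14_danger_zone(130011, 800))  # True
```

A `False` result on 14 or later does not mean pooling is unnecessary; it only means the older, steeper snapshot cost is not the reason.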

