CONCEPT Cited by 1 source

Postgres checkpoint¶

Definition¶

A Postgres checkpoint is a periodic background operation that flushes all modified ("dirty") pages in the buffer pool to disk up to a specific point in the write-ahead log (WAL), then records a milestone marker in the WAL indicating "database state up to this log position is fully persisted on disk."

From the Databricks Lakebase 2026-05-07 post, verbatim:

Unlike a snapshot, a checkpoint is simply a milestone marker in the log. During a checkpoint, Postgres takes all the modified data currently in memory (managed in 8KB chunks called "pages") and flushes it to the main disk, up to a specific point in the log. If a crash happens, Postgres restores your data by starting at that checkpoint milestone and replaying the recent WAL logs over the disk.

(Source: sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes)
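The flush-then-replay behaviour described in the quote can be sketched with a toy model. This is a minimal illustration, not Postgres internals: `ToyPostgres`, its fields, and the use of a list index as the log position are all hypothetical, and the checkpoint is treated as instantaneous, whereas a real checkpoint is spread over time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPostgres:
    """Toy model: the WAL is a list of (page_id, value); a list index stands in for the LSN."""
    wal: list = field(default_factory=list)     # durable log
    disk: dict = field(default_factory=dict)    # durable page store
    buffer: dict = field(default_factory=dict)  # volatile dirty pages
    redo_lsn: int = 0                           # where crash recovery starts replaying

    def write(self, page_id, value):
        self.wal.append((page_id, value))  # log first (write-ahead)
        self.buffer[page_id] = value       # then modify the page in memory

    def checkpoint(self):
        self.disk.update(self.buffer)      # flush every dirty page to disk
        self.buffer.clear()
        self.redo_lsn = len(self.wal)      # the "milestone marker" in the log

    def crash_and_recover(self):
        self.buffer.clear()                # in-memory pages are lost
        replayed = 0
        for page_id, value in self.wal[self.redo_lsn:]:
            self.disk[page_id] = value     # replay recent WAL over the disk
            replayed += 1
        return replayed


db = ToyPostgres()
db.write("A", 1)
db.write("B", 2)
db.checkpoint()
db.write("A", 3)                 # logged, but not yet flushed
replayed = db.crash_and_recover()
print(replayed, db.disk)
```

Only the single record written after the checkpoint is replayed; everything before the marker is already on disk.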

Purpose¶

Bound recovery replay time. Without checkpoints, crash recovery would have to replay the entire WAL from database start. Checkpoints cap replay length to "WAL written since the most recent checkpoint." The default checkpoint_timeout is 5 minutes; under write-heavy workloads, checkpoints fire sooner once the WAL accumulated since the last checkpoint approaches max_wal_size.
Reclaim WAL space. WAL segments that predate the oldest needed recovery position can be recycled or removed.
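Scope Full Page Writes. Each checkpoint effectively resets the "first modification since last checkpoint" state of every page (Postgres derives this by comparing the page's LSN with the checkpoint's redo position rather than by keeping a per-page flag); the next modification of each page within the new interval pays the FPW tax (copies the whole 8 KB page into WAL) to enable torn-page-safe recovery.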
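The FPW scoping rule in the last item can be sketched as follows. This is a simplified cost model under stated assumptions: the `simulate` function, the 100-byte delta size, and counting a full-page record as exactly 8192 bytes are illustrative choices, and real WAL records carry headers and may compress or trim page images.

```python
PAGE_SIZE = 8192  # bytes; Postgres' default page size


def simulate(updates, checkpoint_before, delta_size=100):
    """Return total WAL bytes in a toy model.

    updates: page ids in modification order.
    checkpoint_before: indices i where a checkpoint completes just before update i.
    delta_size: assumed size of an ordinary (non-image) WAL record.
    """
    redo_lsn = 0    # log position of the latest checkpoint
    lsn = 0         # current log position
    page_lsn = {}   # last log position that touched each page
    total = 0
    for i, page in enumerate(updates):
        if i in checkpoint_before:
            redo_lsn = lsn
        lsn += 1
        # First touch since the checkpoint: log the whole page image.
        needs_image = page_lsn.get(page, 0) <= redo_lsn
        total += PAGE_SIZE if needs_image else delta_size
        page_lsn[page] = lsn
    return total


no_checkpoint = simulate(["A"] * 4, checkpoint_before=set())
with_checkpoint = simulate(["A"] * 4, checkpoint_before={2})
print(no_checkpoint, with_checkpoint)
```

In this model, one extra checkpoint in the middle of four updates to the same page roughly doubles the WAL volume, which is why checkpoint cadence and FPW cost are coupled.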

Checkpoint ≠ snapshot¶

Databricks explicitly calls out this distinction:

Unlike a snapshot, a checkpoint is simply a milestone marker in the log.

A snapshot produces a distinct durable artefact (a copy of the data at a point in time) from which you could independently restore. A checkpoint is just a pointer in the WAL saying "before here is guaranteed-on-disk; replay starts here." If you lose the disk after a checkpoint, you lose the database — the checkpoint is not a recovery artefact on its own.

The incidental side-effect role in Neon / Lakebase architecture¶

FPW's stated purpose is torn-page recovery, but it has an incidental side-effect role: the periodic full page images in WAL act as reset points in the delta chain the Neon-lineage pageserver uses to reconstruct pages on reads. Without them, the delta chain grows unboundedly and read latency degrades.

This means the checkpoint cadence inherited from classical Postgres was doing double duty once storage was separated from compute:

Stated purpose: bound crash-recovery replay length (write-path).
Side-effect purpose: bound read-time delta-chain replay length (read-path on separated storage).
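The read-path side of this can be illustrated with a toy page reconstruction. The record shapes here are hypothetical (a page is a single integer and a delta is an increment), chosen only to show how the most recent full image bounds the number of deltas replayed on a read.

```python
def reconstruct(page_history):
    """Rebuild a page from its WAL history.

    page_history: list of ("image", value) or ("delta", increment) in log order.
    Returns (page_value, deltas_replayed).
    """
    # Walk backwards to the most recent full image: the reset point.
    start = None
    for i in range(len(page_history) - 1, -1, -1):
        if page_history[i][0] == "image":
            start = i
            break
    if start is None:
        raise ValueError("no base image available for this page")

    value = page_history[start][1]
    replayed = 0
    # Everything after the latest image is, by construction, a delta.
    for _kind, increment in page_history[start + 1:]:
        value += increment
        replayed += 1
    return value, replayed


with_reset = [("image", 10), ("delta", 1), ("delta", 2), ("image", 13), ("delta", 5)]
without_reset = [("image", 10), ("delta", 1), ("delta", 2), ("delta", 5)]
print(reconstruct(with_reset), reconstruct(without_reset))
```

Both histories yield the same page, but the one with an intervening image replays one delta instead of three.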

The Databricks 2026-05-07 post canonicalises the architectural insight that these two cadences should be decoupled:

Crash-recovery bounding is still useful (and cheap, because checkpoints are lightweight on separated-storage architectures where compute has no local-disk page heap to flush).
Read-path delta-chain-reset should be driven by actual page-change rate, not by the unrelated checkpoint cadence. Image-generation pushdown lets the pageserver generate images "when a page has accumulated more delta records than a configured threshold without an intervening image" — per-page decisions instead of global checkpoint sweeps.
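The per-page threshold rule quoted above might look roughly like this. It is a sketch under assumptions, not the pageserver's actual code: `PageChain`, its methods, and the integer page model are hypothetical, and only the "more delta records than a configured threshold" condition comes from the source.

```python
class PageChain:
    """Toy per-page delta chain with threshold-driven image generation."""

    def __init__(self, base_image, threshold):
        self.image = base_image   # latest materialised page image
        self.deltas = []          # deltas accumulated since that image
        self.threshold = threshold
        self.images_generated = 0

    def ingest_delta(self, increment):
        self.deltas.append(increment)
        # Per-page decision: materialise only when this page's chain is too long.
        if len(self.deltas) > self.threshold:
            self.image += sum(self.deltas)
            self.deltas.clear()
            self.images_generated += 1

    def read(self):
        # Read cost is bounded by the threshold, not by checkpoint cadence.
        return self.image + sum(self.deltas)


hot = PageChain(base_image=0, threshold=3)
for _ in range(10):
    hot.ingest_delta(1)

cold = PageChain(base_image=0, threshold=3)
cold.ingest_delta(1)

print(hot.images_generated, len(hot.deltas), hot.read())
print(cold.images_generated, len(cold.deltas), cold.read())
```

A frequently modified page gets new images as often as its change rate demands, while a rarely modified page gets none, which is the contrast with a global checkpoint sweep.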

Operational levers in classical Postgres¶

checkpoint_timeout — maximum time between checkpoints (default 5 min).
max_wal_size — threshold WAL volume that also triggers a checkpoint.
checkpoint_completion_target — fraction of the interval used to spread out writes (reduces write spike at checkpoint time).
full_page_writes — whether FPW is enabled (default on; safe to disable only on storage stacks that eliminate torn-page risk, or on architectures that eliminate the local-disk page entirely).
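How the first three levers interact can be approximated with a small model. This is a deliberate simplification: the function names are hypothetical, and real Postgres derives its WAL-volume trigger point from max_wal_size rather than comparing against it directly. The defaults used are 5 minutes for checkpoint_timeout, 1 GB for max_wal_size, and 0.9 for checkpoint_completion_target (the default since PostgreSQL 14).

```python
def checkpoint_due(seconds_since_last, wal_bytes_since_last,
                   checkpoint_timeout_s=300, max_wal_size_bytes=1 << 30):
    """A checkpoint starts when either the time limit or the WAL-volume limit is hit."""
    return (seconds_since_last >= checkpoint_timeout_s
            or wal_bytes_since_last >= max_wal_size_bytes)


def paced_write_rate(dirty_pages, checkpoint_timeout_s=300, completion_target=0.9):
    """Pages per second if the flush is spread over completion_target of the interval."""
    return dirty_pages / (checkpoint_timeout_s * completion_target)


# Quiet system: only the timer triggers.
print(checkpoint_due(seconds_since_last=300, wal_bytes_since_last=10_000))
# Write-heavy system: WAL volume triggers long before the timer.
print(checkpoint_due(seconds_since_last=40, wal_bytes_since_last=2 << 30))
# Spreading 27,000 dirty pages over 90% of a 5-minute interval.
print(paced_write_rate(27_000))
```

Lowering completion_target in this model raises the required write rate, which corresponds to the sharper I/O spike the lever is meant to avoid.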

Seen in¶

sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes — canonical first-class wiki framing of Postgres checkpoint with the "milestone marker in the log, not a snapshot" distinction. Databricks decouples the checkpoint cadence (which still governs crash-recovery replay length) from read-path delta-chain-reset cadence (which moves to per-page-threshold decisions made by the pageserver). The XLOG_FPW_CHANGE Postgres control record is the mechanism for switching FPW behaviour live without customer restart.

concepts/postgres-full-page-write — the per-page write-amplification cost triggered at each checkpoint.
concepts/torn-page — the failure mode FPW exists to prevent; checkpoint cadence governs FPW cadence.
concepts/delta-chain-replay — the read-path primitive whose bounding was a side-effect of checkpoint-scoped FPW; image-generation pushdown decouples this.
concepts/wal-record-granularity — WAL includes control records like checkpoints and XLOG_FPW_CHANGE alongside data records.
systems/postgresql — upstream DB engine where checkpoint is the primitive.
systems/lakebase — canonical instance of checkpoint-cadence and delta-chain-reset cadence being decoupled via image-generation pushdown.
systems/pageserver-safekeeper — the storage-tier components that now make per-page reset-cadence decisions independent of the compute's checkpoint cadence.
patterns/image-generation-pushdown-to-storage — the architectural mechanism for the decoupling.