CONCEPT Cited by 1 source

Postgres Full Page Write

Definition

Full Page Write (FPW) is the Postgres durability primitive that copies the entire page (8 KB by default) into the WAL the first time that page is modified after a checkpoint. Subsequent modifications of the same page within the same checkpoint interval log only the delta.

FPW exists to protect against torn pages: if a crash happens partway through writing an 8 KB page, the on-disk copy may be partly new and partly old. Replaying a small WAL delta over a torn page would leave the page permanently corrupted. FPW ensures that WAL contains a known-good full image of every page modified since the last checkpoint, so recovery can ignore the possibly torn on-disk page and replay from the WAL-resident image instead.
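The torn-page failure and the FPW remedy can be sketched with a toy model. Everything here is a simplifying assumption for illustration: the page layout (an item count in byte 0, fixed-size items in the second half), the rule that a torn write persists only the first half, and the names `append_item` and `run` are hypothetical and do not reflect Postgres's real page format or recovery code.

```python
PAGE_SIZE = 8192          # default Postgres block size, in bytes
HALF = PAGE_SIZE // 2     # toy assumption: a torn write persists only the first half
SLOT = 4                  # hypothetical fixed item size for this toy layout


def append_item(page, item):
    """Non-idempotent change: read the item count in byte 0, then append."""
    count = page[0]
    start = HALF + count * SLOT
    page[start:start + SLOT] = item
    page[0] = count + 1


def items(page):
    """Return the items the page header claims to hold."""
    return [bytes(page[HALF + i * SLOT:HALF + (i + 1) * SLOT])
            for i in range(page[0])]


def run(full_page_writes):
    """Apply two appends after a checkpoint, tear the page write, then recover."""
    checkpointed = bytearray(PAGE_SIZE)     # on-disk page as of the checkpoint
    page = bytearray(checkpointed)          # in-memory copy being modified
    wal, imaged = [], False
    for item in (b"AAAA", b"BBBB"):
        append_item(page, item)
        if full_page_writes and not imaged:
            wal.append(("image", bytes(page)))   # first touch: whole page
            imaged = True
        else:
            wal.append(("delta", item))          # later touches: just the change

    # Crash mid-write: first half is new, second half is still the old content.
    torn = bytearray(page[:HALF] + checkpointed[HALF:])

    # Recovery: replay WAL from the checkpoint over whatever is on disk.
    recovered = torn
    for kind, payload in wal:
        if kind == "image":
            recovered = bytearray(payload)       # replaces the torn page outright
        else:
            append_item(recovered, payload)
    return items(recovered)


with_fpw = run(full_page_writes=True)
without_fpw = run(full_page_writes=False)
print("with FPW:   ", with_fpw)
print("without FPW:", without_fpw)
```

With FPW on, recovery starts from the logged image and ends with exactly the two appended items. With FPW off, the deltas are replayed over a header that already claims two items whose data never reached disk, so the recovered page holds two empty slots followed by duplicated appends.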

The write-amplification cost

FPW trades WAL volume for torn-page safety. Per the Databricks Lakebase 2026-05-07 post, on write-heavy workloads FPW can inflate log volume by up to 15×, "often becoming the system's biggest performance bottleneck."

Mechanism: checkpoints happen periodically (checkpoint_timeout defaults to 5 minutes), and every page modified for the first time after a checkpoint pays the 8 KB FPW tax. Write-heavy workloads with a wide working set touch many different pages per checkpoint interval, and each of those pages pays the full-page cost once. Later modifications of an already-imaged page within the same interval are cheap (small deltas), but the aggregate is dominated by the burst of full page images that follows each checkpoint.
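The post-checkpoint spike can be made concrete with a small simulation. This is a sketch under stated assumptions: time is divided into abstract ticks, writes cycle round-robin over a fixed set of pages, each delta is a hypothetical 100 bytes, and a full page image costs exactly 8192 bytes (real WAL records also carry headers, which are ignored here). The function name `wal_per_tick` is made up for this example.

```python
PAGE_SIZE = 8192  # default Postgres block size, in bytes


def wal_per_tick(n_pages, writes_per_tick, ticks, delta_bytes, checkpoint_every):
    """Return WAL bytes emitted in each tick for a round-robin write pattern."""
    imaged = set()   # pages that already logged a full image this interval
    cursor = 0       # next page to write, cycling over the working set
    volumes = []
    for tick in range(ticks):
        if tick % checkpoint_every == 0:
            imaged.clear()               # checkpoint: every page is "cold" again
        total = 0
        for _ in range(writes_per_tick):
            page = cursor % n_pages
            cursor += 1
            if page not in imaged:
                imaged.add(page)
                total += PAGE_SIZE       # first touch since checkpoint: full image
            else:
                total += delta_bytes     # later touch: small delta
        volumes.append(total)
    return volumes


volumes = wal_per_tick(n_pages=4, writes_per_tick=2, ticks=6,
                       delta_bytes=100, checkpoint_every=3)
print(volumes)
```

In this run the two ticks after each checkpoint emit full images for all four pages, and only the third tick of each interval drops to delta-sized volume.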

Quantified on HammerDB TPROC-C (TPC-C-derived OLTP benchmark):

Configuration                                    WAL per transaction
With FPW on compute                              58 KB
With FPW disabled + image-generation pushdown    <4 KB

94% reduction in compute-emitted WAL volume. (Source: sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes)
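As a quick consistency check on these figures: the table gives only an upper bound for the FPW-disabled case, so the table alone yields a lower bound on the reduction. The snippet below is plain arithmetic on the numbers above; the variable names are illustrative.

```python
before_kb = 58.0        # WAL per transaction with FPW on compute (from the table)
after_kb_bound = 4.0    # the table only gives an upper bound: "<4 KB"

# Because the "after" figure is an upper bound, this is a lower bound on the reduction.
min_reduction = 1 - after_kb_bound / before_kb

# WAL per transaction that an exact 94% reduction would imply.
implied_after_kb = before_kb * (1 - 0.94)

print(f"reduction is at least {min_reduction:.1%}")
print(f"a 94% reduction implies about {implied_after_kb:.2f} KB per transaction")
```

The quoted 94% is consistent with the table: it corresponds to roughly 3.5 KB per transaction, which is below the stated 4 KB bound.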

Structural properties

  • Interval-scoped. Each FPW is scoped to one checkpoint interval; every checkpoint resets the "first modification since checkpoint" state.
  • Asymmetric first-vs-subsequent. The first modification of a page pays 8 KB; subsequent modifications pay only the delta size. Workloads that repeatedly touch the same hot pages are far less affected by FPW than workloads that spread writes across many cold pages.
  • Doubles as a read-path reset point. FPW's stated purpose is torn-page recovery on the write path, but as a side effect it also plays a load-bearing role on the read path wherever pages are reconstructed from WAL at read time: the periodic full page images act as reset points in the delta chain. Without them, read-time page reconstruction would have to replay an unbounded chain of deltas.
  • Configurable but not trivially so. Postgres has full_page_writes = on|off at the cluster level; turning it off in a classical Postgres deployment is generally unsafe because the torn-page risk is real.
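The first-vs-subsequent asymmetry can be illustrated by comparing a hot and a cold write pattern within one checkpoint interval. This is a minimal sketch: the 100-byte delta size, the workload shapes, and the helper name `wal_bytes` are assumptions chosen for illustration, not measurements, and record headers are ignored.

```python
PAGE_SIZE = 8192  # default Postgres block size, in bytes


def wal_bytes(touches, delta_bytes):
    """Total WAL bytes for a sequence of page ids written in one checkpoint interval."""
    imaged = set()
    total = 0
    for page_id in touches:
        if page_id not in imaged:
            imaged.add(page_id)
            total += PAGE_SIZE      # first modification of this page: full image
        else:
            total += delta_bytes    # repeat modification: delta only
    return total


hot = [i % 10 for i in range(1000)]   # 1,000 writes concentrated on 10 pages
cold = list(range(1000))              # 1,000 writes spread over 1,000 pages

hot_total = wal_bytes(hot, delta_bytes=100)
cold_total = wal_bytes(cold, delta_bytes=100)
print(f"hot:  {hot_total} bytes")
print(f"cold: {cold_total} bytes")
```

Both workloads issue the same number of writes, but the cold pattern pays the full-page cost on every one of them while the hot pattern pays it only ten times.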

The compute-storage-separation case for disabling

The Databricks Lakebase 2026-05-07 post canonicalises the architectural insight that FPW is structurally unnecessary when compute is stateless and streams WAL to a Paxos-based quorum of safekeepers:

"In the lakebase architecture, your compute is stateless. It does not rely on a local data directory. Instead, it streams WAL to a Paxos-based quorum of safekeepers. Because there is no local-disk page to tear, the failure mode FPW was designed to prevent simply does not exist."

This is the first canonical wiki instance of concepts/compute-storage-separation enabling the structural elimination (not just relocation) of a durability primitive that existed to handle a local-disk failure mode.

The catch: naive disable breaks reads

Turning FPW off without further architectural work creates an unbounded-delta-chain problem on reads. The read-path reset-point role of FPW was implicit, not stated in the Postgres docs, and disappears when FPW is disabled.

The patterns/image-generation-pushdown-to-storage pattern solves this by moving image generation from the compute's WAL stream to the storage layer's background work, preserving the read-path bounded-replay property while eliminating the write-path WAL inflation. Lakebase rolled this out across its entire global fleet in ~6 weeks (late March 2026 → 2026-05-07) via the existing Postgres XLOG_FPW_CHANGE WAL record mechanism with no customer restarts.
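The bounded-replay idea behind image-generation pushdown can be sketched as follows. The source does not describe how Lakebase's storage layer decides when to generate images, so the chain-length threshold, the `PageStore` class, and its method names are all hypothetical. The sketch only shows the general shape: compute ships deltas, and the storage side periodically folds them into a fresh base image so that reads replay a short chain.

```python
PAGE_SIZE = 8192  # default Postgres block size, in bytes


class PageStore:
    """Toy storage layer: one base image plus a chain of deltas per page."""

    def __init__(self, max_chain):
        self.max_chain = max_chain   # hypothetical policy knob
        self.base = {}               # page_id -> bytearray base image
        self.chain = {}              # page_id -> list of (offset, data) deltas

    def ingest(self, page_id, offset, data):
        """Accept a delta-only WAL record from compute (no full page image)."""
        self.base.setdefault(page_id, bytearray(PAGE_SIZE))
        self.chain.setdefault(page_id, []).append((offset, bytes(data)))

    def read(self, page_id):
        """Reconstruct the page; also report how many deltas were replayed."""
        page = bytearray(self.base[page_id])
        deltas = self.chain[page_id]
        for offset, data in deltas:
            page[offset:offset + len(data)] = data
        return bytes(page), len(deltas)

    def background_materialize(self):
        """Storage-side work: fold long delta chains into a new base image."""
        for page_id in list(self.chain):
            if len(self.chain[page_id]) >= self.max_chain:
                image, _ = self.read(page_id)
                self.base[page_id] = bytearray(image)
                self.chain[page_id] = []


store = PageStore(max_chain=3)
for i in range(10):
    store.ingest(page_id=7, offset=i, data=bytes([i + 1]))
    store.background_materialize()

page, replayed = store.read(7)
print("deltas replayed on read:", replayed)
```

Without the background step, the final read would replay all ten deltas; with it, the chain never grows past the threshold, and the compute's WAL stream carries no page images at all.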

Contrast with adjacent durability primitives

  • vs checkpoint. Checkpoint is the interval boundary; FPW is the per-page cost incurred the first time each page is modified within an interval. Checkpoints are about bounding crash-recovery replay length; FPW is about tolerating partial disk writes during recovery.
  • vs write-ahead logging (generic WAL). WAL itself logs changes; FPW logs entire pages. They complement rather than overlap.
  • vs replication. Replication is about cross-node durability (many copies); FPW is about single-node durability under partial-write failure. Compute-storage-separated architectures like Lakebase handle cross-node durability at the safekeeper quorum layer (Paxos-based), which is why the single-node FPW primitive becomes redundant.

Seen in

  • sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes — canonical first-class wiki page on FPW; the 2026-05-07 post quantifies the 15× WAL inflation ceiling and canonicalises the compute-storage-separation-makes-FPW-unnecessary framing. Lakebase disabled compute-side FPW across its global fleet with image-generation pushdown as the read-path remediation, achieving 94% compute WAL-volume reduction (58 KB/txn → <4 KB) and 5× write throughput at 32 vCPU on HammerDB TPROC-C.