
PATTERN Cited by 1 source

Live WAL protocol switch via XLOG_FPW_CHANGE

Intent

Roll out a breaking change to the WAL protocol contract between compute and storage on a live fleet without customer restarts by piggybacking on an existing Postgres control record (XLOG_FPW_CHANGE) that both sides already understand.

The control record is a pre-existing Postgres concept: it logs a change to the full_page_writes (FPW) setting on a running database, a setting that can be changed without a restart. The compute and storage tiers can treat its appearance in the WAL stream as an in-log feature flag: once storage sees the record for a given compute, it handles that compute's subsequent WAL under the new contract.

Canonical instance: Lakebase / Neon, late March 2026 → 2026-05-07

From the 2026-05-07 Databricks post, verbatim:

The change was applied to running computes via our control plane and storage system, which coordinated the transition automatically. This was achieved using the existing Postgres XLOG_FPW_CHANGE WAL record mechanism, meaning no restarts or interruptions were required for our customers.

(Source: sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes)

Rollout arc:

  • Late March 2026: first customers switched to the new protocol (compute disables full-page writes; storage-side image generation takes over).
  • ~6-week rollout window: control plane + storage system coordinate per-compute switches across the global fleet.
  • 2026-05-07: "active for all Lakebase Serverless and Neon databases globally".
  • Zero customer restarts — the XLOG_FPW_CHANGE WAL record signals the change atomically within the running compute's own log stream.

Why a pre-existing control record is the right vehicle

Stock Postgres already uses XLOG_FPW_CHANGE to mark changes to the full_page_writes setting within a live cluster, so the compute-side code already knows how to emit it and the storage-side consumer (pageserver) already knows how to parse and interpret it.

This means Lakebase could roll out the new protocol without:

  • Adding a new WAL record type (which would require backward-compat parsing on storage and a flag day where storage had to be upgraded before any compute could emit the new record).
  • Adding an out-of-band coordination channel (which would require a distributed flag state that compute and storage both observe consistently).
  • Restarting any compute (which breaks the "serverless, can scale to zero" product contract).
  • Coordinating a fleet-wide flag day (which carries operational risk on a multi-thousand-cluster global fleet).

Instead: the existing control record carries the flag in-line with the data it controls, making the switch atomic and visible to exactly the party that needs to know (the storage tier handling that compute's WAL).

Structure

Per-compute switch sequence:

1. Control plane decides compute C is ready for the switch
   (criteria: healthy state, recent backup, sane workload
   profile, fleet-wide rollout quota not exceeded).

2. Control plane sends an internal signal to compute C
   changing its full_page_writes configuration.

3. Compute C emits an XLOG_FPW_CHANGE record into its WAL
   stream at the current LSN.

4. The record flows through safekeeper to pageserver.

5. Pageserver sees the XLOG_FPW_CHANGE record and, from
   that LSN onward, handles compute C's WAL stream under
   the new protocol:
   - no more FPW records (so no reset-point-from-compute)
   - start generating images locally per the
     image-generation-pushdown threshold

6. Reverse direction is identical: control plane can send
   another signal, compute emits another XLOG_FPW_CHANGE
   going back to FPW-on, pageserver stops generating
   images from that LSN.
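
The storage-side half of this sequence (steps 5 and 6) can be sketched as a small per-compute state machine. This is a minimal illustration, not Neon's or Lakebase's pageserver code: the names Mode, WalRecord, ComputeTimeline, the record kinds "FPW_CHANGE" / "FPI" / "DELTA", and the threshold of 3 are all hypothetical, and the source does not describe the actual image-generation threshold.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    COMPUTE_FPW = "compute_fpw"        # old contract: compute ships full-page images
    STORAGE_IMAGES = "storage_images"  # new contract: storage generates images


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    kind: str             # hypothetical kinds: "FPW_CHANGE", "FPI", "DELTA"
    fpw_on: bool = False  # only meaningful for FPW_CHANGE records


@dataclass
class ComputeTimeline:
    """Hypothetical per-compute state held by the storage tier."""
    mode: Mode = Mode.COMPUTE_FPW
    deltas_since_image: int = 0
    image_threshold: int = 3  # assumed value, for illustration only
    images_generated_at: list = field(default_factory=list)

    def ingest(self, rec: WalRecord) -> None:
        if rec.kind == "FPW_CHANGE":
            # The flag governs every record that follows it in the stream.
            self.mode = Mode.COMPUTE_FPW if rec.fpw_on else Mode.STORAGE_IMAGES
            self.deltas_since_image = 0
        elif rec.kind == "FPI":
            # A compute-supplied image is a reset point; nothing to generate.
            self.deltas_since_image = 0
        elif rec.kind == "DELTA":
            self.deltas_since_image += 1
            if (self.mode is Mode.STORAGE_IMAGES
                    and self.deltas_since_image >= self.image_threshold):
                # Under the new contract, storage materialises the image itself.
                self.images_generated_at.append(rec.lsn)
                self.deltas_since_image = 0
```

Feeding a stream that switches to the new contract at one LSN and back at a later LSN produces storage-generated images only between the two flag records, which is the per-compute, reversible behaviour the sequence describes.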

When it fits

  • Breaking protocol changes between two tiers that both read the same log (write-ahead log / redo log / event log).
  • A pre-existing control record that both tiers already parse, even if it was designed for a different (but related) purpose.
  • Per-peer granularity — the record appears in one compute's log and affects only that compute's storage interaction; other computes are unaffected until they independently switch.
  • Idempotent or forward-only switches — applying the record once vs twice should have the same effect; the switch shouldn't corrupt state if the record is replayed during recovery.
  • Live fleets where downtime is expensive — the architectural effort of a live-switch mechanism pays off at fleet scale.
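
The idempotency requirement in the list above can be made concrete with a short sketch. It assumes the consumer remembers the LSN of the newest flag record it has applied; FlagState and apply_flag are hypothetical names, not part of Postgres or of any Neon component.

```python
from dataclasses import dataclass


@dataclass
class FlagState:
    fpw_on: bool = True    # old contract until a flag record says otherwise
    applied_lsn: int = -1  # LSN of the newest flag record applied so far


def apply_flag(state: FlagState, lsn: int, fpw_on: bool) -> bool:
    """Apply a flag record; return True only if it changed the state.

    Records at or below applied_lsn are replays and are ignored, so
    re-reading the log during recovery cannot flip the mode back.
    """
    if lsn <= state.applied_lsn:
        return False
    state.fpw_on = fpw_on
    state.applied_lsn = lsn
    return True
```

Keying the check on the LSN rather than on the flag value means a replayed older record is rejected even when a newer record has since switched the mode in the opposite direction.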

When it doesn't fit

  • No pre-existing control record is available — you'd have to introduce one, which is essentially the flag-day scenario this pattern avoids.
  • Changes that break the log format — new records or new delta-field semantics would prevent the old reader from parsing the log at all. XLOG_FPW_CHANGE works because the field semantics of records after it are unchanged — only the presence or absence of FPW records changes.
  • Cross-tier changes beyond the log — if the storage tier's behaviour change affects APIs visible to other systems (not just the compute↔storage WAL protocol), an in-log flag doesn't reach those systems.
  • Changes that require an atomic fleet-wide switch: a per-compute sequential rollout won't satisfy an "all computes switch at the same moment" requirement.
  • Regulatory / forensic constraints — if WAL must contain a self-describing account of state at every point, adding a protocol-flag record changes what the log means.

Failure modes

  • Rollout stalls with fleet in a split state. Some computes on new protocol, some on old. Mitigation: the pattern is split-state-tolerant by design (pageserver handles both), but operational complexity increases; limit rollout-pause duration.
  • Storage tier forgets the switch state after a restart. Pageserver needs durable memory of each compute's current mode — it can recompute this by scanning for the latest XLOG_FPW_CHANGE record in the stream, but this needs to be part of pageserver recovery logic.
  • Customer workload regresses after switch. Not every workload benefits from image-generation pushdown; the control plane needs observability to detect regression and auto-switch back. Databricks does not disclose the observability criteria.
  • Ordering ambiguity on concurrent writes during switch. Records in flight when the switch is emitted need to be handled correctly under both old and new protocol. The in-log nature of XLOG_FPW_CHANGE makes this unambiguous (records before the XLOG_FPW_CHANGE LSN are old-protocol; after are new-protocol), but implementation on the pageserver side still has to be correct.
  • Control plane / storage system divergence. If the control plane thinks compute C is on the new protocol but the storage system is still reading old-protocol records for C, reconciliation is required. The in-log flag is the source of truth; the control plane merely initiates the switch.
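
Two of the failure modes above (lost switch state after a restart, and ordering around the switch) reduce to the same lookup: which flag record most recently precedes a given LSN. The sketch below rebuilds that answer from the log alone. The tuple layout (lsn, kind, fpw_on) and the function names are hypothetical, and a real pageserver would not rescan the whole stream on every restart.

```python
from bisect import bisect_left


def build_switch_index(records):
    """Collect flag records from an LSN-ordered iterable of
    (lsn, kind, fpw_on) tuples into two parallel lists."""
    lsns, modes = [], []
    for lsn, kind, fpw_on in records:
        if kind == "FPW_CHANGE":
            lsns.append(lsn)
            modes.append(fpw_on)
    return lsns, modes


def fpw_on_at(index, lsn, initial=True):
    """Mode governing a record at `lsn`: the value set by the last
    flag record strictly before it, or `initial` if there is none."""
    lsns, modes = index
    i = bisect_left(lsns, lsn)  # number of flag records with LSN < lsn
    return initial if i == 0 else modes[i - 1]


def current_mode(index, initial=True):
    """Mode to resume with after a restart: the latest flag in the log."""
    _, modes = index
    return modes[-1] if modes else initial
```

Because the answer depends only on LSN order within one compute's stream, records that were in flight when the switch was requested are classified the same way on every replay.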

Generalisation beyond Postgres

The pattern is an instance of a broader idea:

Feature flags in the log. When two distributed systems communicate via a stream of records (not RPC), the in-stream flag record is the cleanest way to switch protocol behaviour atomically and per-peer, because the flag and the data it controls travel on the same ordered channel.

Sibling instances on other substrates:

  • Kafka + consumer protocol upgrades: a producer emits an in-stream format-version marker (comparable in spirit to the magic byte Kafka carries on record batches); consumers switch decoding from that offset onward.
  • Event-sourcing + schema evolution: aggregate emits a SchemaChangedTo(v2) event; downstream projectors switch from v1 to v2 decoding at that offset.
  • MySQL binlog protocol switches: new binlog event types can be introduced gated by a control event that older replicas recognize.
  • Raft log membership changes: AddMember / RemoveMember records in the Raft log are how membership transitions are made without requiring external coordination outside the log itself.
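
The event-sourcing sibling is the easiest one to show in a few lines. In this sketch a projector reads one ordered stream and changes decoder when it meets a SchemaChangedTo event. The two payload formats (comma-separated for v1, JSON for v2) and the function names are invented for illustration.

```python
import json


def decode_v1(payload: str) -> dict:
    # Hypothetical v1 format: "user,amount"
    user, amount = payload.split(",")
    return {"user": user, "amount": int(amount)}


def decode_v2(payload: str) -> dict:
    # Hypothetical v2 format: a JSON object with the same fields
    return json.loads(payload)


DECODERS = {1: decode_v1, 2: decode_v2}


def project(events):
    """Decode an ordered stream of (kind, payload) events, switching
    decoder whenever a SchemaChangedTo flag event appears."""
    version = 1
    out = []
    for kind, payload in events:
        if kind == "SchemaChangedTo":
            version = int(payload)  # the flag applies to all later events
            continue
        out.append(DECODERS[version](payload))
    return out
```

As with XLOG_FPW_CHANGE, the flag and the data it governs share one ordered channel, so the projector needs no external signal to know which events use which format.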

Seen in

  • sources/2026-05-07-databricks-how-lakebase-architecture-delivers-5x-faster-postgres-writes — canonical first-class wiki pattern page. XLOG_FPW_CHANGE used as the in-log feature-flag vehicle to roll out image-generation pushdown across the global Lakebase + Neon fleet over ~6 weeks (late March 2026 → 2026-05-07) with zero customer restarts. Control plane coordinates with storage system via the in-log record, avoiding both flag-day risk and customer-facing downtime.