

Branching is PITR with time=now

Pattern

On a compute-storage-separated substrate with copy-on-write storage, two operations that are usually built, billed, and operated as separate features collapse into one primitive with a different time parameter:

  • Branching: "give me a new isolated database identical to the current one" → internally a copy-on-write fork of the storage head at now.
  • PITR: "give me a new isolated database identical to the one from time t" → internally a copy-on-write fork of the storage head at t.

Same control-plane call, same storage substrate, same compute-attach step. Only the source_branch_time parameter differs.

Canonical statement (Lakebase, 2026-04-30):

"Branching and Point-in-Time Recovery (PITR) are essentially the same primitive: branching is just PITR with source_branch_time = now."

(Source: sources/2026-04-30-databricks-backstage-with-lakebase)

Why the unification matters

Classical database platforms ship branching and PITR as different features with different implementations:

  • Branching (where offered at all) is typically a dump/restore or a logical-clone workflow at schema altitude — seconds to minutes, often schema-only.
  • PITR is typically a snapshot + WAL/binlog replay workflow — minutes to hours for meaningful datasets.

Different code paths, different latency envelopes, different operator experience, different documentation. Most platforms treat PITR as a last-resort disaster-recovery tool and branching (when present) as a developer-workflow tool; nothing links them architecturally.

On a copy-on-write storage substrate, they are the same operation. The implementation is:

fork(source_branch_time=T):
  1. control-plane: validate caller, authorize, allocate branch ID
  2. storage: expose logical view of pages as of time T
  3. compute: stand up new Postgres VM pointing at that view
  4. return new branch's connection string

Leave T at now and you have branching. Change T to T − 5 minutes and you have PITR. Change T to T − 3 days and you have long-tail recovery. Same code, same latency envelope, same mental model.
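The unification can be sketched in a few lines. This is an illustrative in-memory model, not the Lakebase implementation: `VersionedPageStore` stands in for a page server that retains historical page versions, and `fork` is the single primitive whose only knob is `source_branch_time`.

```python
import time


class VersionedPageStore:
    """Toy stand-in for a page server: every write is retained as
    (timestamp, page_id, data), so any historical state can be rebuilt."""

    def __init__(self):
        self.history = []  # append-only list of (t, page_id, data)

    def write(self, page_id, data, t=None):
        self.history.append((time.time() if t is None else t, page_id, data))

    def view_as_of(self, t):
        """Logical view of all pages as of time t (later writes win)."""
        pages = {}
        for ts, page_id, data in self.history:
            if ts <= t:
                pages[page_id] = data
        return pages


def fork(store, source_branch_time=None):
    """The unified primitive: branching when T == now, PITR when T < now."""
    t = time.time() if source_branch_time is None else source_branch_time
    return store.view_as_of(t)
```

Calling `fork(store)` is branching; calling `fork(store, source_branch_time=t)` with a past `t` is PITR. Same function, same code path.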

Operational consequences

Once branching and PITR are the same primitive, three useful secondary properties follow:

Every risky operation gets a dry run

Before doing something irreversible in production (migration, data-corrupting bug fix, policy change, destructive query), fork production at now, run the operation on the fork, inspect the outcome, and only then run it in production. The "what if I did this?" query becomes as cheap as the operation itself.

Every incident gets an undo

Dropped a table, deleted rows, corrupted data with a bad UPDATE? PITR is no longer a multi-hour exercise: it's a sub-10-second branch creation with source_branch_time = <moment before incident>, followed by a verification query and a replay of the good state into production. The Thoughtworks POC demonstrated 3.78 seconds end-to-end for a 32-row delete recovery.
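The undo workflow reduces to three steps: fork just before the incident, verify, replay. In this sketch, `view_as_of`, `verify`, and `apply_to_prod` are hypothetical stand-ins for the platform fork call, a verification query, and a write back into production:

```python
def recover_from_incident(view_as_of, incident_time, verify, apply_to_prod):
    """Undo via the unified primitive: branch at source_branch_time =
    a moment before the incident, check the recovered state, then
    replay the good state into production."""
    branch = view_as_of(incident_time - 1e-3)  # fork just before the incident
    if not verify(branch):                     # e.g. "are the deleted rows back?"
        raise RuntimeError("recovered state failed verification")
    apply_to_prod(branch)                      # replay good state into production
    return branch
```

The verification step is why the measured PITR path (3.78 s) runs longer than bare branch creation (1.09 s): the fork itself is the cheap part.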

Branching & PITR share observability, quotas, audit

Because they share the control-plane call, they share the operator surface. One set of quotas governs both (active branches, storage divergence budgets, TTLs). One set of audit logs captures both. One set of metrics measures both (branch creation latency, branch count, divergent-page accrual). Platforms that keep them separate duplicate all this.
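A sketch of that shared operator surface: one authorization gate in front of the single fork call covers both features. The field names and limits are illustrative, not a documented Lakebase quota schema:

```python
from dataclasses import dataclass


@dataclass
class BranchQuota:
    """One quota object governs branching and PITR alike."""
    max_active_branches: int
    retention_window_s: float  # how far back source_branch_time may reach


def authorize_fork(quota, active_branches, now, source_branch_time):
    """Single gate for both branching (T == now) and PITR (T < now)."""
    if active_branches >= quota.max_active_branches:
        return False, "branch quota exceeded"
    if now - source_branch_time > quota.retention_window_s:
        return False, "source_branch_time outside retention window"
    return True, "ok"
```

A platform with separate branching and PITR features needs two copies of this gate (and two audit trails, and two metric sets); the unified primitive needs one.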

Disclosed latency envelopes (Lakebase, 2026-04-30)

Operation                           | Time parameter | Measured wall-clock
Branching (63 MB Backstage catalog) | now            | 1.09 s (data plane)
PITR (32-row recovery)              | now − seconds  | 3.78 s (end-to-end)

The latencies are the same order of magnitude — as expected from the unification. The PITR figure includes the verify-recovered-data step; pure branch creation at a historical time would be closer to the 1.09-second number.

Substrate requirements

The unification is only possible on substrates with:

  1. Compute-storage separation — storage lives independently of compute; compute is ephemeral.
  2. Copy-on-write pages at storage altitude — the storage layer can expose logical views that share physical pages with the source until one side writes.
  3. Historical page retention — the storage layer keeps enough page versions / WAL records to materialise historical states, bounded only by the retention window.
  4. Per-branch compute attach — a new compute VM can be stood up pointing at a specific logical-view-of-storage at sub-second latency.
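Requirement (2) is the load-bearing one, and it can be illustrated in a few lines. This toy `CowBranch` (an assumption for illustration, not a pageserver API) shares physical pages with its source until one side writes:

```python
class CowBranch:
    """Copy-on-write branch: reads fall through to the shared base
    pages until this branch overwrites a page locally."""

    def __init__(self, base_pages):
        self.base = base_pages  # shared with the source; never mutated here
        self.delta = {}         # branch-private overwrites

    def read(self, page_id):
        return self.delta.get(page_id, self.base.get(page_id))

    def write(self, page_id, data):
        self.delta[page_id] = data  # divergent-page accrual happens here


def fork_branch(base_pages):
    """Zero-copy fork: the new branch starts with an empty delta."""
    return CowBranch(base_pages)
```

Forking is O(1) regardless of data size — which is why a 63 MB catalog and a 63 TB one would fork in the same order of wall-clock time, and why storage billing tracks divergent pages rather than branch count.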

Neon-lineage Lakebase (via systems/pageserver-safekeeper) meets all four. Aurora meets (1) and partially (2) but branching ≡ PITR isn't explicitly surfaced as a unified primitive. Classical Postgres + WAL archive meets (3) only — no separation, no COW, no per-branch compute attach.

Generalises to

  • Workflow undo buttons — once branching ≡ PITR, UIs can expose a single "revert to T" primitive regardless of how far back T is.
  • Pre-deploy validation — CI can fork at now, run schema migration, verify success; CD can promote the branch's schema into the main branch atomically.
  • Cross-tier consistency — at the storage layer, any time-addressable fork is potentially useful (testing, forensics, audit, rollback).

Seen in

  • sources/2026-04-30-databricks-backstage-with-lakebase — canonical statement. Thoughtworks Backstage POC makes the unification explicit: "Branching and Point-in-Time Recovery (PITR) are essentially the same primitive: branching is just PITR with source_branch_time = now." Demonstrates both operations at Lakebase altitude with the 1.09-second + 3.78-second numbers + the "every risky operation gets a dry run, every incident gets an undo" framing.