CONCEPT Cited by 1 source

Point-in-Time Recovery (PITR)¶

Definition¶

Point-in-Time Recovery (PITR) is the database capability of producing a fresh, queryable copy of the database at a chosen past timestamp, typically for:

Incident undo — "we dropped the wrong table 4 minutes ago — give us a version from 5 minutes ago."
Accidental-delete recovery — restore specific rows wiped by a bad query, pipeline bug, or user error.
Forensic / audit — inspect historical state without running a transaction-replay.
Dev / QA / staging — fork a past production state as a sandbox without the replay cost of a log-shipped standby.

The classical implementation — periodic full snapshots + WAL / binlog replay between the snapshot and the target time — is operationally expensive: restore a 2 TB snapshot to a new EBS volume (minutes to hours), then replay WAL forward to the target (more minutes). For a live incident this makes PITR the option of last resort rather than the first thing tried.
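A back-of-envelope sketch of why the classical path scales with data volume. Only the 2 TB snapshot size comes from the text above; the throughput figures and the WAL volume are hypothetical placeholders, so treat the result as an illustration of the shape of the cost, not a measurement.

```python
def classical_pitr_seconds(snapshot_bytes, restore_bytes_per_s,
                           wal_bytes, replay_bytes_per_s):
    """Rough wall-clock estimate: restore the snapshot, then replay WAL."""
    restore = snapshot_bytes / restore_bytes_per_s
    replay = wal_bytes / replay_bytes_per_s
    return restore + replay


MB, GB, TB = 10**6, 10**9, 10**12

estimate = classical_pitr_seconds(
    snapshot_bytes=2 * TB,          # from the example above
    restore_bytes_per_s=250 * MB,   # assumed volume-restore throughput
    wal_bytes=5 * GB,               # assumed WAL between snapshot and target
    replay_bytes_per_s=50 * MB,     # assumed replay throughput
)
print(f"{estimate / 3600:.2f} hours")
```

Both terms grow with the amount of data touched, which is why this path is slow no matter how the control plane is tuned.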

PITR on compute-storage-separated substrates¶

On a compute-storage-separated substrate where the storage layer already keeps historical page versions + the WAL as durable shared state (for example Pageserver + Safekeeper on Neon / Lakebase), PITR collapses into a copy-on-write fork targeted at a past timestamp. No snapshot restore, no physical copy, no minutes-long wait. The operation becomes:

Ask the storage layer to expose a logical view of the pages as they were at time t.
Point a new compute instance at that logical view.
Connect and query.

Each of these is sub-second. PITR's wall-clock cost becomes dominated by the control-plane round-trip + compute boot, not by the data volume.
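The three steps can be sketched with a toy model. This is not the Pageserver or Lakebase API: `PageStore`, `Compute`, `read_at`, and the integer timestamps are hypothetical names chosen for illustration. The point is that a recovery "copy" is only a store reference plus a read time, so nothing is copied.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class PageStore:
    """Toy storage layer: keeps every version of every page, keyed by write time."""
    # page_id -> list of (write_time, contents), appended in time order
    versions: dict = field(default_factory=dict)

    def write(self, page_id, t, contents):
        self.versions.setdefault(page_id, []).append((t, contents))

    def read_at(self, page_id, t):
        """Return the page contents as of time t, or None if it did not exist yet."""
        history = self.versions.get(page_id, [])
        times = [write_time for write_time, _ in history]
        i = bisect_right(times, t)  # count of versions written at or before t
        return history[i - 1][1] if i else None


@dataclass
class Compute:
    """Toy compute instance: holds no data, only a store reference and a read time."""
    store: PageStore
    as_of: float

    def query(self, page_id):
        return self.store.read_at(page_id, self.as_of)


store = PageStore()
store.write("final_entities", t=100, contents=["row"] * 32)  # healthy state
store.write("final_entities", t=200, contents=[])            # the bad delete

recovery = Compute(store, as_of=150)    # steps 1 and 2: view at t, attach compute
production = Compute(store, as_of=200)  # production still sees the wipe

print(len(recovery.query("final_entities")))    # step 3: query the recovered view
print(len(production.query("final_entities")))
```

Creating `recovery` costs the same regardless of how many pages the store holds, which mirrors the claim that wall-clock time is set by control-plane and boot overhead rather than data volume.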

Canonical disclosure (Lakebase, 2026-04-30)¶

"Branching and Point-in-Time Recovery (PITR) are essentially the same primitive: branching is just PITR with source_branch_time = now." — Thoughtworks Backstage POC.

Measured: 3.78 seconds end-to-end from a wipe of the final_entities table (32 rows → 0) to a recovery branch with all 32 entities restored, while production itself was still at zero (branches are fully isolated). See sources/2026-04-30-databricks-backstage-with-lakebase.

This is at least an order of magnitude faster than the traditional snapshot-restore-plus-WAL-replay PITR shape (seconds versus minutes to hours), and reframes PITR from "last-resort disaster-recovery operation" to "routine undo-button you hit when a command goes wrong."

Target-time granularity (WAL-bounded)¶

The precision of PITR's target time is not chosen by the caller; it is bounded by the WAL-record cadence of the underlying store. Ask for 22:56:02Z and you get 22:55:50Z (12 seconds earlier) if the nearest earlier durable WAL record is at 22:55:50Z. This is the concepts/wal-record-granularity property: PITR always snaps backward to the nearest known durable state.

For time-sensitive workflows (e.g. "recover to just before the bad commit at T") this means the caller must request T − ε for sufficient margin, rather than T exactly. The Lakebase POC disclosed a 12-second snap-back as representative; different workload intensities + WAL-write cadences will produce different granularities.
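A minimal sketch of the snap-back rule, assuming the store resolves a request to the latest WAL record at or before the requested time. The `snap_back` helper, the record list, and the inclusive "at or before" semantics are assumptions for illustration; only the 22:56:02Z request and the 22:55:50Z result come from the example above.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def snap_back(requested, wal_record_times):
    """Return the latest WAL record time at or before `requested`.

    `wal_record_times` must be sorted ascending. Raises ValueError if the
    request predates all retained history.
    """
    i = bisect_right(wal_record_times, requested)
    if i == 0:
        raise ValueError("requested time is before the oldest WAL record")
    return wal_record_times[i - 1]


def utc(h, m, s):
    # The calendar date is arbitrary; only the time of day matters here.
    return datetime(2026, 1, 1, h, m, s, tzinfo=timezone.utc)


# Hypothetical WAL record times surrounding the request.
records = [utc(22, 55, 38), utc(22, 55, 50), utc(22, 56, 10)]

requested = utc(22, 56, 2)
effective = snap_back(requested, records)
print(effective.time(), requested - effective)

# Suppose the bad commit is the record at 22:56:10. Asking for T exactly
# would include it; asking for T minus a margin lands on the prior record.
bad_commit = utc(22, 56, 10)
too_late = snap_back(bad_commit, records)
safe = snap_back(bad_commit - timedelta(seconds=1), records)
print(too_late.time(), safe.time())
```

Under these assumptions, a caller who wants the state just before a bad commit should check that the effective time returned is strictly earlier than the commit, rather than trusting the requested value.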

Contrast: branching vs PITR¶

On a copy-on-write-capable substrate, the two are the same operation with a different source_branch_time:

	Branch	PITR
Source time	`now`	past timestamp
Data content	current production state	state at past time
Mechanism	COW-fork at current head	COW-fork at historical head
Use case	pre-deploy test, dev sandbox, policy testing, agent operation	accidental-delete recovery, incident undo
Latency	1.09 s (63 MB Backstage catalog)	3.78 s (recovery + verify)

See patterns/branching-is-pitr-with-time-now for the architectural unification.
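The unification in the table can be written as one function with two entry points. This is a sketch with hypothetical names (`Fork`, `pitr`, `branch`), not the Lakebase or Neon API; `source_branch_time` is the only identifier taken from the quoted disclosure. The current time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fork:
    """A copy-on-write fork: a parent name plus the time it reads from."""
    parent: str
    source_branch_time: float


def pitr(parent, source_branch_time, now):
    """Fork `parent` at a timestamp no later than `now`. No data is copied."""
    if source_branch_time > now:
        raise ValueError("cannot fork at a future time")
    return Fork(parent, source_branch_time)


def branch(parent, now):
    """Branching is the same call with source_branch_time fixed to now."""
    return pitr(parent, source_branch_time=now, now=now)


now = 1_000.0  # hypothetical clock reading, in seconds
sandbox = branch("production", now=now)
recovered = pitr("production", source_branch_time=now - 300, now=now)
print(sandbox, recovered)
```

The only difference between `sandbox` and `recovered` is the timestamp field, which is the whole content of the "same primitive" claim.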

Prior art on classical substrates¶

AWS RDS / Aurora — PITR supported via automated backups + transaction log retention. Recovery is restore-to-new-instance (fresh RDS instance provisioned, historical state materialised, minutes to hours depending on data volume). See systems/amazon-aurora.
MySQL / Postgres DIY — on Postgres, pg_basebackup + WAL archive; the operator restores the base backup onto a fresh cluster, sets restore_command and recovery_target_time, and lets the server replay archived WAL up to the target. On MySQL, a full backup plus binlog replay with mysqlbinlog.
Traditional snapshot systems — storage-tier snapshot (EBS / ZFS / LVM) plus log-replay on top.

All of these are available; none of them is fast enough to be the first thing you try during a live incident. The compute-storage-separated substrate's advantage isn't that PITR becomes possible — it becomes cheap enough to be routine.

Seen in¶

sources/2026-04-30-databricks-backstage-with-lakebase — canonical first wiki instance of PITR at Lakebase / Neon altitude. Thoughtworks POC demonstrates 3.78-second end-to-end recovery from a 32-row-deletion incident, with 12-second WAL-snap-back granularity disclosed. Canonicalises the branching ≡ PITR-with-time-now architectural unification + the "every incident gets an undo" framing that follows from PITR at sub-10-second latencies.

Related¶

concepts/database-branching — PITR's structural twin (same primitive, different target time).
concepts/copy-on-write-storage-fork — the mechanism PITR uses on compute-storage-separated substrates.
concepts/wal-record-granularity — the property that bounds PITR target-time precision.
concepts/binlog-replication — classical alternative for cross-instance point-in-time replay.
concepts/compute-storage-separation — the architectural precondition for PITR-as-fork.
systems/lakebase — canonical instance.
systems/pageserver-safekeeper — substrate that makes historical-page PITR possible.
systems/amazon-aurora — prior-art PITR at RDS altitude.
patterns/branching-is-pitr-with-time-now — architectural unification pattern.