CONCEPT Cited by 1 source
Point-in-Time Recovery (PITR)¶
Definition¶
Point-in-Time Recovery (PITR) is the database capability of producing a fresh, queryable copy of the database at a chosen past timestamp, typically for:
- Incident undo — "we dropped the wrong table 4 minutes ago — give us a version from 5 minutes ago."
- Accidental-delete recovery — restore specific rows wiped by a bad query, pipeline bug, or user error.
- Forensic / audit — inspect historical state without running a transaction-replay.
- Dev / QA / staging — fork a past production state as a sandbox without the replay cost of log-shipped standby.
The classical implementation — periodic full snapshots + WAL / binlog replay between the snapshot and the target time — is operationally expensive: restore a 2 TB snapshot to a new EBS volume (minutes to hours), then replay WAL forward to the target (more minutes). For a live incident this makes PITR the option of last resort rather than the first thing tried.
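The classical shape can be sketched as a toy in-memory model. This is an illustration only, not any database's actual restore path: `Snapshot`, `LogRecord`, and `restore_to` are hypothetical names, and integer timestamps stand in for real clocks. The point it shows is that the restore starts with a full copy of the base snapshot and then walks the log, so its cost grows with both data volume and log length.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    taken_at: int            # seconds on an arbitrary clock (assumption)
    rows: Dict[str, str]     # full copy of the data as of taken_at


@dataclass
class LogRecord:
    ts: int
    key: str
    value: Optional[str]     # None marks a delete


def restore_to(target: int, snapshots: List[Snapshot], log: List[LogRecord]) -> Dict[str, str]:
    """Rebuild state at `target` from the newest usable snapshot plus log replay."""
    usable = [s for s in snapshots if s.taken_at <= target]
    if not usable:
        raise ValueError("no snapshot at or before the target time")
    base = max(usable, key=lambda s: s.taken_at)

    # Step 1: physical copy of the snapshot (cost scales with data size).
    state = dict(base.rows)

    # Step 2: replay log records written after the snapshot, up to the target.
    for rec in sorted(log, key=lambda r: r.ts):
        if base.taken_at < rec.ts <= target:
            if rec.value is None:
                state.pop(rec.key, None)
            else:
                state[rec.key] = rec.value
    return state
```

In a real system step 1 is the multi-terabyte volume restore and step 2 is WAL or binlog replay; both sit on the critical path of the incident.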
PITR on compute-storage-separated substrates¶
On a compute-storage-separated substrate where the storage layer already keeps historical page versions + the WAL as durable shared state (for example Pageserver + Safekeeper on Neon / Lakebase), PITR collapses into a copy-on-write fork targeted at a past timestamp. No snapshot restore, no physical copy, no minutes-long wait. The operation becomes:
- Ask the storage layer to expose a logical view of the pages as they were at time t.
- Point a new compute instance at that logical view.
- Connect and query.
Each of these is sub-second. PITR's wall-clock cost becomes dominated by the control-plane round-trip + compute boot, not by the data volume.
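A minimal sketch of why the fork is independent of data volume, assuming a storage layer that retains every page version keyed by write time. `VersionedPageStore` and `Compute` are hypothetical stand-ins, not the Pageserver or Lakebase API; the real systems index versions by LSN rather than by a plain timestamp.

```python
import bisect
from collections import defaultdict
from typing import Optional


class VersionedPageStore:
    """Toy storage layer that keeps all historical versions of each page."""

    def __init__(self) -> None:
        self._ts = defaultdict(list)     # page_id -> ascending write timestamps
        self._data = defaultdict(list)   # page_id -> page images, parallel to _ts

    def write(self, page_id: int, ts: int, data: bytes) -> None:
        # Assumption: versions of a page arrive in timestamp order.
        if self._ts[page_id] and ts < self._ts[page_id][-1]:
            raise ValueError("out-of-order write")
        self._ts[page_id].append(ts)
        self._data[page_id].append(data)

    def read(self, page_id: int, at: int) -> Optional[bytes]:
        """Return the newest version of the page written at or before `at`."""
        ts_list = self._ts.get(page_id, [])
        i = bisect.bisect_right(ts_list, at)
        return self._data[page_id][i - 1] if i else None


class Compute:
    """A compute instance pinned to the logical view at time t. It copies no pages."""

    def __init__(self, store: VersionedPageStore, at: int) -> None:
        self.store = store
        self.at = at

    def query(self, page_id: int) -> Optional[bytes]:
        return self.store.read(page_id, self.at)
```

Creating `Compute(store, at=t)` only records a reference and a timestamp, which mirrors the three steps above: the view is logical, the compute is new, and pages are resolved lazily at query time.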
Canonical disclosure (Lakebase, 2026-04-30)¶
"Branching and Point-in-Time Recovery (PITR) are essentially
the same primitive: branching is just PITR with
source_branch_time = now." — Thoughtworks Backstage POC.
Measured: 3.78 seconds end-to-end from a wipe of the
final_entities table (32 rows → 0) to a recovery branch with
all 32 entities restored, while production itself was still at
zero (branches are fully isolated). See
sources/2026-04-30-databricks-backstage-with-lakebase.
This is at least an order of magnitude faster than the traditional snapshot-restore-plus-WAL-replay PITR shape (seconds versus minutes to hours), and it reframes PITR from "last-resort disaster-recovery operation" to "routine undo-button you hit when a command goes wrong."
Target-time granularity (WAL-bounded)¶
A PITR target time is not honored at the precision the caller specifies; it is bounded by the WAL-record cadence of the underlying store. Ask for 22:56:02Z and you get 22:55:50Z (12 seconds earlier) if the nearest durable WAL record at or before the request is at 22:55:50Z. This is the concepts/wal-record-granularity property: PITR always snaps backward to the nearest known durable state.
For time-sensitive workflows (e.g. "recover to just before
the bad commit at T") this means the caller must request
T − ε for sufficient margin, rather than T exactly. The
Lakebase POC disclosed a 12-second snap-back as representative;
different workload intensities + WAL-write cadences will
produce different granularities.
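The snap-backward behavior can be sketched with a binary search over WAL record timestamps. This is an illustrative model, not the store's real resolution logic: `snap_back` is a hypothetical helper, and the calendar date and the neighboring record times below are placeholders (the source gives only the two times of day).

```python
import bisect
from datetime import datetime, timedelta, timezone
from typing import List


def snap_back(requested: datetime, wal_record_times: List[datetime]) -> datetime:
    """Return the newest WAL record time at or before `requested`.

    `wal_record_times` must be sorted ascending.
    """
    i = bisect.bisect_right(wal_record_times, requested)
    if i == 0:
        raise ValueError("requested time precedes the retained WAL")
    return wal_record_times[i - 1]


# Placeholder date; only the times of day come from the example above.
day = datetime(2026, 1, 1, tzinfo=timezone.utc)
wal = [
    day.replace(hour=22, minute=55, second=38),  # hypothetical earlier record
    day.replace(hour=22, minute=55, second=50),
    day.replace(hour=22, minute=56, second=10),  # hypothetical later record
]
requested = day.replace(hour=22, minute=56, second=2)
effective = snap_back(requested, wal)
gap = requested - effective  # timedelta(seconds=12)
```

To land strictly before a bad commit at T, request `T - margin` and then check that the effective time returned is earlier than T.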
Contrast: branching vs PITR¶
On a copy-on-write-capable substrate, the two are the same
operation with a different source_branch_time:
| | Branch | PITR |
|---|---|---|
| Source time | now | past timestamp |
| Data content | current production state | state at past time |
| Mechanism | COW-fork at current head | COW-fork at historical head |
| Use case | pre-deploy test, dev sandbox, policy testing, agent operation | accidental-delete recovery, incident undo |
| Latency | 1.09 s (63 MB Backstage catalog) | 3.78 s (recovery + verify) |
See patterns/branching-is-pitr-with-time-now for the architectural unification.
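The unification in the table can be sketched as a single function whose only variable is the fork time. `Branch` and `create_branch` here are hypothetical; the parameter name `source_branch_time` is borrowed from the quote above, but its real signature and semantics in Lakebase are not confirmed by the source. Each branch keeps its own writes in a private log and reads through to its parent as of the fork time, which gives the isolation described in the disclosure.

```python
import time
from typing import Callable, List, Optional, Tuple


class Branch:
    """Toy copy-on-write branch: own writes in a private log, reads fall through."""

    def __init__(self, parent: Optional["Branch"] = None,
                 fork_time: Optional[float] = None) -> None:
        self._parent = parent
        self._fork_time = fork_time
        # (ts, key, value); value None marks a delete. Assumed appended in ts order.
        self._log: List[Tuple[float, str, Optional[str]]] = []

    def put(self, key: str, value: str, ts: float) -> None:
        self._log.append((ts, key, value))

    def delete(self, key: str, ts: float) -> None:
        self._log.append((ts, key, None))

    def get(self, key: str, at: Optional[float] = None) -> Optional[str]:
        # Newest own record for the key that is visible at `at` wins.
        for ts, k, v in reversed(self._log):
            if k == key and (at is None or ts <= at):
                return v
        if self._parent is not None:
            bound = self._fork_time if at is None else min(at, self._fork_time)
            return self._parent.get(key, at=bound)
        return None


def create_branch(source: Branch,
                  source_branch_time: Optional[float] = None,
                  now: Callable[[], float] = time.time) -> Branch:
    """Branching and PITR are the same call; omitting the time means 'now'."""
    t = now() if source_branch_time is None else source_branch_time
    return Branch(parent=source, fork_time=t)
```

With 32 rows written at time 1 and wiped at time 100 on the source, `create_branch(prod, source_branch_time=99)` sees all 32 rows while `prod` and a plain `create_branch(prod)` both see none, and writes to the recovery branch never reach `prod`.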
Prior art on classical substrates¶
- AWS RDS / Aurora: PITR supported via automated backups + transaction log retention. Recovery is restore-to-new-instance (fresh RDS instance provisioned, historical state materialised, minutes to hours depending on data volume). See systems/amazon-aurora.
- MySQL / Postgres DIY: on Postgres, pg_basebackup + WAL archive; the operator restores the base backup onto a fresh cluster and replays archived WAL up to a recovery target time. On MySQL, the analogue is a full backup plus binlog replay with mysqlbinlog.
- Traditional snapshot systems: storage-tier snapshot (EBS / ZFS / LVM) plus log-replay on top.
All of these are available; none of them is fast enough to be the first thing you try during a live incident. The compute-storage-separated substrate's advantage isn't that PITR becomes possible — it becomes cheap enough to be routine.
Seen in¶
- sources/2026-04-30-databricks-backstage-with-lakebase — canonical first wiki instance of PITR at Lakebase / Neon altitude. Thoughtworks POC demonstrates 3.78-second end-to-end recovery from a 32-row-deletion incident, with 12-second WAL-snap-back granularity disclosed. Canonicalises the branching ≡ PITR-with-time-now architectural unification + the "every incident gets an undo" framing that follows from PITR at sub-10-second latencies.
Related¶
- concepts/database-branching — PITR's structural twin (same primitive, different target time).
- concepts/copy-on-write-storage-fork — the mechanism PITR uses on compute-storage-separated substrates.
- concepts/wal-record-granularity — the property that bounds PITR target-time precision.
- concepts/binlog-replication — classical alternative for cross-instance point-in-time replay.
- concepts/compute-storage-separation — the architectural precondition for PITR-as-fork.
- systems/lakebase — canonical instance.
- systems/pageserver-safekeeper — substrate that makes historical-page PITR possible.
- systems/amazon-aurora — prior-art PITR at RDS altitude.
- patterns/branching-is-pitr-with-time-now — architectural unification pattern.