PATTERN Cited by 2 sources
Snapshot plus catch-up replication¶
Problem¶
Copying a live database's data to another system without taking the source offline involves a fundamental tension:
- A consistent copy requires a fixed point-in-time view of the source — normally achieved by blocking writes.
- Not taking the source offline requires not blocking writes — which means data can change during the copy.
The naive solutions don't work:
- Serialise everything on the source for hours — the source is effectively offline.
- Copy rows one-by-one and hope — the destination contains rows from N different logical times; any cross-row invariant (foreign keys, derived indexes, ACID observations taken during the copy) is broken.
- Retry from scratch on any write — does not converge.
Solution¶
Capture a consistent non-locking snapshot plus the replication position that corresponds to it, copy the snapshot's rows to the destination, and tail the source's change log from the captured position to catch up with any writes that happened during (and since) the copy.
The pattern has three phases:
1. Snapshot + position capture¶
Atomically open a snapshot and record the replication position that corresponds to it. The source remains fully available for writes during this step (or is briefly read-locked for milliseconds depending on the engine).
- MySQL — `LOCK TABLES <tbl> READ` + `START TRANSACTION WITH CONSISTENT SNAPSHOT` + `SELECT @@global.GTID_EXECUTED` + `UNLOCK TABLES`. Captured state: consistent snapshot + GTID set.
- PostgreSQL — `BEGIN` + `SET TRANSACTION ISOLATION LEVEL REPEATABLE READ` + `pg_export_snapshot()` + `pg_current_wal_lsn()` or a replication slot's LSN. Captured state: snapshot token + WAL LSN.
- MongoDB — causal-consistency session + oplog timestamp.
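The load-bearing property of this step is that the point-in-time view and the log position refer to the same instant. A minimal Python sketch against an in-memory "source" (all names illustrative — the real engines do the equivalent atomically inside the storage engine, per the list above):

```python
# Phase 1 sketch: atomically pair a consistent snapshot with the replication
# position that corresponds to it. Illustrative only; MySQL/Postgres/MongoDB
# each have their own engine-level mechanism for this.

class Source:
    def __init__(self):
        self.rows = {}        # primary key -> row
        self.changelog = []   # append-only change log: (key, row)

    def write(self, key, row):
        self.rows[key] = row
        self.changelog.append((key, row))

    def capture(self):
        """Atomically record a point-in-time view + its log position."""
        snapshot = dict(self.rows)      # consistent view of all rows
        position = len(self.changelog)  # replication position for that view
        return snapshot, position

src = Source()
src.write(1, "a")
src.write(2, "b")
snap, pos = src.capture()
src.write(3, "c")   # a write landing after the capture

# The snapshot excludes the later write; the position tells catch-up
# exactly where to start so that write is not lost.
assert snap == {1: "a", 2: "b"} and pos == 2
```

Because the pair is captured atomically, every write is covered exactly once: it is either inside the snapshot or at-or-after the captured position, never both and never neither.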
2. Row copy¶
Stream rows from the snapshot to the destination. Common optimisations:
- Order rows by primary key so the source reads off the clustered index without a filesort.
- Parallelise across tables (one stream per table or per destination shard).
- Checkpoint per-key progress in durable storage so the copy is resumable from any interruption.
- Interleave row-copy cycles with change-log catch-up cycles (Vitess `vreplication_copy_phase_duration`) to stay inside the source's binlog / WAL retention horizon — a hard constraint for long copies.
3. Change-log catch-up¶
Once the row copy completes, tail the source's change log from the captured position to bring the destination current. This phase runs indefinitely until cutover.
- MySQL — `COM_BINLOG_DUMP_GTID` with the captured GTID set; stream binlog events from that position forward.
- PostgreSQL — logical replication slot consumer; stream the WAL-derived change-set from the captured LSN.
- MongoDB — tail the oplog from the captured timestamp.
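Replaying the change log from the captured position makes the destination converge with the source, including inserts, updates, and deletes that landed during the copy. A hedged end-to-end sketch (illustrative only — real implementations stream binlog/WAL/oplog events, not a Python list):

```python
# Phase 3 sketch: apply every change-log entry from the captured position
# onward. A row value of None stands in for a delete event.

def catch_up(changelog, position, dest):
    """Apply every change at or after `position`; return the new position."""
    for key, row in changelog[position:]:
        if row is None:
            dest.pop(key, None)   # delete observed during/after the copy
        else:
            dest[key] = row       # insert or update
    return len(changelog)

source_rows = {1: "a", 2: "b"}
changelog = [(1, "a"), (2, "b")]
snapshot, position = dict(source_rows), len(changelog)   # phase 1: capture
dest = dict(snapshot)                                    # phase 2: copy done

# Writes that happened on the source during the copy:
for key, row in [(3, "c"), (1, "a2"), (2, None)]:
    if row is None:
        source_rows.pop(key)
    else:
        source_rows[key] = row
    changelog.append((key, row))

position = catch_up(changelog, position, dest)           # phase 3: catch up
assert dest == source_rows == {1: "a2", 3: "c"}
```

In practice this phase never "completes" — it keeps tailing until cutover, persisting the advancing position so catch-up itself is also resumable.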
Canonical wiki instance¶
Vitess VReplication — used under the hood by MoveTables for all PlanetScale customer migrations. Each per-target-shard stream:
- Takes a consistent non-locking snapshot of each table in sequence and records the source GTID.
- Streams rows out of the snapshot ordered by primary key, filtered to the destination shard per the sharding scheme, persisting per-key progress in the `copy_state` sidecar table.
- Interleaves copy with `COM_BINLOG_DUMP_GTID` catch-up cycles to stay inside binlog retention.
- On copy completion, transitions to continuous replication from the captured GTID, persisting the advancing GTID on every commit.
(Source: sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale.)
The post explicitly warns against skipping the catch-up interleave: "this regular catchup step is important to ensure that we don't complete the row copy only to then be unable to replicate from where we left off because the source MySQL instance no longer has binary log events that we need, as they have been purged, in which case we would be forced to start the entire migration over again."
Composes with¶
- patterns/read-replica-as-migration-source — run the snapshot + catch-up against a replica, not the production primary, to avoid any source-side load at all.
- patterns/vdiff-verify-before-cutover — verify the destination's row-for-row equality with the source before flipping any traffic.
- patterns/routing-rule-swap-cutover — cut over to the destination via an atomic routing-rule flip at a proxy layer.
- patterns/reverse-replication-for-rollback — at cutover, create the reverse snapshot-plus-catch-up stream (target → source) so rollback is available without data loss.
Seen in¶
- sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki instance of the pattern at petabyte scale, on MySQL, via Vitess VReplication. The full choreography is documented — `LOCK TABLES READ` briefly, consistent snapshot, GTID capture, unlock, PK-ordered copy, per-stream destination-shard filter, copy/catch-up interleave, post-copy continuous replication via `COM_BINLOG_DUMP_GTID` with persisted GTID progress. Matt Lord's framing is explicit that this is the load-bearing pattern for any zero-downtime data motion at scale.
- sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work — the within-table, online-DDL application of the same pattern. Guevara + Noach walk the forward direction (consistent-snapshot batches + GTID capture + binlog catch-up applied to a shadow table being schema-evolved) at table scope. The post makes explicit that the snapshot-plus-catch-up shape applies regardless of whether the "copy" is moving data between keyspaces (the 2026-02-16 case) or between tables in the same keyspace (the online-DDL case). Extends the pattern family with the PlanetScale-specific observation that the snapshot-plus-catch-up stream does not have to terminate at cutover — patterns/instant-schema-revert-via-inverse-replication re-uses the stream, now running in reverse, to keep a second table (the inverse shadow) continuously in sync, so a schema revert is a second swap of two tables that are already caught up.
Related¶
- concepts/consistent-non-locking-snapshot
- concepts/gtid-position
- concepts/binlog-replication
- concepts/online-database-import
- concepts/fault-tolerant-long-running-workflow
- concepts/shadow-table
- concepts/cutover-freeze-point
- concepts/pre-staged-inverse-replication
- concepts/online-ddl
- systems/mysql
- systems/vitess-vreplication
- systems/vitess-vdiff
- patterns/read-replica-as-migration-source
- patterns/vdiff-verify-before-cutover
- patterns/routing-rule-swap-cutover
- patterns/reverse-replication-for-rollback
- patterns/shadow-table-online-schema-change
- patterns/instant-schema-revert-via-inverse-replication
- companies/planetscale