
VReplication

What it is

VReplication (vitess.io docs) is Vitess's native data-replication framework — the substrate Vitess uses for all of its data-motion workflows: database imports (migrate an external MySQL into Vitess), resharding (split or merge shards), table moves (relocate tables between keyspaces), materialised views, and online schema changes. Every one of these user-facing operations is implemented on top of the same VReplication primitive: a per-source-to-per-target stream that consists of an initial row-copy phase (read a consistent non-locking snapshot of the table ordered by primary key) followed by a continuous replication phase (tail MySQL binlog events from the GTID position captured at the end of the copy phase).

Architectural shape

Each VReplication workflow consists of N streams, one per (source, target-shard) pair. Each stream has its own independent state and advances independently.
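The sketch below shows how streams might be enumerated for a Reshard. Shard names follow Vitess's hex keyrange convention (`-80`, `80-`, `0` for an unsharded keyspace). The sketch assumes that only shards whose keyranges overlap need a stream, which fits a Reshard on a single vindex. A MoveTables between keyspaces with different sharding may need every source paired with every target. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (inclusive start, exclusive end) keyspace-id bounds; None = unbounded above
KeyRange = Tuple[bytes, Optional[bytes]]


def parse_shard(name: str) -> KeyRange:
    """Parse a shard name such as '-80', '40-80', '80-', or '0' (unsharded)."""
    if name == "0":
        return b"", None  # an unsharded keyspace covers the full range
    start_hex, end_hex = name.split("-")
    return bytes.fromhex(start_hex), (bytes.fromhex(end_hex) if end_hex else None)


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """True if two half-open keyranges share at least one keyspace id."""
    (a_start, a_end), (b_start, b_end) = a, b
    return (b_end is None or a_start < b_end) and (a_end is None or b_start < a_end)


def plan_streams(sources: List[str], targets: List[str]) -> List[Tuple[str, str]]:
    """One (source shard, target shard) stream per overlapping pair."""
    return [(s, t) for t in targets for s in sources
            if overlaps(parse_shard(s), parse_shard(t))]


# Split two shards into four: each target streams from the single source it overlaps.
split = plan_streams(["-80", "80-"], ["-40", "40-80", "80-c0", "c0-"])
# Import an unsharded keyspace into two shards: one stream per target.
fan_out = plan_streams(["0"], ["-80", "80-"])
```

Because every stream has its own position, each pair in `split` copies and replicates independently.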

Copy phase (per table, serial across tables within a workflow):

Source tablet issues LOCK TABLES <tbl> READ on the source MySQL (the table is read-only for milliseconds).
Opens START TRANSACTION WITH CONSISTENT SNAPSHOT.
Reads @@global.GTID_EXECUTED to get the GTID set corresponding to this snapshot.
Releases the LOCK TABLES.
Reads rows from the snapshot ordered by PRIMARY KEY (or a best-PK-equivalent non-null unique key) so the source MySQL reads directly off the clustered index without a filesort.
Each row is filtered (per the workflow's sharding scheme) and routed to the appropriate target shard's PRIMARY tablet, where it is inserted.
Per-stream progress is persisted in the target's sidecar copy_state table so the copy can resume from the exact last-copied key on restart.
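The first steps of the copy phase can be sketched against a DB-API-style cursor, as below. In MySQL, beginning a transaction implicitly releases table locks held by the same session. The sketch therefore holds the lock on one session and opens the snapshot on another. `RecordingCursor`, `start_table_copy`, and the `%s` ("format") parameter style are illustrative assumptions. This is not Vitess's code.

```python
from typing import Any, List, Optional, Sequence, Tuple


class RecordingCursor:
    """Hypothetical stand-in for a cursor on one MySQL session.

    It records statements instead of running them, so their order can be checked.
    """

    def __init__(self, session: str, log: List[Tuple[str, str, tuple]]):
        self.session = session
        self.log = log

    def execute(self, sql: str, params: Sequence[Any] = ()) -> None:
        self.log.append((self.session, sql, tuple(params)))

    def fetchone(self) -> Tuple[str]:
        return ("<gtid-set>",)  # placeholder for the real @@global.gtid_executed value


def start_table_copy(lock_cur, snap_cur, table: str, pk: str,
                     lastpk: Optional[Any] = None) -> str:
    """Open a consistent snapshot of `table` and start a PK-ordered read.

    Identifiers are interpolated directly, so they must already be trusted/escaped.
    """
    # WITH CONSISTENT SNAPSHOT only yields a consistent read under REPEATABLE READ.
    snap_cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
    lock_cur.execute(f"LOCK TABLES {table} READ")  # briefly blocks writers to `table`
    try:
        snap_cur.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")
        lock_cur.execute("SELECT @@global.gtid_executed")
        gtid_set = lock_cur.fetchone()[0]  # catch-up starts from this position
    finally:
        lock_cur.execute("UNLOCK TABLES")  # the snapshot's read view outlives the lock
    if lastpk is None:
        snap_cur.execute(f"SELECT * FROM {table} ORDER BY {pk}")
    else:
        # Resume after the last key recorded in copy_state.
        snap_cur.execute(f"SELECT * FROM {table} WHERE {pk} > %s ORDER BY {pk}", (lastpk,))
    return gtid_set


log: List[Tuple[str, str, tuple]] = []
gtid = start_table_copy(RecordingCursor("lock", log), RecordingCursor("snap", log),
                        "customer", "id", lastpk=41)
```

Ordering the read by the primary key is what makes `lastpk` a sufficient resume point.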

Copy phases are interruptible and resumable — the vreplication_copy_phase_duration flag bounds how long a single copy cycle runs before the stream pauses, catches up on binlog events for the rows copied so far, and re-enters the copy phase. This regular catch-up loop prevents the binlog retention horizon from lapping the stream during long copies. (Source: .)
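The copy/catch-up interleaving can be modelled in memory for a single table. In this sketch, `rows_per_cycle` stands in for the time budget set by vreplication_copy_phase_duration, and `writes_by_cycle` is a hypothetical schedule of concurrent source writes. Catch-up replays events only for rows already copied. Events for rows past the copy cursor are skipped, because a later copy batch reads those rows' current state.

```python
import math
from typing import Dict, List, Optional, Tuple

Write = Tuple[int, Optional[str]]  # (pk, new value); None means DELETE


def copy_with_catchup(source: Dict[int, str],
                      writes_by_cycle: Dict[int, List[Write]],
                      rows_per_cycle: int) -> Dict[int, str]:
    """Hypothetical model of copy/catch-up interleaving for one table.

    `source` is mutated as the scheduled writes land on it.
    """
    target: Dict[int, str] = {}
    lastpk: float = 0  # assumes positive integer primary keys
    cycle = 0
    while True:
        # Copy phase: next rows in PK order from the source's current state.
        batch = sorted(pk for pk in source if pk > lastpk)[:rows_per_cycle]
        for pk in batch:
            target[pk] = source[pk]
        # An empty batch means the copy is finished; from then on every event applies.
        lastpk = batch[-1] if batch else math.inf
        # Writes that reach the source (and its binlog) while the stream works.
        binlog: List[Write] = []
        for pk, val in writes_by_cycle.get(cycle, []):
            if val is None:
                source.pop(pk, None)
            else:
                source[pk] = val
            binlog.append((pk, val))
        # Catch-up phase: replay events only for rows already copied.
        for pk, val in binlog:
            if pk > lastpk:
                continue  # a later copy batch reads this row's current state
            if val is None:
                target.pop(pk, None)
            else:
                target[pk] = val
        cycle += 1
        if lastpk == math.inf and cycle > max(writes_by_cycle, default=-1):
            return target


source = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
writes = {0: [(1, "A"), (4, "D"), (6, "f")], 1: [(2, None), (5, None)], 2: [(7, "g")]}
target = copy_with_catchup(source, writes, rows_per_cycle=2)
```

Rows 4 and 6 are written before the copy reaches them. Their events are skipped, and the later copy batches pick up their current values. Row 5 is deleted before it is ever copied.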

Replication phase (after all rows in the workflow are copied):

Per-stream COM_BINLOG_DUMP_GTID command to the source MySQL, providing the stream's recorded GTID position.
Source MySQL streams binlog events from that position.
Events are filtered (per the sharding scheme) and applied on the target PRIMARY tablet.
The advancing GTID position is persisted in the sidecar vreplication table on every event-batch commit, so the stream can always restart from exactly where it left off.
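The restart guarantee depends on committing each batch of row changes and the new position in one transaction. The sketch below uses SQLite as a stand-in for the target MySQL. Integer sequence numbers stand in for GTIDs, and `vreplication_state` is a hypothetical stand-in for the sidecar vreplication table. A simulated crash mid-batch rolls back both the rows and the position, so the restart re-applies the whole batch.

```python
import sqlite3

# Each event: (sequence number standing in for a GTID, op, primary key, value)


def open_target() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE vreplication_state (stream_id INTEGER PRIMARY KEY, pos INTEGER NOT NULL);
        INSERT INTO vreplication_state VALUES (1, 0);
    """)
    return conn


def current_pos(conn: sqlite3.Connection, stream_id: int = 1) -> int:
    row = conn.execute("SELECT pos FROM vreplication_state WHERE stream_id = ?",
                       (stream_id,)).fetchone()
    return row[0]


def apply_batch(conn, events, stream_id=1, crash_after=None):
    """Apply a non-empty batch and advance the position in ONE transaction."""
    with conn:  # commits on success, rolls back on any exception
        for i, (seq, op, key, val) in enumerate(events):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash mid-batch")
            if op == "upsert":
                conn.execute("INSERT OR REPLACE INTO t (id, val) VALUES (?, ?)", (key, val))
            else:  # "delete"
                conn.execute("DELETE FROM t WHERE id = ?", (key,))
        conn.execute("UPDATE vreplication_state SET pos = ? WHERE stream_id = ?",
                     (events[-1][0], stream_id))


def resume(conn, binlog, batch_size=2):
    """Restart path: read the persisted position, then apply everything after it."""
    pending = [e for e in binlog if e[0] > current_pos(conn)]
    for i in range(0, len(pending), batch_size):
        apply_batch(conn, pending[i:i + batch_size])


binlog = [(1, "upsert", 1, "a"), (2, "upsert", 2, "b"),
          (3, "delete", 1, None), (4, "upsert", 3, "c")]
target = open_target()
apply_batch(target, binlog[:2])
try:
    apply_batch(target, binlog[2:], crash_after=1)  # the delete runs, then the "crash"
except RuntimeError:
    pass
pos_after_crash = current_pos(target)
rows_after_crash = target.execute("SELECT id, val FROM t ORDER BY id").fetchall()
resume(target, binlog)
final_rows = target.execute("SELECT id, val FROM t ORDER BY id").fetchall()
```

If the position were committed separately from the rows, a crash between the two commits could skip events or apply them twice.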

Use cases

Via the MoveTables and Reshard workflow wrappers (online schema changes go through Vitess's Online DDL instead):

External-MySQL → Vitess imports (PlanetScale Database Imports is built on this shape): import an unmanaged external MySQL into a Vitess cluster. "An unsharded MySQL database … split into N shards as part of the data migration into PlanetScale." (Source: .)
Resharding — split or merge existing Vitess shards.
Table move between keyspaces — e.g. promote a materialised view to its own keyspace.
Online schema change — Vitess's native DDL workflow uses VReplication to shadow-migrate rows into a schema-evolved copy and then cut over.

Why it shows up on this wiki

VReplication is the load-bearing data-motion primitive of Vitess, and therefore of PlanetScale MySQL. Every zero-downtime migration PlanetScale performs — including the petabyte-scale customer migrations documented by Matt Lord — runs on VReplication. The 2026-02-16 post is the canonical public walkthrough of the mechanism: the exact MySQL-primitive choices (LOCK TABLES READ → START TRANSACTION WITH CONSISTENT SNAPSHOT → @@global.GTID_EXECUTED → release → PRIMARY KEY-ordered stream → COM_BINLOG_DUMP_GTID), the copy / catch-up interleaving, the per-target-shard fan-out, and the sidecar persistence for restart semantics.

Fault tolerance

Every state-bearing decision point in a VReplication stream persists state in sidecar tables so restart is the recovery path for any failure. "Anything can fail throughout this process and the system will be able to recover and continue where it left off." (Source: .) See concepts/fault-tolerant-long-running-workflow for why this is a correctness requirement, not a nice-to-have, at petabyte scale.

Seen in

sources/2026-04-21-planetscale-dealing-with-large-tables — Ben Dicken (PlanetScale, 2024-07-10) exposes VReplication's copy + catch-up architecture at pedagogical altitude via the MoveTables and Reshard workflows: "with large tables, these two steps will take a while (hours). While this is happening, all production traffic will still be routed to [the source]." Canonical wiki datum that the copy phase tolerates hours-long duration because ongoing writes to the source keyspace continue to replicate into the targets via binlog-tail — the same substrate that describes at petabyte scale is the pedagogical MoveTables/Reshard substrate here.
— canonical wiki description of VReplication's copy-plus-replication architecture, the exact MySQL primitives used, per-stream GTID tracking, sidecar persistence for fault tolerance, and the copy/catch-up interleaving that prevents binlog retention from lapping long copies. VReplication is named as the substrate under PlanetScale's Database Imports feature + the MoveTables workflow.
— Vitess 21 ships two VReplication enhancements: (1) Reference-table materialization as a first-class Materialize primitive: previously this required hand-authored workflows; v21 adds explicit Materialize-command support for replicating small read-mostly lookup tables from an unsharded keyspace into every shard, enabling shard-local joins. New canonical patterns/reference-table-materialization-via-vreplication pattern. (2) Dynamic workflow configuration: runtime knobs previously bound to VTTablet command-line flags (which required a process restart to change) move onto the workflow control plane: "We now allow these to be overridden while creating a workflow or updated dynamically once the workflow is in progress." Canonical wiki instance of moving config off process flags into the workflow control plane. Also canonicalises VReplication's deeper integration with the reintroduced atomic distributed transactions, including MoveTables and Reshard operations.
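The reference-table pattern can be modelled with plain dictionaries. An initial copy gives every shard its own full copy of the small table, and each later change on the source is fanned out to all shards. A join against the lookup table then stays inside one shard. Shard names, table names (`orders`, `countries`), and helpers are hypothetical. This is not the Materialize command's API.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Shard = Dict[str, Dict[str, Row]]  # table name -> primary key -> row


def materialize_reference(shards: Dict[str, Shard], table: str, rows: Dict[str, Row]) -> None:
    """Initial copy: every shard gets its own full copy of the small reference table."""
    for shard in shards.values():
        shard[table] = {key: dict(row) for key, row in rows.items()}


def replicate_change(shards: Dict[str, Shard], table: str, key: str,
                     row: Optional[Row]) -> None:
    """Ongoing replication: fan one source change out to every shard (None = delete)."""
    for shard in shards.values():
        if row is None:
            shard[table].pop(key, None)
        else:
            shard[table][key] = dict(row)


def orders_with_country(shard: Shard) -> List[Tuple[str, str]]:
    """Shard-local join: each order resolves its country without leaving the shard."""
    countries = shard["countries"]
    return sorted((oid, countries[o["country"]]["name"]) for oid, o in shard["orders"].items())


shards: Dict[str, Shard] = {
    "-80": {"orders": {"o1": {"country": "DE"}}},
    "80-": {"orders": {"o2": {"country": "FR"}}},
}
materialize_reference(shards, "countries", {"DE": {"name": "Germany"}, "FR": {"name": "France"}})
replicate_change(shards, "countries", "FR", {"name": "French Republic"})
```

Copying the table everywhere only pays off when it is small and rarely written, because every change is multiplied by the number of shards.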
sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work — canonical wiki description of VReplication as the substrate for online schema changes + instant schema reverts. Guevara + Noach walk the full online-DDL lifecycle: (1) schema-change execution uses a shadow-table online schema change (concepts/shadow-table-based), with VReplication handling the backfill + catch-up + cut-over phases exactly as in the 2026-02-16 data-motion post — just at table scope inside a single keyspace rather than across keyspaces. (2) Five VReplication design properties named as uniqueness factors: copy-and-changelog progress both tracked (not just backfill); per-transaction GTID mapping; GTID-set-driven interleaving between copy and change-log phases; transactional coupling of sidecar state with destination write; and crucially "Unlike any other schema change solution, Vitess does not terminate upon migration completion." (3) Schema revert as the load-bearing payoff: after cut-over, VReplication is re-primed in the inverse direction (new → old) so the old-schema table stays current with every post-cut-over write — canonical concepts/pre-staged-inverse-replication concept and patterns/instant-schema-revert-via-inverse-replication pattern. Revert is then a second freeze-point swap of two already-in-sync tables, no data copy, invoked by the "Revert changes" button in the PlanetScale deploy-requests UI. The architectural insight: VReplication's non-termination after cut-over turns a one-way door into a revolving door for the schema-change case, mirroring the already-canonical patterns/reverse-replication-for-rollback at the data-motion scale.
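The revert mechanics can be shown with a toy model. Before cut-over, writes to the old table are replicated forward into the shadow table. After cut-over, writes to the new table are replicated back in the inverse direction. Revert is then only a switch of which table is live. The added column `email`, the projections, and the `ShadowTableChange` class are hypothetical. The freeze point and the table renames Vitess performs are not modelled.

```python
from typing import Dict

Row = Dict[str, object]
NEW_COLUMN = "email"  # hypothetical column added by the schema change


def to_new(row: Row) -> Row:
    """Old schema -> new schema: the added column starts out NULL."""
    return {**row, NEW_COLUMN: None}


def to_old(row: Row) -> Row:
    """New schema -> old schema: drop the added column."""
    return {k: v for k, v in row.items() if k != NEW_COLUMN}


class ShadowTableChange:
    def __init__(self, rows: Dict[int, Row]):
        self.old = dict(rows)
        self.new = {pk: to_new(r) for pk, r in rows.items()}  # backfill the shadow table
        self.live = "old"

    def write(self, pk: int, row: Row) -> None:
        """Apply an application write to the live table and mirror it into the other one."""
        if self.live == "old":
            self.old[pk] = row
            self.new[pk] = to_new(row)  # forward replication (before cut-over)
        else:
            self.new[pk] = row
            self.old[pk] = to_old(row)  # inverse replication (after cut-over)

    def cut_over(self) -> None:
        self.live = "new"

    def revert(self) -> None:
        self.live = "old"  # the old table is already current, so no rows are copied


change = ShadowTableChange({1: {"id": 1, "name": "a"}})
change.write(2, {"id": 2, "name": "b"})
change.cut_over()
change.write(3, {"id": 3, "name": "c", "email": "c@example.com"})
change.write(1, {"id": 1, "name": "a2", "email": None})
change.revert()
```

The revert is instant because the inverse stream has already kept the old table current. Only values in the dropped column are lost.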
— canonical wiki framing of VReplication as the internal consumer of the VStream primitive that is simultaneously exposed as a public CDC API. Matt Lord's framing: "VStream is a low-level component, provided via gRPC, that is used internally by VReplication to replicate data within Vitess for various workflow types such as MoveTables and Reshard." The two-altitude layering (tablet-level VStream for internal VReplication workflows + VTGate-level VStream for external CDC drivers) means VReplication's data-motion correctness dogfoods the same primitive that external consumers ride on. Canonical wiki datum that the CDC driver ecosystem pattern's stability-of-API property is enforced by the vendor's own internal workflows depending on it.

Seen in — FK-aware Database Imports snapshot-then-tail sequencing

— canonical wiki disclosure of VReplication's foreign-key-aware Database Imports fallback. The normal VReplication flow alternates snapshot reads with binlog tail-and-apply, so changes to rows that have been copied are replayed while changes to rows not-yet-copied can be safely skipped (they'll be captured by the next snapshot batch). This optimisation breaks when the source has FK tables with cascading actions: InnoDB applies cascades internally and never writes them to the binlog, so the binlog tail is structurally incomplete. VReplication can't trust it to be complete, and can't rely on the target's InnoDB to re-cascade (because only partial data has been copied at any given time — a DELETE on a parent row whose children haven't been copied yet can't cascade on the target). The fix: for Database Imports specifically, switch to one big snapshot-then-tail instead of alternating copy+tail — modelled after a MySQL point-in-time recovery. All replayed events then operate on rows that already exist on the target, so the target's InnoDB can cascade correctly on replay. Trade-off: Database Imports + Online DDL don't compose for FK tables on this path — the post notes "we won't be running, in PlanetScale, an Online DDL on a table that is being imported" because the import relies on target-side InnoDB to cascade, which means target-side binlog is missing cascade events, which breaks the VReplication feeding a concurrent Online DDL. A future roadmap item: route Imports through the same application-level-cascade orchestration VReplication uses for internal Online DDL flows, unifying the two paths.
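The snapshot-then-tail fallback can be demonstrated in miniature. SQLite stands in for InnoDB here: both apply ON DELETE CASCADE inside the engine. A statement log stands in for the binlog and, like the InnoDB behavior described above, records only what the client issued, never the cascaded rows. Because the whole snapshot is loaded before replay, the target's own cascade removes the same children. The table names and the `Source` class are hypothetical, and this is not PlanetScale's implementation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled per connection
    conn.executescript(SCHEMA)
    return conn


class Source:
    """Source database plus a statement log standing in for the binlog."""

    def __init__(self):
        self.conn = open_db()
        self.log = []

    def execute(self, sql, params=()):
        with self.conn:
            self.conn.execute(sql, params)
        self.log.append((sql, params))  # cascaded child deletes never reach this log


src = Source()
for pid in (1, 2):
    src.execute("INSERT INTO parent (id) VALUES (?)", (pid,))
for cid, pid in [(10, 1), (11, 1), (20, 2)]:
    src.execute("INSERT INTO child (id, parent_id) VALUES (?, ?)", (cid, pid))

# 1. One big snapshot of every table, plus the log position it corresponds to.
snap_pos = len(src.log)
snap = {t: src.conn.execute(f"SELECT * FROM {t} ORDER BY id").fetchall()
        for t in ("parent", "child")}

# Writes keep arriving after the snapshot; this one cascades to children 10 and 11.
src.execute("DELETE FROM parent WHERE id = ?", (1,))

# 2. Load the whole snapshot into the target (parents first so FKs are satisfied).
target = open_db()
with target:
    target.executemany("INSERT INTO parent VALUES (?)", snap["parent"])
    target.executemany("INSERT INTO child VALUES (?, ?)", snap["child"])

# 3. Tail: replay only the logged statements; the target's own cascade fills the gap.
with target:
    for sql, params in src.log[snap_pos:]:
        target.execute(sql, params)
```

Replay only works because every child row already exists on the target when the parent delete arrives. In an alternating copy+tail flow, the children might not have been copied yet.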