SYSTEM
VDiff¶
What it is¶
VDiff (vitess.io docs) is Vitess's zero-downtime consistency checker for data-motion workflows. Given an active VReplication workflow, VDiff verifies that every row that should have been copied or replicated actually landed on the destination shards, correctly filtered per the sharding scheme and with values matching the source. It is the explicit pre-cutover confidence gate on any VReplication-based migration, resharding, or table move.
Why it exists¶
Zero-downtime migrations span hours, days, or even weeks at petabyte scale and involve many streams across many tablets on fleet hardware. Every step of VReplication is fault-tolerant, but even so: "at least one [VDiff] is run before the cutover to ensure that the data has been copied correctly and that the new system is in sync with the old." (Source: sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale.) See patterns/vdiff-verify-before-cutover for why verify-before-cutover is the canonical pre-switch discipline for any long-running replication pipeline.
How it works¶
Per-table diff (serial across tables within a workflow):
- Lock the workflow — acquire a named lock on the workflow in the target keyspace's topology server so the workflow can't be concurrently manipulated while VDiff initialises.
- Stop the workflow for table-diff initialisation.
- Consistent snapshot on source — exactly as in VReplication's copy phase: `START TRANSACTION WITH CONSISTENT SNAPSHOT` on the source tablet, record the resulting `GTID` position. See concepts/consistent-non-locking-snapshot.
- Per-target-shard `START REPLICA UNTIL`-equivalent — each target shard's stream is started until it has reached the source's captured GTID position, then stops. On each target shard, open a consistent snapshot. At this point source + all target shards hold consistent snapshots of the table at exactly the same logical time.
- Restart the workflow — replication resumes applying new source events to the target shards while the diff scans from the frozen snapshots on each side.
- Release the workflow lock.
- Concurrent full-table scan on source and each target shard, comparing streamed results and noting any discrepancies (missing rows, mismatched values). Diff state persisted in the VDiff sidecar tables on each target shard.
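The per-target-shard catch-up step hinges on a GTID containment check: each target's applied position must cover the source's captured position before the target snapshot is opened. A minimal sketch of that check for MySQL-style GTID sets — helper names are hypothetical; Vitess uses its own Mysql56 GTID-set implementation, and this simplification assumes each target interval is covered by a single source interval:

```python
def parse_gtid_set(gtid: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(start, end), ...]}."""
    out = {}
    for part in gtid.split(","):
        uuid, _, intervals = part.strip().partition(":")
        ranges = []
        for iv in intervals.split(":"):
            lo, _, hi = iv.partition("-")
            ranges.append((int(lo), int(hi or lo)))
        out[uuid] = ranges
    return out

def contains(current: str, target: str) -> bool:
    """True once `current` (the shard's applied GTIDs) covers `target`
    (the source position captured at snapshot time)."""
    cur = parse_gtid_set(current)
    for uuid, ranges in parse_gtid_set(target).items():
        covered = cur.get(uuid, [])
        for lo, hi in ranges:
            # Simplification: require one covering interval per target range.
            if not any(clo <= lo and hi <= chi for clo, chi in covered):
                return False
    return True
```

In a real stream the check would be polled as the target applies events, stopping the stream once it returns true.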
The diff reports to the user on completion (see `VDiff show`): ETA, rows compared, any discrepancies with detail, and status per table.
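The concurrent full-table scan reduces to merging two primary-key-ordered row streams and flagging rows missing on either side or differing in value. A simplified sketch of that comparison (the row shape and report fields are assumptions; the real implementation streams batches via the tablets and persists the report in sidecar tables):

```python
from typing import Iterator, Tuple

Row = Tuple[int, tuple]  # (primary key, column values)

def diff_streams(source: Iterator[Row], target: Iterator[Row]) -> dict:
    """Merge-compare two PK-ordered row streams; report discrepancies."""
    report = {"rows_compared": 0, "missing_on_target": [],
              "extra_on_target": [], "mismatched": []}
    s, t = next(source, None), next(target, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            report["missing_on_target"].append(s[0])   # row never copied
            s = next(source, None)
        elif s is None or t[0] < s[0]:
            report["extra_on_target"].append(t[0])     # row shouldn't be here
            t = next(target, None)
        else:
            report["rows_compared"] += 1
            if s[1] != t[1]:
                report["mismatched"].append(s[0])      # values diverged
            s, t = next(source, None), next(target, None)
    return report
```

Because both streams read from snapshots frozen at the same GTID, a clean report means the target held exactly the source's data at that logical time.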
Key properties¶
- Zero downtime on the source production system.
"The `VDiff` will choose `REPLICA` tablets by default on the source and target, for the data streaming (the work is still orchestrated by and the state still stored on the target `PRIMARY` tablets), to prevent any impact on the live production system." (Source: sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale.)
- Fault-tolerant and resumable. "It will automatically pick up where it left off if any error is encountered." Important at petabyte scale — see concepts/fault-tolerant-long-running-workflow.
- Incremental / resumable over long cutover-preparation horizons. "If e.g. you are in the pre-cutover state for many weeks or even months, you can run an initial VDiff, and then resume that one as you get closer to the cutover point."
- Disturbs the workflow minimally. The stop / snapshot / restart dance takes only as long as the snapshot setup itself; the full diff scan runs concurrently with normal VReplication catch-up.
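Resumability follows from persisting scan progress: record the last primary key compared per table (as the sidecar tables on the target shards do), and restart the scan from that key after a failure or when resuming an earlier VDiff closer to cutover. A sketch of the idea under assumed table and column names — real VDiff persists richer state, including the running report — using SQLite as a stand-in store:

```python
import sqlite3

def resume_point(db: sqlite3.Connection, table: str) -> int:
    """Fetch the last PK compared for `table`, or 0 to start fresh."""
    row = db.execute(
        "SELECT last_pk FROM vdiff_progress WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0] if row else 0

def checkpoint(db: sqlite3.Connection, table: str, last_pk: int) -> None:
    """Persist progress so a crashed or paused diff restarts here,
    not from the beginning of the table."""
    db.execute(
        "INSERT INTO vdiff_progress(table_name, last_pk) VALUES (?, ?) "
        "ON CONFLICT(table_name) DO UPDATE SET last_pk = excluded.last_pk",
        (table, last_pk),
    )
    db.commit()
```

At petabyte scale this checkpointing is what makes a weeks-long pre-cutover window tractable: each resume only pays for the rows written since the last run.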
Why it shows up on this wiki¶
VDiff is the canonical wiki instance of a consistency
verifier designed for zero-downtime data-motion
workflows. The shape is reusable: lock the workflow →
snapshot both sides at matching logical times → resume
workflow → concurrently scan both sides → report. Any
long-running replication pipeline (Debezium + target store,
Postgres logical replication, AWS DMS, vendor-specific CDC
pipelines) needs the same verification primitive before
any cutover that trusts the destination for primary
traffic. VDiff documents the architectural choice points:
run on REPLICAs to avoid source-side load, persist state
on target PRIMARY tablets, make it resumable from arbitrary
failure points, and make incremental resumption the
operational default so the verification cost amortises
across the cutover-preparation horizon.
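The reusable shape above can be stated as a small orchestration skeleton, independent of Vitess; every callback here is an assumption standing in for a system-specific operation:

```python
from typing import Callable

def verify_before_cutover(
    lock: Callable[[], None],
    stop_workflow: Callable[[], None],
    snapshot_source: Callable[[], str],          # returns captured position
    snapshot_targets_at: Callable[[str], None],  # catch targets up, then snapshot
    resume_workflow: Callable[[], None],
    unlock: Callable[[], None],
    scan_and_compare: Callable[[], dict],        # runs against frozen snapshots
) -> dict:
    """Lock -> snapshot both sides at one logical time -> resume -> scan -> report."""
    lock()
    try:
        stop_workflow()
        pos = snapshot_source()
        snapshot_targets_at(pos)
        resume_workflow()  # live replication catches up while the diff scans
    finally:
        unlock()
    return scan_and_compare()
```

The key design point the skeleton preserves: the workflow is stopped only for the snapshot setup, so the expensive scan never blocks replication catch-up.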
Seen in¶
- sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki description of VDiff's workflow-locking + source-snapshot + target-`START REPLICA UNTIL` + restart + release-lock + concurrent-scan mechanism, plus the `REPLICA`-tablet preference for zero source-side impact and the resumable / incremental operation mode. VDiff is named as the explicit pre-cutover verification step on every PlanetScale migration.
Related¶
- systems/vitess
- systems/vitess-vreplication
- systems/vitess-movetables
- systems/mysql
- systems/planetscale
- concepts/consistent-non-locking-snapshot
- concepts/gtid-position
- concepts/fault-tolerant-long-running-workflow
- patterns/vdiff-verify-before-cutover
- patterns/snapshot-plus-catchup-replication
- companies/planetscale