PLANETSCALE 2026-02-16 Tier 3

PlanetScale — Zero downtime migrations at petabyte scale

Summary

Matt Lord (Vitess core maintainer, PlanetScale) publishes the canonical technical walkthrough of how PlanetScale imports terabyte-to-petabyte-scale databases into its MySQL product without downtime. The process is implemented on Vitess's VReplication primitives via the MoveTables workflow, surfaced to customers as PlanetScale's Database Imports feature. The post's engineering substance is the explicit mechanism underlying every step of a zero-downtime migration: the precise MySQL primitives used (LOCK TABLES ... READ, START TRANSACTION WITH CONSISTENT SNAPSHOT, @@global.GTID_EXECUTED, COM_BINLOG_DUMP_GTID); the row-by-row clustered-index-ordered stream; the binlog catch-up loop during long copies; the VDiff verification workflow with keyspace locking and a stop/restart cycle; the MoveTables SwitchTraffic cutover with query buffering and reverse-VReplication workflow creation for rollback; and the observation that every step is designed to be fault-tolerant because at petabyte scale "the likelihood of encountering some type of error — even an ephemeral network or connection related error across the fleet of processes involved in the migration — becomes increasingly likely." The post explicitly walks through the customer scenario of sharding-as-part-of-migration: an unsharded external MySQL database can be imported into PlanetScale and split into N shards in a single workflow.

Key takeaways

  1. A consistent non-locking snapshot is load-bearing. The per-table copy phase opens by connecting an unmanaged tablet to the external MySQL instance, issuing LOCK TABLES <tbl> READ (read-only for milliseconds) just long enough to START TRANSACTION WITH CONSISTENT SNAPSHOT and read the @@global.GTID_EXECUTED value, then releasing the lock. "At this point we have a consistent snapshot of the table data and the GTID set or 'position' metadata to go along with it so that we can replicate changes to the table that have occurred since our snapshot was taken." Canonical instance of the consistent non-locking snapshot pattern — the availability of the source database is not affected.
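The snapshot-setup sequence can be sketched as a short routine against a DB-API-style cursor. The SQL statements are the ones named in the post; the function name and cursor interface are illustrative, not PlanetScale's implementation.

```python
# Sketch of the snapshot-setup sequence: a brief table read lock, held just
# long enough to open a consistent-snapshot transaction and record the GTID
# set, followed by an immediate unlock.

def take_consistent_snapshot(cursor, table):
    """Return the GTID set ("position") paired with a consistent snapshot."""
    cursor.execute(f"LOCK TABLES {table} READ")
    try:
        # All reads inside this transaction now see a single point in time.
        cursor.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")
        cursor.execute("SELECT @@global.GTID_EXECUTED")
        gtid_set = cursor.fetchone()[0]
    finally:
        # The lock is held only for milliseconds, so source availability is
        # effectively unaffected.
        cursor.execute("UNLOCK TABLES")
    return gtid_set
```

The returned GTID set is exactly the metadata needed later: replication can start from this position and replay every change made after the snapshot.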

  2. Rows are streamed ordered by primary key to exploit the clustered index. "ordering the results by the PRIMARY KEY (PK) columns in the table … so that we can read from the clustered index immediately as we are then reading the records in order and do not need to formulate the entire result set and order it with a filesort before we can start streaming rows." If no PK exists Vitess picks the best PK equivalent (non-null unique key). The source tablet fans out N per-shard streams, each filtering rows to their destination shard per the target sharding scheme.
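A minimal sketch of PK-ordered streaming via keyset pagination: repeated "pk greater than the last seen, ordered by pk" reads, which the server can satisfy by walking the clustered index in order with no filesort. The chunked batching shown here is illustrative, not necessarily Vitess's internal mechanism.

```python
# Sketch of PK-ordered full-table streaming. fetch_chunk(after_pk, n) stands
# in for a "SELECT ... WHERE pk > after_pk ORDER BY pk LIMIT n" query.

def stream_by_pk(fetch_chunk, chunk_size=2):
    """Yield (pk, row) pairs in primary-key order, chunk by chunk."""
    last_pk = None
    while True:
        chunk = fetch_chunk(last_pk, chunk_size)
        if not chunk:
            return
        yield from chunk
        last_pk = chunk[-1][0]  # resume the scan just past the last streamed PK
```

Because rows arrive in clustered-index order, the server can start streaming immediately instead of materializing and sorting the whole result set first.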

  3. Long-running copies interleave row-copy phases with binlog catch-up phases. Streams run until all rows are copied or the vreplication_copy_phase_duration flag fires — then the copy pauses and the stream catches up on binlog events affecting already-copied rows. Rationale: "this regular catchup step is important to ensure that we don't complete the row copy only to then be unable to replicate from where we left off because the source MySQL instance no longer has binary log events that we need (as they have been purged), in which case we would be forced to start the entire migration over again." Canonical wiki instance of snapshot-plus-catch-up-replication at scale where binlog retention becomes the operational constraint.
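The interleave can be simulated with a toy loop. `copy_phase_duration` mirrors the flag named in the post (counted here in rows instead of wall-clock time); the in-memory tables and event handling are a hypothetical simplification.

```python
# Toy simulation of interleaved copy and catch-up phases. `source` plays the
# live table (mutated by binlog replay as the migration runs).

def migrate(source, binlog, copy_phase_duration=3):
    """Copy `source` (dict pk -> row) while replaying (pk, row) binlog events."""
    copied = {}
    pending = sorted(source)   # PKs not yet copied, in PK order
    max_copied = None          # copy high-water mark
    applied = 0                # binlog events consumed so far
    while pending:
        # Copy phase: stream at most copy_phase_duration rows, in PK order.
        for pk in pending[:copy_phase_duration]:
            copied[pk] = source[pk]
            max_copied = pk
        pending = pending[copy_phase_duration:]
        # Catch-up phase: drain the binlog so the source cannot purge events
        # we still need, which would force restarting the whole migration.
        while applied < len(binlog):
            pk, row = binlog[applied]
            applied += 1
            source[pk] = row              # the live table keeps moving
            if max_copied is not None and pk <= max_copied:
                copied[pk] = row          # replicate into the copied range
            # Events past the high-water mark are dropped here: a later copy
            # phase reads the row's current value anyway.
    return copied
```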

  4. Post-copy, continuous replication via COM_BINLOG_DUMP_GTID keeps the new system cutover-ready. Each stream sends the GTID set recorded at the end of its copy phase to the source MySQL, which streams binlog events from that position forward. Changes are filtered per-stream by the destination shard, applied on the target PRIMARY tablet, and the stream's advancing GTID is persisted in the sidecar vreplication table so that "no matter what happens, we can restart our work and pick up where we left off."
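The persist-position-per-event discipline behind "pick up where we left off" can be sketched with an in-memory stand-in for the sidecar vreplication table; positions are modeled as integer offsets rather than GTID sets.

```python
# Sketch of resumable replication: the stream's position is persisted after
# every applied event, so a crash at any point resumes exactly where it
# left off.

def replicate(binlog, target, sidecar, stream_id, fail_after=None):
    """Apply (pk, row) binlog events from the persisted position onward."""
    pos = sidecar.get(stream_id, 0)         # "where we left off"
    for i in range(pos, len(binlog)):
        if fail_after is not None and i - pos >= fail_after:
            raise ConnectionError("ephemeral network error")  # simulated fault
        pk, row = binlog[i]
        target[pk] = row                    # apply on the target PRIMARY
        sidecar[stream_id] = i + 1          # persist the advancing position
```

A mid-stream failure loses no work: rerunning the same call picks up at the persisted position, which is exactly the recovery path the post describes.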

  5. VDiff is the explicit pre-cutover verification step and is itself zero-downtime. VDiff takes a named lock on the workflow in the target keyspace's topology server, stops the workflow, initiates a consistent snapshot on the source, runs each target shard's stream until it reaches the source's GTID position (equivalent to MySQL's START REPLICA UNTIL), takes consistent snapshots on each target shard, restarts the workflow, releases the lock, then full-table-scans the source and each target shard concurrently, comparing the streamed rows. REPLICA tablets are used by default for data streaming "to prevent any impact on the live production system." The diff is fault-tolerant and incrementally resumable — it can be run once weeks before cutover and resumed closer to it. Canonical wiki instance of patterns/vdiff-verify-before-cutover.
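The row-comparison core of a VDiff-style check reduces to merging two PK-ordered streams; the snapshot coordination and workflow locking described above are outside this sketch.

```python
# Sketch of the VDiff row comparison: the source stream and a k-way merge of
# the per-shard target streams (shards hold disjoint PK sets) are walked in
# lockstep, reporting missing, extra, and mismatched rows.

import heapq

def vdiff(source_rows, shard_streams):
    """Compare PK-ordered (pk, row) streams; return a list of discrepancies."""
    src = iter(source_rows)
    tgt = heapq.merge(*shard_streams)   # disjoint shards merge into PK order
    problems = []
    s, t = next(src, None), next(tgt, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            problems.append(("missing_on_target", s[0]))
            s = next(src, None)
        elif s is None or t[0] < s[0]:
            problems.append(("extra_on_target", t[0]))
            t = next(tgt, None)
        else:
            if s[1] != t[1]:
                problems.append(("row_mismatch", s[0]))
            s, t = next(src, None), next(tgt, None)
    return problems
```

Because both sides stream in PK order, a single forward pass suffices; and because the comparison is positionless beyond the current PK, it is naturally restartable from any PK, matching the post's incremental-resume property.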

  6. Cutover happens via MoveTables SwitchTraffic with query buffering + reverse-replication-workflow creation. Sequence: (a) pre-checks on tablet health + replication lag + workflow state; (b) lock source + target keyspaces in topology server + named lock on workflow; (c) stop writes on source keyspace, begin buffering incoming queries (see VTGate Buffering); (d) wait for replication to fully catch up; (e) create reverse VReplication workflow (target → source) — source stays in sync with new writes so customer can call MoveTables ReverseTraffic to revert the cutover without data loss; (f) initialise Vitess Sequences replacing auto_increment if tables are being sharded; (g) allow writes to target; (h) update schema routing rules so queries go to target; (i) start reverse workflow; (j) freeze original workflow; (k) release locks. "The query buffering done here is the last part that allows the entire migration to be done without any downtime." Typical cutover takes "less than 1 second." Canonical wiki instance of routing-rule-swap cutover and reverse replication for rollback.
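The buffering window that makes the switch downtime-free can be sketched as a tiny router. The class is hypothetical; the lettered comments refer to the cutover sequence above.

```python
# Sketch of query buffering at cutover. While buffering, queries are held
# instead of failed (a real proxy would block the caller and return the
# result after the switch; here they are simply replayed).

class CutoverRouter:
    def __init__(self, source, target):
        self.backend = source   # pre-cutover routing rule: queries hit source
        self.target = target
        self.buffering = False
        self.buffer = []

    def execute(self, query):
        if self.buffering:
            self.buffer.append(query)   # hold rather than error: no downtime
            return None
        return self.backend(query)

    def switch_traffic(self, wait_for_catchup):
        self.buffering = True        # (c) stop source writes, start buffering
        wait_for_catchup()           # (d) let replication fully catch up
        self.backend = self.target   # (g)/(h) allow target writes, swap rule
        self.buffering = False
        for q in self.buffer:        # replay the briefly-paused queries
            self.backend(q)
        self.buffer.clear()
```

Callers only ever see a brief latency spike: every query issued during the window completes, just against the target instead of the source.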

  7. Customer traffic is routed through PlanetScale (Vitess) before cutover so the switch is invisible. Between migration start and cutover, the customer's application connects to PlanetScale instead of the source MySQL directly. Vitess applies schema routing rules that forward queries back to the old system until cutover — so at cutover, only the routing rule changes, not the application's connection topology. Incurs pre-cutover latency cost (query path = Application → PlanetScale → OldDB instead of Application → OldDB directly) which is minimal on PlanetScale Managed because the database and application are colocated. Key corollary: don't compare pre-cutover PlanetScale latency to the application's previous direct-to-MySQL latency — the network round trips dominate well-optimised queries during this phase.

  8. Sharding-as-part-of-migration. "you can have an unsharded MySQL database that we then split into N shards as part of the data migration into PlanetScale." Per the Vitess scalability philosophy, sharding is recommended for databases larger than 250 GiB. Rows leaving the source stream are filtered by destination shard per the target's sharding scheme, so each target shard's PRIMARY tablet sees only the rows that belong to it. Both the initial load and the cutover are aware of the sharding scheme.
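Destination-shard filtering can be sketched as hash-range fan-out. The hash used here is a stand-in for a real Vitess vindex, and the equal-width range split is illustrative.

```python
# Sketch of destination-shard filtering: each sharding-key value is hashed
# to a keyspace id, each target shard owns an equal-width range of that
# 64-bit id space, and every per-shard stream keeps only its own rows.

import hashlib

def keyspace_id(sharding_key):
    digest = hashlib.sha256(str(sharding_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")   # 64-bit keyspace id

def shard_for(sharding_key, num_shards):
    # Map the id into one of num_shards contiguous ranges.
    return keyspace_id(sharding_key) * num_shards >> 64

def per_shard_streams(rows, num_shards):
    """Fan the source (pk, row) stream out into one filtered list per shard."""
    return [
        [(pk, row) for pk, row in rows if shard_for(pk, num_shards) == shard]
        for shard in range(num_shards)
    ]
```

Every source row lands in exactly one per-shard stream, which is why each target shard's PRIMARY tablet only ever sees rows it will own after the split.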

  9. Read-replica as migration source minimises load on live production. "we recommend you setup a standard MySQL replica — if you don't already have one — and use that as the source MySQL instance for the migration. This is another key factor that ensures we not only avoid downtime, but we avoid any impact whatsoever on the live production system that is currently serving your application data." Canonical wiki instance of patterns/read-replica-as-migration-source. The brief LOCK TABLES READ during snapshot setup then happens on the replica, not on the production primary.

  10. Everything is designed to be fault-tolerant because at petabyte scale failures are inevitable. "All of this work is done in a fault-tolerant way. This means that anything can fail throughout this process and the system will be able to recover and continue where it left off. This is critical for data imports at a certain scale where things can take many hours, days, or even weeks to complete and the likelihood of encountering some type of error — even an ephemeral network or connection related error across the fleet of processes involved in the migration — becomes increasingly likely." Canonical wiki statement that fault-tolerance is a load-bearing property of long-running workflows — not a nice-to-have but a correctness requirement for workflows whose duration approaches the MTBF of their component fleet. Every state-bearing decision point persists state in sidecar tables (vreplication, copy_state, VDiff tables) so restart is always the recovery path.

Systems, concepts, and patterns extracted

Systems: systems/planetscale, systems/vitess, systems/mysql, systems/vitess-vreplication (new), systems/vitess-vdiff (new), systems/vitess-movetables (new).

Concepts: concepts/online-database-import (extended), concepts/consistent-non-locking-snapshot (new), concepts/gtid-position (new), concepts/binlog-replication (new), concepts/query-buffering-cutover (new), concepts/reverse-replication-workflow (new), concepts/schema-routing-rules (new), concepts/fault-tolerant-long-running-workflow (new), concepts/clustered-index (extended — referenced for the PK-ordered read via the clustered index to skip filesort).

Patterns: patterns/snapshot-plus-catchup-replication (new), patterns/vdiff-verify-before-cutover (new), patterns/reverse-replication-for-rollback (new), patterns/routing-rule-swap-cutover (new), patterns/read-replica-as-migration-source (new).

Operational numbers

  • Petabyte-range customer migrations: presented as routine rather than outliers. The post shows a dashboard screenshot of "larger data sets — after migrating only a subset of the databases to date"; specific per-migration petabyte figures are not disclosed.
  • Cutover duration: "typically take less than 1 second."
  • Migration duration: "many hours, days, or even weeks" — fault-tolerance is sized to this horizon.
  • Sharding threshold: databases >250 GiB are recommended to shard per Vitess philosophy.
  • Customer-side perceived impact at cutover: "slight spike in query latency at the point of cutover where we briefly pause the incoming queries."

Caveats

  • Mechanism-level description but no raw migration throughput numbers (GB/hour of copy, events/sec of binlog replay, VDiff rows/sec).
  • No failure-rate statistics — post says migrations of this scale can fail and the system recovers, but doesn't disclose how often retries fire or what fraction of migrations hit copy_phase_duration cycles repeatedly.
  • Source MySQL binlog retention is mentioned as a critical dependency (stream must stay current enough that binlog hasn't been purged) but no specific retention-duration guidance disclosed.
  • Schema change during migration is not addressed — the post describes DDL compatibility pre-conditions implicitly (source and target schemas must match for the tables in the workflow) but doesn't walk what happens if the source schema changes mid-migration.
  • Cross-version migrations (e.g. MySQL 5.7 → 8.0, or MySQL → MariaDB) are mentioned as supported in passing but specific version-compatibility matrix not disclosed.
  • VDiff cost — full table scans on source and all target shards concurrently; no quantified load profile on either side beyond "uses REPLICA tablets by default."
  • Pre-cutover latency penalty (application → PlanetScale → OldDB → PlanetScale → application) is acknowledged but not quantified; PlanetScale's mitigation is colocation-in-same-cloud-account-and-AZ via PlanetScale Managed.
  • Reverse workflow lifetime — how long customers typically remain in the "reverse available" state before calling MoveTables complete is not disclosed.
  • Vitess Sequences initialisation for sharded targets named as a cutover step but the details of how auto_increment sequences get their starting point to avoid PK collisions are not described in-post.
  • The post is a vendor explainer — written by Vitess core maintainer Matt Lord about a PlanetScale product built on Vitess. Treat the architectural claims as accurate (Lord and Vitess documentation are the primary sources for VReplication / VDiff / MoveTables mechanics); treat "PlanetScale is special" framing with appropriate vendor-post salt.

Cross-source continuity

  • Deeply extends concepts/online-database-import. Prior canonical statement on the wiki (from 2025-07-01 PlanetScale for Postgres) framed online import as a capability — customers can migrate "from any version > Postgres v13" without downtime, mechanism undisclosed. This post is the canonical mechanism description on the MySQL side: snapshot + GTID_EXECUTED + per-shard streams + binlog catch-up + VDiff + SwitchTraffic with query buffering + reverse workflow. The post thereby bookends the online-import concept across engines: PlanetScale for Postgres announces the capability at v13+ (without revealing how); this post reveals how on MySQL. Likely a significant fraction of the Postgres online-import machinery follows the same shape (consistent snapshot + LSN-from-replication-slot + logical-replication catch-up + VDiff-equivalent + proxy-layer query buffering + reverse-slot).

  • First canonical wiki detail on Vitess's migration primitives. Prior Vitess ingests covered evalengine (SQL expression evaluator), vector-index composition with SPFresh, and the first-principles-Postgres-sharding framing (Vitess rejected for Postgres → Neki). This post is the first canonical wiki deep-dive on Vitess's own inner primitives for data motion: VReplication, VDiff, and MoveTables.

  • Sibling of sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp on the Vitess-substrate axis. Martí's evalengine post is the canonical Vitess-expression-evaluation deep-dive; Lord's migration post is the canonical Vitess-data-motion deep-dive. Together they document two of the three major Vitess subsystems (the third being the query planner).

  • Complements concepts/clustered-index with a new use case — primary-key-ordered full-table streaming to avoid filesort and exploit the clustered layout for sequential I/O. Canonical instance of reading data out of the table the same way it's laid out for throughput.

  • Complements patterns/architect-sharding-from-first-principles-per-engine by giving concrete detail on how much of a sharding layer is engine-specific: not just the shard map and query router (that part is relatively portable), but the entire online-migration / online-resharding machine — which is MySQL-GTID-shaped and binlog-shaped from top to bottom in Vitess's case.

  • No existing-claim contradictions — strictly additive.
