Reshard online via VReplication

Pattern

Resharding an already-sharded keyspace is a repeatable, online operation via the Vitess Reshard workflow: new shards are provisioned empty, VReplication copies the source shards' data into the new shards and then tails the binlog to catch up, then a SwitchTraffic cutover atomically swaps routing. Production traffic continues on the source shards throughout the copy phase; cutover is fast (seconds) even when the copy phase is slow (hours).

The key property: resharding is not a one-way door. A keyspace can go from 1 shard → 4 → 8 → 16 — or downsize to 8 → 4 — with the same primitive. Each resharding is independent of the previous one; the workflow doesn't accumulate irreversible state.

When to apply

Apply when a keyspace's shard count no longer matches the workload: scale out as data or traffic grows, or scale in when demand decreases. Reshard changes how the keyspace-id space is partitioned, not the shard key itself; a wrong shard key calls for MoveTables instead (see Caveats).

Mechanics

Canonical sequence from sources/2026-04-21-planetscale-dealing-with-large-tables:

  1. Provision new shards — for a 1 → 4 shard split, spin up four new MySQL primaries within the same keyspace, named by Vitess's keyspace-id-range convention -40, 40-80, 80-c0, c0-.
  2. Configure VSchema — declare the sharded table's primary Vindex (e.g. hash on user_id) + a Vitess sequence table for ID generation. For initial sharding, also run ALTER TABLE ... CHANGE log_id log_id BIGINT NOT NULL to remove MySQL's auto_increment (Vitess now owns ID generation). Steps 2–5 are consolidated in a hedged sketch after this list.
  3. Create the reshard workflow:
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log \
      create --source-shards '0' --target-shards '-40,40-80,80-c0,c0-'
    
    VReplication starts copying rows from the source shard(s) to the target shards, routing each row to the shard owning its hash(shard_key) keyspace-id range. Then it tails the source shards' binlog to keep targets in sync with ongoing writes.
  4. Switch traffic:
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log switchtraffic
    
    VTGate's routing rules atomically swap traffic from the source shards to the target shards. The cutover is an instance of the routing-rule-swap cutover pattern.
  5. Drop the source shards — once traffic is confirmed healthy on the targets and rollback is no longer wanted, the source shard(s) can be decommissioned.
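
A consolidated sketch of steps 2–5, under stated assumptions: the table name muscle_maker_log, the sequence name muscle_maker_log_seq, and the workflow name for cleanup are hypothetical (not from the source), exact vtctldclient subcommand spellings vary somewhat across Vitess versions, and the VDiff check is optional but routine practice before cutover:

    # Hypothetical VSchema: hash Vindex on user_id; a Vitess sequence
    # (hosted in an unsharded keyspace) owns log_id generation.
    vtctldclient ApplyVSchema --vschema '{
      "sharded": true,
      "vindexes": { "hash": { "type": "hash" } },
      "tables": {
        "muscle_maker_log": {
          "column_vindexes": [{ "column": "user_id", "name": "hash" }],
          "auto_increment": { "column": "log_id", "sequence": "muscle_maker_log_seq" }
        }
      }
    }' musclemaker_log

    # Start the copy; VReplication checkpoints GTIDs as it goes, then tails
    # the source binlog to keep the targets in sync.
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log \
      create --source-shards '0' --target-shards '-40,40-80,80-c0,c0-'

    # Watch progress until the targets report caught-up.
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log show

    # Optional: row-level source-vs-target comparison before cutover.
    vtctldclient VDiff --workflow shardMuscleMakerLog --target-keyspace musclemaker_log create

    # Cutover: swap routing rules to the new shards. Vitess pre-stages a
    # reverse VReplication stream, so rollback stays available.
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log switchtraffic

    # Rollback path, if the cutover misbehaves:
    # vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log reversetraffic

    # Once healthy: finalize the workflow, then decommission the source shard.
    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log complete
    vtctldclient DeleteShards --recursive musclemaker_log/0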

The copy phase is the same snapshot-plus-catchup-replication shape as cross-keyspace MoveTables, but stays inside one keyspace.

Why it works

  • VReplication is idempotent + resumable. A crash during copy restarts from the last GTID checkpoint; no manual fixup required.
  • The cutover is separable from the copy. Copy can take hours; cutover is seconds. Production traffic routes to source throughout copy.
  • Reverse replication is pre-staged. On SwitchTraffic, Vitess starts a reverse VReplication stream from the new targets back to the old sources — so rollback via ReverseTraffic is immediate and zero-downtime if the cutover turns out to be a mistake.
  • Shard-range naming is compositional. A 4-shard keyspace can split to 8 by halving each range (e.g. -40 splits into -20 + 20-40); no global renumbering. The keyspace-id address space is stable; shard ranges are just how it's partitioned. A sketch follows this list.
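
That compositionality is visible in the command shape: a later 4 → 8 split reuses the same primitive with only the range strings changed (workflow name hypothetical):

    # Each existing range halves in place; the keyspace-id space never changes.
    vtctldclient Reshard --workflow split4to8 --target-keyspace musclemaker_log \
      create --source-shards '-40,40-80,80-c0,c0-' \
             --target-shards '-20,20-40,40-60,60-80,80-a0,a0-c0,c0-e0,e0-'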

Comparison to cross-keyspace MoveTables

Property | Reshard (this pattern) | MoveTables (cross-keyspace)
--- | --- | ---
Target | Same keyspace, new shards | Different keyspace
Use case | Horizontal scale-out/in | Vertical sharding, renames, cluster merges
VSchema change | Shard count, ranges | Table-to-keyspace mapping
ID generation | Vitess sequences carry across shards | May need a new sequence per keyspace
Cutover primitive | switchtraffic (routing rules by shard range) | switchtraffic (routing rules by keyspace)

Same underlying VReplication; different VSchema scope.
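
For contrast, a hedged sketch of the cross-keyspace counterpart (workflow and keyspace names hypothetical): same VReplication machinery, but the routed unit is a table moving between keyspaces rather than a shard range within one.

    # MoveTables: the target is a different keyspace; routing rules map
    # the table name, not a keyspace-id range.
    vtctldclient MoveTables --workflow logs2analytics --target-keyspace analytics \
      create --source-keyspace musclemaker_log --tables 'muscle_maker_log'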

Caveats

  • Shard-key choice is not re-examined by Reshard. If your existing shard key is wrong for the workload, resharding to more shards of the same key doesn't fix the underlying cross-shard-query problem. Re-axis requires MoveTables to a new keyspace with a different primary Vindex.
  • Long copies can lap the binlog retention window. VReplication's copy + catch-up interleaving prevents this in normal operation, but exceptionally long copies + short binlog retention can cause catch-up to fail. Production deployments extend retention during reshard windows (a sketch follows this list).
  • Cost of the transient state. During the copy phase, both source and target shards exist — doubling storage and connection capacity until the source is decommissioned. Budget for this during planning.
  • Cross-shard integrity invariants. If the source database had cross-shard uniqueness constraints implemented at the application layer, resharding may expose them to race conditions during the cutover window. Audit before cutover.
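
A minimal sketch of the retention extension from the binlog caveat above, assuming MySQL 8.0's binlog_expire_logs_seconds and direct access to each source shard's mysqld (how you reach it varies by deployment):

    # Hold two weeks of binlogs on each source mysqld for the reshard window;
    # restore the normal value after cutover.
    mysql -e "SET GLOBAL binlog_expire_logs_seconds = 1209600;"  # 14 days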

Seen in

  • sources/2026-04-21-planetscale-dealing-with-large-tables — Ben Dicken's canonical pedagogical treatment. Names Reshard as "the canonical primitive for the horizontal-sharding rung" with the full create --source-shards '0' --target-shards '-40,40-80,80-c0,c0-' → switchtraffic sequence. Frames resharding's repeatability as the structural differentiator from one-way vertical-sharding moves: "Vitess also has support for resharding an already-sharded table ... we can expand out to using more and more shards. We can also downsize or use less shards if demand decreases."
  • sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — the VReplication + VDiff + ReverseTraffic substrate disclosed at the petabyte-scale altitude. Describes the copy + binlog-tail + GTID-checkpointed architecture that makes Reshard fault-tolerant. The Reshard workflow is one of several Vitess workflows (alongside MoveTables, Materialize, LookupVindex, Migrate) that all sit on the VReplication substrate.