
CONCEPT Cited by 1 source

Reverse replication workflow

Definition

A reverse replication workflow is a second replication stream, flowing from the new system back to the old system, created at the moment of cutover and kept running after the switch. Its purpose is to keep the old system in sync with all writes happening on the new system so that if the new system misbehaves, traffic can be cut back to the old system without data loss and without further downtime.

Without a reverse workflow, a cutover is a one-way door: once writes are landing on the new system, the old system is strictly stale, and rolling back requires reconciling diverged state. With a reverse workflow, the cutover is a revolving door — traffic can be switched back and forth between the two systems as many times as needed.
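A toy model makes the one-way-door versus revolving-door contrast concrete. This is a minimal sketch. It assumes each system can be reduced to a key/value table plus an ordered log of local writes. `Store`, `replicate`, and `cutover_then_rollback` are hypothetical names, not Vitess APIs, and real VReplication tracks MySQL binlog (GTID) positions rather than list indices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Store:
    """Hypothetical stand-in for one database: rows plus an ordered log of local writes."""
    rows: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        """An application write: update the row and append it to this store's log."""
        self.rows[key] = value
        self.log.append((key, value))


def replicate(src: Store, dst: Store, position: int) -> int:
    """Apply src's log from `position` onward to dst; return the new position.

    Replicated writes touch dst.rows only, not dst.log, so they are never
    streamed back to where they came from (no replication loop).
    """
    for key, value in src.log[position:]:
        dst.rows[key] = value
    return len(src.log)


def cutover_then_rollback(with_reverse: bool) -> Dict[str, str]:
    """Cut over old -> new, take writes on new, roll back, and return old's rows."""
    old, new = Store(), Store()
    old.write("order:1", "pending")

    # Forward stream (old -> new) catches up; routing then flips to new.
    replicate(old, new, 0)
    reverse_position = len(new.log)  # reverse stream starts at the cutover point

    # After cutover, production writes land on the new system only.
    new.write("order:1", "shipped")
    new.write("order:2", "pending")

    if with_reverse:
        # Reverse stream (new -> old) keeps the old system current.
        replicate(new, old, reverse_position)

    # Roll back: traffic returns to old, which serves whatever it holds.
    return dict(old.rows)


print(cutover_then_rollback(with_reverse=False))  # {'order:1': 'pending'}
print(cutover_then_rollback(with_reverse=True))   # {'order:1': 'shipped', 'order:2': 'pending'}
```

Without the reverse stream, the rollback silently loses the `shipped` update and the new `order:2` row; with it, the old system is already current when traffic returns.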

When it matters

It matters most at cutovers between non-identical systems:

  • Major version change (MySQL 5.7 → 8.0, Postgres 13 → 17).
  • Engine or platform change (MySQL ↔ MariaDB, self-hosted → managed).
  • Topology change (unsharded → sharded, single-region → multi-region).
  • Storage-engine change (network-attached → direct-attached NVMe).
  • Query-planner change (query performance regressions that only show up under production workload).

Any of these can produce surprises that only manifest under real production traffic — query patterns that get slower, lock contention that didn't exist before, a query optimiser decision that breaks an index hint, subtle collation/charset differences. The reverse workflow makes these surprises recoverable rather than emergencies.

Mechanics in Vitess

At MoveTables SwitchTraffic:

  1. Before flipping routing rules, Vitess creates a new VReplication workflow in the opposite direction (target → source).
  2. It verifies that the source keyspace has viable PRIMARY tablets to accept the reverse replication stream.
  3. Starts the reverse workflow immediately after the routing flip — as writes begin landing on the target keyspace, the reverse workflow streams them back to the source.
  4. The original (forward) workflow is marked Frozen — its state is retained but it cannot be manipulated.
  5. The customer can call MoveTables ReverseTraffic to switch traffic back to the source keyspace; Vitess performs the same cutover sequence as SwitchTraffic, but in the reverse direction.
  6. ReverseTraffic and SwitchTraffic can be called back and forth as many times as needed until the customer calls MoveTables Complete, which tears down the reverse workflow and finalises the migration.

(Source: sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale.)
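The six steps can be sketched as a small state machine. This is a hypothetical model, not the Vitess implementation: the `Migration` and `Workflow` classes, the `"Stopped"` state, the `primaries` set, and the keyspace names `commerce` and `customer` are illustrative assumptions. In particular, modelling ReverseTraffic as unfreezing the forward stream and freezing the reverse one is this sketch's reading of "the same cutover sequence, in the reverse direction".

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Workflow:
    """One replication stream between two keyspaces (illustrative only)."""
    source: str
    target: str
    state: str = "Running"  # illustrative states: "Stopped", "Running", "Frozen"


class Migration:
    """Hypothetical model of the MoveTables cutover cycle; not the Vitess API."""

    def __init__(self, source: str, target: str, primaries: Set[str]) -> None:
        self.source, self.target = source, target
        self.primaries = primaries               # keyspaces with a viable PRIMARY
        self.serving = source                    # application traffic starts here
        self.forward = Workflow(source, target)  # source -> target stream
        self.reverse: Optional[Workflow] = None  # target -> source, made at cutover
        self.completed = False

    def switch_traffic(self) -> None:
        self._require(self.source, "SwitchTraffic")
        if self.source not in self.primaries:
            raise RuntimeError("SwitchTraffic: no viable PRIMARY in source keyspace")
        if self.reverse is None:
            # Create the reverse stream before routing flips, but don't start it yet.
            self.reverse = Workflow(self.target, self.source, state="Stopped")
        self.serving = self.target      # flip routing rules to the target
        self.reverse.state = "Running"  # target writes now stream back to source
        self.forward.state = "Frozen"   # forward stream kept, but inert

    def reverse_traffic(self) -> None:
        self._require(self.target, "ReverseTraffic")
        if self.target not in self.primaries:
            raise RuntimeError("ReverseTraffic: no viable PRIMARY in target keyspace")
        assert self.reverse is not None  # always exists once traffic is on target
        # Mirror of switch_traffic (an assumption of this model).
        self.serving = self.source
        self.forward.state = "Running"
        self.reverse.state = "Frozen"

    def complete(self) -> None:
        self._require(self.target, "Complete")
        self.reverse = None  # tear down the reverse workflow
        self.completed = True

    def _require(self, expected: str, op: str) -> None:
        """Reject calls made from the wrong side or after completion."""
        if self.completed:
            raise RuntimeError(f"{op}: migration already completed")
        if self.serving != expected:
            raise RuntimeError(f"{op}: traffic is on {self.serving}, not {expected}")


# Usage: cut over, roll back, cut over again, then finalise.
m = Migration("commerce", "customer", primaries={"commerce", "customer"})
m.switch_traffic()
m.reverse_traffic()
m.switch_traffic()
m.complete()
print(m.serving, m.completed, m.reverse)  # customer True None
```

The guard in `_require` encodes the ordering rules: each call is valid only from the keyspace currently serving traffic, and nothing is allowed after `complete()`, which is the one step the model treats as irreversible.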

Why it is a load-bearing concept

Reverse replication turns the cutover from "the risky step" into "just another operational mode we can toggle." It lowers the coordination cost of switching: the customer can cut over, observe, and revert if needed — and they can do this multiple times under real production traffic before calling MoveTables Complete. The architectural property being bought is reversibility: a property databases normally don't have across migrations, but that the workflow layer (Vitess + proxy layer) can re-establish.
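The "cut over, observe, revert" loop that this reversibility enables can be written as a short driver. It is a sketch only: `trial_cutovers`, the `healthy` callback, and `max_trials` are hypothetical. In practice the callbacks would wrap MoveTables SwitchTraffic, ReverseTraffic, and Complete, and `healthy` would be whatever production signal the operator is watching.

```python
from typing import Callable


def trial_cutovers(
    switch: Callable[[], None],
    revert: Callable[[], None],
    complete: Callable[[], None],
    healthy: Callable[[], bool],
    max_trials: int = 3,
) -> bool:
    """Cut over, observe, and revert until the new system looks healthy.

    Returns True if the migration was completed, False if every trial was reverted.
    """
    for _ in range(max_trials):
        switch()
        if healthy():
            complete()  # the only irreversible step, taken after a good observation
            return True
        # In the real system, the reverse workflow is what makes this revert lossless.
        revert()
    return False


# Usage with stub callbacks: the first trial regresses, the second looks fine.
events = []
observations = iter([False, True])
done = trial_cutovers(
    switch=lambda: events.append("switch"),
    revert=lambda: events.append("revert"),
    complete=lambda: events.append("complete"),
    healthy=lambda: next(observations),
)
print(done, events)  # True ['switch', 'revert', 'switch', 'complete']
```

Because the driver calls `complete()` only after a healthy observation, the irreversible step is deferred until the new system has been validated under real traffic.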

Seen in

  • sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale: canonical wiki description of the reverse VReplication workflow created at MoveTables SwitchTraffic time. Matt Lord's explicit framing: "Reverse replication is put in place so that if for any reason we need to revert the cutover, we can do so without data loss or downtime (this can be done back and forth as many times as necessary)." The reverse workflow is kept running until the customer calls MoveTables Complete, framed explicitly as risk mitigation for cross-version, sharding-topology-change, and managed-vs-self-hosted migrations, where surprises are most likely.