
PATTERN Cited by 2 sources

Routing rule swap cutover

Problem

After a long, careful, verified zero-downtime migration, the final step is still risky in most architectures: switching the application's queries from the old database to the new one. The naive approaches all have visible failure modes:

  • Change the application's connection string. Requires a coordinated rollout across every application pod, is not atomic across tables, produces a reconnection storm, and is slow to revert.
  • DNS flip. Atomicity is best-effort (DNS caches outlive the flip), and revert is bounded by DNS-TTL resolution.
  • Stop the old database and start the new one at the same address. Briefly unavailable, and not reversible.

The underlying issue: most architectures have no query-level routing primitive that can be flipped atomically, at sub-second granularity, without the application participating.

Solution

Put a query-aware proxy layer between the application and the database that terminates the database wire protocol, applies per-table routing rules to decide which backend to send each query to, and can be told to update those rules atomically. The cutover is then:

  1. Stop writes on the source keyspace.
  2. Buffer incoming queries at the proxy (concepts/query-buffering-cutover).
  3. Wait for replication to fully catch up.
  4. Atomically update the routing rules so queries for the migrated tables go to the new keyspace instead of the source keyspace.
  5. Release buffered queries against the new backend.
  6. (Optionally) start a reverse workflow so rollback is possible without data loss (patterns/reverse-replication-for-rollback).
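The first five steps can be sketched as a single orchestration function. This is an illustrative skeleton, not Vitess code: every name here (`stop_writes`, `buffer_queries`, `replication_lag`, `update_routing_rules`, `release_queries`) is a hypothetical interface invented for the sketch.

```python
import time


class CutoverError(Exception):
    pass


def switch_traffic(proxy, source, target, tables, lag_timeout=30.0):
    """Illustrative cutover orchestration: stop writes, buffer inbound
    queries, drain replication, flip routing rules atomically, then
    release the buffered queries against the new backend."""
    source.stop_writes(tables)                 # 1. freeze the source keyspace
    proxy.buffer_queries(tables)               # 2. hold inbound queries
    try:
        deadline = time.monotonic() + lag_timeout
        while target.replication_lag() > 0:    # 3. wait for full catch-up
            if time.monotonic() > deadline:
                raise CutoverError("replication did not catch up in time")
            time.sleep(0.05)
        proxy.update_routing_rules(            # 4. atomic per-table flip
            {t: target.name for t in tables})
    finally:
        proxy.release_queries(tables)          # 5. replay buffered queries
```

The `try/finally` matters: if replication never catches up, the buffered queries are still released (against the unchanged routing rules), so a failed cutover degrades to a latency spike rather than an outage.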

The application sees a brief latency spike on the in-flight queries. It does not see errors, dropped connections, or reconnection storms. It doesn't even know the cutover happened.

Required substrate

  • Query-aware proxy — MySQL-wire-compatible (VTGate), Postgres-wire-compatible (Pgpool, PgBouncer, PlanetScale proxy), or protocol-agnostic (application gateway). Must be able to speak the database wire protocol end-to-end, not just TCP-forward it.
  • Per-table (or per-keyspace, or per-workflow) routing rules that can be updated atomically and that the proxy respects per-query, not per-connection.
  • Query buffering — the proxy must be able to pause inbound queries for the brief duration of the flip and release them afterwards.
  • Topology-server-level transactionality — the routing rule update itself must be atomic across proxy nodes. In Vitess this is implemented via a distributed topology server with workflow locks.
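The per-query routing-rule requirement can be sketched as an immutable rules snapshot replaced in a single reference assignment: each query reads whichever snapshot is current when it arrives, so no query ever observes a half-applied rule set. A minimal sketch (class and method names are hypothetical; real proxies additionally coordinate the swap across proxy nodes via the topology server):

```python
class RoutingTable:
    """Per-query routing over an immutable rules snapshot.

    Routing is decided per query, not per connection: a long-lived
    connection that sent its last query to the old keyspace sends
    its next query to the new one, with no reconnect."""

    def __init__(self, rules):
        self._rules = dict(rules)        # current snapshot, never mutated

    def route(self, table, default):
        # Read the snapshot once; a concurrent swap does not affect
        # a query that has already been routed.
        return self._rules.get(table, default)

    def swap(self, new_rules):
        # The flip is a single rebind of self._rules, so readers see
        # either the complete old map or the complete new map.
        self._rules = dict(new_rules)
```

Usage mirrors the cutover: `route("orders", ...)` returns the source keyspace before `swap(...)` and the target keyspace after it, on the same connection.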

Canonical wiki instance

Vitess MoveTables SwitchTraffic — the canonical wiki implementation. Full sequence (Source: sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale):

  1. Pre-checks on tablet health + replication lag + workflow state.
  2. Lock source + target keyspaces in topology server.
  3. Lock workflow (named lock).
  4. Stop writes on source keyspace.
  5. Begin buffering incoming queries at VTGate.
  6. Wait for forward replication to fully catch up.
  7. Create reverse VReplication workflow for rollback.
  8. Initialise Vitess Sequences if tables are being sharded.
  9. Allow writes to target keyspace.
  10. Atomically update schema routing rules pointing migrated tables at the target keyspace.
  11. Release buffered queries to target.
  12. Start reverse VReplication workflow.
  13. Freeze original (forward) workflow.
  14. Release locks.

Typical cutover duration: "less than 1 second." The application sees a brief latency spike and no errors.
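The "latency spike but no errors" behavior falls out of the buffering primitive: during the flip, inbound queries block rather than fail. A minimal sketch of such a pause/release gate (all names hypothetical; VTGate's actual buffering is considerably more involved):

```python
import threading


class QueryBuffer:
    """Gate that holds inbound queries during the routing flip and
    releases them afterwards. Blocked queries experience added
    latency; none of them see an error or a dropped connection."""

    def __init__(self):
        self._open = threading.Event()
        self._open.set()                 # normal operation: pass-through

    def pause(self):
        self._open.clear()               # cutover begins: hold new queries

    def release(self):
        self._open.set()                 # cutover done: replay held queries

    def admit(self, timeout=None):
        # Called once per query before routing. Returns False only if
        # the cutover outlasts the caller's timeout budget.
        return self._open.wait(timeout)
```

A sub-second flip means `admit` typically blocks for well under a second, which is exactly the latency spike the application observes.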

Composes with

Seen in

  • sources/2026-04-21-planetscale-bring-your-data-to-planetscale — canonical wiki instance of the sister pattern at PlanetScale. Phani Raju's 2021 launch post describes an earlier variant of PlanetScale Imports where cutover is not an atomic sub-second routing-rule swap but instead an explicit direction reversal ("Enable primary mode") preceded by an extended bidirectional-routing validation phase (see patterns/database-as-data-router). Both patterns are built on the same primitives (routing rules, VReplication, reverse replication, unmanaged tablets), but differ in the cutover ceremony: this page's pattern makes cutover an instantaneous swap, with pre-cutover validation by VDiff / replicas; the sister pattern makes cutover an operator-driven direction flip, with pre-cutover validation by real application traffic running against the destination-as-proxy. The 2021 post canonicalises the bidirectional-validation alternative; the 2026 post canonicalises the atomic-swap default.

  • sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki instance. Matt Lord documents the exact MoveTables SwitchTraffic sequence Vitess runs, names VTGate's routing-rule update as the explicit moment of cutover, and frames the whole sequence as sub-second. The sequence is presented as the standard approach for all PlanetScale customer migrations. Key architectural property: the application's connection string never changes — only the routing rule at the proxy layer does.
