Skip to content

PATTERN Cited by 3 sources

Shadow-table online schema change

Problem

MySQL's native ALTER TABLE is often too blocking to run against a production table. Some schema changes qualify for ALGORITHM=INSTANT (a metadata-only flip, e.g. adding a column at the end) or ALGORITHM=INPLACE (an in-place rebuild the engine can sometimes do without copying every row). But the common shapes teams run in production — dropping a column, changing a column's type, rebuilding a primary key, changing charset on a large text column — require a full table rewrite. Running that rewrite against the live table either:

  • Blocks writes for the duration (worst case; tens of minutes to hours on a large table), or
  • Causes heavy IOPS contention + replication lag even if not strictly blocking.

The problem is to apply an arbitrary DDL to a live table without the applications on top of it seeing either downtime or sustained performance degradation.

Solution

Build a second table with the target schema, backfill it from the live table under a consistent snapshot, track concurrent writes through the change log, and atomically swap the two tables at a brief cut-over. The four-step shape, verbatim from the post:

"In short, online schema change tools copy the production table without data, apply the schema change to the copy, sync the data, and swap the tables."

The four steps in detail:

  1. Build shadow. CREATE TABLE <name>_shadow LIKE <name>, then apply the DDL to the shadow: ALTER TABLE <name>_shadow <ddl>. Shadow is empty at this point, so applying the DDL is cheap regardless of how expensive it would be on the filled production table.
  2. Backfill rows. Copy rows from production to shadow, ordered by primary key, in batches inside consistent non-locking snapshots so the reads don't block writes. Record the GTID position at which each batch was taken so the tool knows which binlog events to replay.
  3. Track concurrent writes. Tail the binlog for events that landed after the recorded GTID and touch rows already in the shadow; apply them to the shadow through the DDL's column-projection. Interleave catch-up and copy so the shadow converges on the production table — the classic snapshot + catch-up shape at table scope.
  4. Cut over. At a cut- over freeze point, briefly write-lock the production table, apply any residual binlog events to the shadow, mark the freeze-point GTID, RENAME TABLE production TO production_old, production_shadow TO production, release the lock. Downstream proxies like VTGate buffer queries during the lock so clients never see errors.

Instantiations

Multiple tools implement this same pattern with different degrees of operational rigour:

  • pt-online-schema-change (Percona Toolkit) — the original. Triggers on the production table fire synchronously on every write to mirror it into the shadow. Simple but imposes a write-path latency cost and requires trigger DDL privileges.
  • gh-ost (GitHub, originally by Shlomi Noach — author of the PlanetScale post) — triggerless. Tails the binlog from a replica instead of using triggers. Zero write-path cost at the source; the tool can throttle itself based on replication lag / load signals.
  • Vitess VReplication — online DDL as a first-class workflow. Backfill + catch- up phases are interleaved by GTID-set comparison; sidecar state is persisted transactionally so restart is the recovery path for any failure. Uniquely among the three, Vitess does not terminate the stream after cut-over, which enables instant schema revert via inverse replication.

Trade-offs

  • Double storage during the migration. The shadow is a full copy of the production table + the new schema. On a TB-scale table this is a non-trivial resource commitment.
  • Binlog retention becomes operationally load-bearing. If the shadow can't catch up inside the retention window, the migration has to restart from scratch. Forces a tuning relationship between copy-phase duration, interleaving cadence, and binlog retention.
  • DDL semantics must be projectable by the tool, not just by MySQL. For gh-ost and VReplication, the tool has to encode the column mapping (what forward-projection turns an old-schema row into a new-schema row); complex DDL (charset changes, generated columns, reordered primary keys) can surface correctness bugs the tool has to be aware of.
  • Cut-over is still write-locked briefly. The whole migration is non-blocking except for the freeze point — a brief table-level write lock at swap time. At the proxy layer this is invisible (VTGate buffers queries); at the MySQL layer it is a real lock.

Composes with

Seen in

  • — Holly Guevara (PlanetScale, 2022-08-29). Canonical framework-workflow anchoring of the pattern: the five-step shadow-table lifecycle verbatim ("Create a copy of the table (known as a shadow table). Apply the schema changes. Get the data in sync between both tables. Swap the tables atomically. Drop the old table.") — same mechanism as canonicalised from the Guevara/Noach 2022-10 internals post, translated into a Laravel- deploy-workflow prescription. Load-bearing for the php artisan migrate-as-direct-DDL anti-pattern framing: Laravel's Forge-default quick-deploy sequence (git pullcomposer installphp artisan migrate) is canonicalised as the anti-pattern shadow- table online-DDL displaces. The post also canonicalises PlanetScale's safe-migrations feature as a server- side direct-DDL blocker that enforces the pattern: "PlanetScale does not support direct DDL on production branches, unless you disable safe migrations."
  • sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql — Shlomi Noach (PlanetScale, 2024-07-23). Canonical wiki disclosure of the six-property operational profile shared by all four third-party instantiations (pt-osc, gh-ost, spirit, Vitess) verbatim: (i) "mimic an ALTER TABLE by creating a shadow table with the new schema and slowly copying over data"; (ii) "can and often will take longer time to complete as compared with a native MySQL ALTER TABLE"; (iii) "require extra disk space, about as much as the existing table"; (iv) "cause binary log bloating (essentially the entire table content goes through the binary logs)"; (v) "respect production workload, and will pause or throttle as needed so as to give way to production traffic"; (vi) "operate in small batches of changes, hence are able to keep replication lag to a minimum (and throttle based on lag)" + "interruptible: the operation can be aborted at no immediate cost." Load-bearing architectural contrast canonicalised here: mixing INSTANT and shadow-table flows in the same production pipeline doubles operational surface, so the shadow-table path remains the default — "if you already have to use one of the 3rd party solutions, you may as well use it all the time." The reconciliation that makes this tractable is canonicalised as patterns/auto-detect-instant-ddl-eligibility — Vitess and spirit short-circuit to INSTANT internally when eligible, without exposing the decision to the operator. Positions shadow-table as the revertibility substrate for the Vitess-specific [[patterns/instant-schema-revert-via-inverse- replication|inverse-replication revert]] extension — a property missing from pt-osc / gh-ost / spirit but architecturally reachable from the common four-phase substrate.

  • — Lucy Burns, PlanetScale, 2021-05-20. Canonical prior-art-taxonomy altitude on the wiki. Burns 2021 describes the shadow-table mechanism verbatim as the common foundation of both pt-online-schema-change and gh-ost:

    "creating a new table that is a copy of the given table. The schema changes are applied to the new table and the data in the original table is copied over. Once that is complete, the original table is replaced by the new table." This is the canonical 2021-era disclosure of the pattern as a solved-problem class that spans multiple implementations. Burns's architectural contribution is positioning these mechanisms as the execution engine inside a larger branch-based workflow — the shadow-table pattern is necessary but not sufficient for a managed schema-change product.

  • sources/2026-04-21-planetscale-instant-deploy-requestsfast-path alternative altitude. Shlomi Noach (PlanetScale, 2024-09-04) canonicalises that for the subset of schema changes MySQL can absorb via ALGORITHM=INSTANT, the shadow-table pattern is skipped entirely in favour of a direct metadata-only data-dictionary update. Canonical trade-off framing: the shadow-table pattern's guarantees (sub-second app-visible cut-over, 30-minute inverse-replication revert window) are emergent properties of the mechanism, not free- standing features. Taking the fast path trades them for raw speed (hours → seconds) plus a possible multi-second metadata lock. See concepts/instant-deploy-eligibility for which changes qualify, concepts/non-revertible-schema-change for the revert consequence, and patterns/instant-deploy-opt-in for the operator- visible opt-in shape. Canonical wiki framing: this pattern and the INSTANT-DDL fast path are complementary, not competing — Online DDL remains the default at PlanetScale because its properties are predictable across all change shapes.

  • sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work — canonical wiki formulation. Guevara + Noach introduce the pattern as the common denominator of all online-DDL tools before diving into the VReplication-specific extension that enables schema reverts. The four-step-shape framing ("copy the production table without data, apply the schema change to the copy, sync the data, and swap the tables") is directly quoted. The post also names the five VReplication design properties that distinguish its instantiation of the pattern from pt-online-schema-change / gh-ost: copy-and-changelog-both-tracked, GTID-precise transaction mapping, GTID-driven interleaving, transactional sidecar-state coupling, and non-termination after cut-over.

  • — Shlomi Noach, PlanetScale, 2022-05-09. Canonical axiom-layer framing — the shadow-table mechanism is the structural answer that satisfies tenets 1 (non-blocking), 2 (resource-aware), 5 (interruptible), and 7 (failover-agnostic) of the operational schema paradigm simultaneously. The pattern's copy-rewrite-on-the-side-then-swap shape is what makes each of those tenets achievable: the production table is never held under a blocking lock (tenet 1); the copy + changelog workers can yield IO/CPU to the app (tenet 2); cancelling before the cut-over swap means dropping the shadow table with zero production effect (tenet 5); the shadow table and its changelog position are durable, any replica can resume the work (tenet 7). Noach's 2022-05 essay frames the principle; the mechanism predates it (gh-ost 2016, pt-online-schema-change earlier).

Last updated · 542 distilled / 1,571 read