Skip to content

PlanetScale — The state of online schema migrations in MySQL

Summary

Shlomi Noach's 2024-07-23 PlanetScale post is a taxonomic survey of the three mechanism classes available for running non-blocking ALTER TABLE against a live production MySQL in 2024: (1) MySQL's native ALGORITHM=INPLACE, (2) MySQL's native ALGORITHM=INSTANT, and (3) third-party online-schema-change tools (pt-online-schema-change, gh-ost, spirit, and Vitess) that implement the shadow-table online schema change pattern. For each class, Noach enumerates eligibility (what DDL shapes the class supports), the cost profile (disk / CPU / I/O / replica lag), the interruptibility story, and the revertibility story — the axis on which the three classes differ most sharply. The central architectural argument: INSTANT looks perfect but buys its speed by skipping the shadow-table substrate, which is what makes online schema change revertible; if you can't accept the trade-off, stay on the 3rd-party shadow-table path even when INSTANT is eligible, because mixing the two execution models in production doubles your operational surface.

Key takeaways

  • INPLACE is technically non-blocking but has a deal-breaking replica-lag property: on replicas the operation is not non-blocking — "if the ALTER TABLE took 3 hours on the primary server, then from the moment it completes you can expect replication to stall while applying that same change for the next 3 hours or so, creating a massive 3 hour lag." The workaround (SQL_LOG_BIN=0 + manually replay per replica) works but introduces consistency risk and O(n) operator time. Canonical INPLACE page's load-bearing rejection framing. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • INSTANT is metadata-only, runs instantly on both primary and replicas, costs zero disk and zero CPU, but covers only a narrow slice of change shapes. Verbatim: "It does not need to copy a table, does not need extra disk space, does not hammer the CPU. There's nothing to interrupt because the operation terminates before you've blinked. It also runs instantly on the replicas." Eligibility is the constraint — most INSTANT-supported changes are metadata changes that don't affect data or structure (default-value changes, enum extensions, virtual- column adds/drops). Only ADD COLUMN and DROP COLUMN (8.0.29+) affect row structure, and even then DROP COLUMN is blocked if the column participates in an index. Canonical INSTANT page extended with the 2024-era framing. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • INSTANT introduces a new category of operational risk — DROP COLUMN is destructive and non-revertible at the metadata level, not just the data level. Verbatim: "Not only data was lost, but also metadata. What was the column type? Length? Was it nullable? That information cannot be inferred unless you have the previous schema." Reverting an ADD COLUMN is another DROP COLUMN (both INSTANT- eligible, low-friction). Reverting a DROP COLUMN requires re-authoring the column definition from version-controlled schema history — a non-trivial craft operation. Canonical new concept: concepts/schema-change-revertibility-asymmetry. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • MySQL invisible columns are a partial mitigation for DROP COLUMN risk but are structurally limited. Verbatim: "It only affects queries that do not explicitly use the column name, such as SELECT * FROM my_table .... or INSERT INTO my_table VALUE (...). But any SELECT the_column FROM my_table query still has full access to columns." In modern codebases with explicit column lists (ORM-generated, framework-conventionalised), invisibility doesn't catch the real callers. Canonical limitation flagged on concepts/mysql-invisible-column. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • Third-party shadow-table tools (pt-osc, gh-ost, spirit, Vitess) share a six-property operational profile that makes them the 2024 default for most production schema changes despite being slower than INSTANT: (i) mimic ALTER TABLE via shadow table + slow copy; (ii) longer wall-clock time than native ALTER TABLE; (iii) require extra disk space ~= table size; (iv) cause binlog bloat (whole table through the binlogs); (v) respect production load (pause/throttle on lag); (vi) batched and interruptible — abort leaves a cleanup artefact but no immediate cost. Canonical six-property enumeration on patterns/shadow-table-online-schema-change. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • Both Vitess and spirit auto-detect INSTANT eligibility at submission time — the operator submits a DDL through the shadow-table tool, and the tool decides whether to invoke ALGORITHM=INSTANT (seconds) or the shadow-table path (hours). Verbatim: "Both vitess and spirit go an extra mile and can auto detect when a migration can be fulfilled using INSTANT DDL, which means you don't need to think about it or be aware of which particular version supports which changes." Canonical new pattern: patterns/auto-detect-instant-ddl-eligibility. This is the reconciliation that makes the fast-path / safe-path coexistence operationally tractable — one submission surface, engine-internal decision. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • Vitess goes further and preserves revertibility as a first-class property across all shadow-table migrations via the inverse-replication-kept-alive-post-cutover mechanism. Verbatim: "vitess further supports revertibility as first class citizen, able to not only revert back to the original schema, but also to preserve the would-be lost data, while still accounting for any newly added, updated, or removed data since the change." Reinforces the patterns/shadow-table-online-schema-change-is-the- revertibility-substrate framing canonical on the wiki since the Guevara/Noach 2022-10 schema-revert internals post. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

  • Partitioning is the one area where native MySQL sometimes outperforms third-party tools: "Some partitioning related changes should only be served by MySQL. Such is a DROP PARTITION statement for e.g. RANGE partitioned table. Some other partitioning changes are better served by MySQL, and some are best served by online schema change tools." Canonicalised as the partitioning exception to the shadow-table-always-wins default. (Source: sources/2026-04-21-planetscale-the-state-of-online-schema-migrations-in-mysql)

Extracted systems, concepts, patterns

Systems: systems/mysql, systems/innodb, systems/vitess, systems/vitess-vreplication, systems/gh-ost, systems/pt-online-schema-change (new), systems/spirit (new), systems/planetscale.

Concepts: concepts/online-ddl, concepts/instant-ddl-mysql, concepts/mysql-algorithm-inplace (new), concepts/shadow-table, concepts/schema-revert, concepts/non-revertible-schema-change, concepts/schema-change-revertibility-asymmetry (new), concepts/mysql-invisible-column, concepts/instant-deploy-eligibility, concepts/binlog-replication.

Patterns: patterns/shadow-table-online-schema-change, patterns/instant-schema-revert-via-inverse-replication, patterns/auto-detect-instant-ddl-eligibility (new), patterns/instant-deploy-opt-in.

Operational numbers

  • "If the ALTER TABLE took 3 hours on the primary server, then from the moment it completes you can expect replication to stall while applying that same change for the next 3 hours or so, creating a massive 3 hour lag" — canonical datum for the INPLACE replica- lag rejection.
  • INSTANT support arrived in MySQL 8.0, originally contributed by Tencent six years before the post (~2018), for ADD COLUMN only. Arbitrary ADD COLUMN / DROP COLUMN arrived in MySQL 8.0.29 (2022-04).
  • No fleet-level numbers; the post is taxonomic, not an incident retrospective.

Caveats

  • Taxonomic survey voice, not an architecture disclosure. Noach restates already-canonical primitives across the three mechanism classes rather than disclosing new PlanetScale internals. The novelty is in the composition — explicitly framing the 2024 decision surface — not in net-new mechanism.

  • Partitioning treatment is thin. The post names the partitioning-exception-to-the-shadow-table-default but doesn't enumerate which partitioning changes go native vs which go 3rd-party. Operators running partitioned tables will need external reference.

  • pt-online-schema-change is named but not characterised separately from gh-ost beyond their shared shadow-table substrate. The critical distinction (triggers vs binlog-tailing) is canonicalised on systems/gh-ost and systems/pt-online-schema-change, not in the post itself.

  • Cost framing is per-tool qualitative, not quantitative. No operator-facing decision matrix numbers (e.g. "use INSTANT when X, gh-ost when Y"); the reader has to synthesise.

  • spirit's architecture is hand-waved. Canonical reference framing: "recent newcomer". No mechanism deep-dive; wiki's systems/spirit page is a minimal-viable stub based on this post and the CashApp GitHub repo.

  • The post does not engage the declarative-schema- migration altitude (Atlas-CLI, Skeema, PlanetScale safe-migrations) — that's a different axis (migration-authoring-surface) orthogonal to the execution-algorithm axis this post enumerates. See patterns/declarative-schema-management for the companion altitude.

  • 2024-era publication — MySQL 8.x continues to ship INSTANT coverage extensions (e.g. 8.4 column re-ordering); the eligibility matrix is a moving target.

  • No fleet-level incident narrative. The "destructive DROP COLUMN with no barriers" warning is a predictive risk framing, not a post-mortem. See concepts/non-revertible-schema-change for the structural framing.

Cross-source continuity

  • Companion to Noach 2022-05 operational-relational-schema-paradigm essay — that post is the axiom-layer charter; this 2024 post is the mechanism-survey that instantiates tenets 1 (non-blocking), 2 (resource- aware), 5 (interruptible), 7 (failover-agnostic), and 8 (reversible) across the three mechanism classes.

  • Companion to Noach 2024-09-04 Instant Deploy Requests — same author, same INSTANT substrate, two months later. The 2024-09 post canonicalises PlanetScale's product-layer fast-path/ safe-path opt-in on top of the three-mechanism survey canonicalised here. This post frames the trade-offs; the 2024-09 post canonicalises PlanetScale's decision (default Online DDL, operator opts into INSTANT).

  • Companion to Guevara + Noach 2022-10 schema reverts internals — that post canonicalises the [[patterns/instant- schema-revert-via-inverse-replication|inverse- replication revert mechanism]]; this post frames it as a first-class Vitess-only property that distinguishes Vitess from pt-osc / gh-ost / spirit.

  • Companion to Burns 2021-05 non-blocking schema changes — Burns's 2021-era launch-post framing (Online DDL as the execution engine inside a managed workflow) is restated here with 2024-era extensions (spirit as newcomer, INSTANT auto-detection as reconciliation).

  • Complements Coutermarsh 2024-04 how-PlanetScale-makes-schema- changes — Coutermarsh's post is the application-tier workflow altitude; this post is the execution-algorithm altitude beneath it.

Source

Last updated · 470 distilled / 1,213 read