
PLANETSCALE 2023-08-29 Tier 3


PlanetScale — Deploying multiple schema changes at once

Summary

PlanetScale post by Shlomi Noach (Vitess maintainer, creator of gh-ost), published 2023-08-29, re-fetched 2026-04-21. The post canonicalises PlanetScale's near-atomic multi-change schema deployment model — a Vitess-backed MySQL deployment treats all the schema changes in one branch / deploy-request as a single deployment unit that is staged together, ready together, and cut over together (a few seconds apart rather than hours apart). The mechanism composes three building blocks: (1) copy-and-swap emulation of ALTER TABLE so long-running migrations can be held in a "ready-to-complete" state indefinitely; (2) the schemadiff library's equivalence-class analysis that groups dependent diffs and computes a valid in-order execution within each class; (3) a reverse-order revert mechanism — during the 30-minute revert window, reverts are applied in the inverse of the deploy order, preserving validity by mathematical symmetry. The post is Shlomi Noach's canonical wiki voice on the multi-change deployment shape rather than single-table online DDL — it sits alongside his earlier throttler trilogy (part 1, part 2, part 3) and the Guevara + Noach instant schema reverts retrospective, filling the deploy-unit altitude that sits above single-table cut-over and below the deploy-request UX.

Key takeaways

  • Near-atomic, not atomic. Canonical verbatim framing: "With MySQL, it is not possible to transactionally and atomically make changes to multiple table schema definitions. If you want to CREATE one table, ALTER another, and DROP a third, you must run these changes in some order. For this reason, we use the term 'near-atomically.'" The deployment applies changes "almost all at once" — the engine can only serialise DDL statements, so in practice the cut-over of N long-running migrations happens over a few seconds rather than in a single atomic commit. The load-bearing architectural claim is that the application-visible "major event" is compressed from hours into seconds. Canonicalised as near-atomic schema deployment. (Source: sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once)

  • Gated deployment inverts the traditional multi-ALTER cost model. Shlomi's worked example: three large-table changes (ALTER on three different tables) each taking "8 hours" to complete. Traditional DBAs run them sequentially — 24 hours with a rolling "partially deployed" window during which the schema is semantically inconsistent. "If we have a change of heart during the staging period or an incident that takes over priorities, the deployment may be canceled without impacting production. The friction point, where a schema may only be partially deployed, is reduced from days or hours to seconds." Gated deployment keeps each migration in a "staged but not completed" state indefinitely, and cuts all three over together near-atomically. Canonicalised as gated schema deployment and stage-all-complete-together.
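
    The cost-model inversion can be put in back-of-envelope arithmetic (a hedged sketch; the 8-hour figures are the post's hypothetical worked-example values, and the per-table swap time here is an assumed illustrative constant). Sequential apply finishes the first ALTER at hour 8 and the last at hour 24, leaving the schema partially deployed for roughly 16 hours; gated deployment stages all three concurrently, and the partially-deployed window shrinks to the few seconds of serialized cut-over:

    ```python
    # Illustrative arithmetic only: durations are the post's hypothetical
    # 8-hour examples; swap_s is an assumed per-table cut-over cost.

    def partially_deployed_windows(durations_h, swap_s=2):
        """Return (sequential_window_s, gated_window_s): roughly how long
        the schema is semantically inconsistent under each strategy."""
        # Sequential: inconsistent from the first completion to the last.
        sequential = (sum(durations_h) - durations_h[0]) * 3600
        # Gated: inconsistent only while the serialized cut-overs run.
        gated = len(durations_h) * swap_s
        return sequential, gated

    seq, gated = partially_deployed_windows([8, 8, 8])
    print(seq, gated)  # → 57600 (16 hours) vs 6 (seconds)
    ```

    The staging period itself is still bounded by the slowest copy phase (~8 hours), but nothing is application-visible until cut-over, which is what makes cancellation free during that whole window.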

  • The copy-and-swap emulation is the enabling primitive. Canonical verbatim: "This emulation mechanism is what allows us some concurrency and control over the cut-over timing. As we complete copying over a table's existing data set, we can continue to watch the ongoing changes to the table, technically indefinitely, or until we decide that it's time to cut over." The shadow-table online schema change pattern's catch-up phase becomes a stable state — long-running migrations can park in catch-up, tailing binlog forever, waiting for the deploy controller to tell them "now." This is the mechanical shift that makes gated deployment possible. Canonicalised as staged-then-sealed migration.
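
    The staged-then-sealed shape reduces to a small state machine (states and names are illustrative, not Vitess's actual implementation). The key property is that "catch-up" is a stable state: a migration can sit in it indefinitely, replaying binlog events onto the shadow table, until the controller signals cut-over:

    ```python
    # Minimal state-machine sketch of copy-and-swap emulation.
    # Assumed states: copying → catch-up (stable, indefinite) → sealed.

    class ShadowMigration:
        def __init__(self, table):
            self.table = table
            self.state = "copying"        # bulk-copying existing rows

        def finish_copy(self):
            assert self.state == "copying"
            self.state = "catch-up"       # stable: tail the binlog forever

        def apply_binlog_event(self, event):
            # Replay an ongoing row change onto the shadow table; state is
            # unchanged, so the migration stays ready-to-complete.
            assert self.state == "catch-up"

        def cut_over(self):
            assert self.state == "catch-up", "only a caught-up migration seals"
            self.state = "sealed"         # fast rename: shadow becomes live

    m = ShadowMigration("t")
    m.finish_copy()
    for ev in ("insert", "update", "delete"):
        m.apply_binlog_event(ev)          # hours may pass in this state
    m.cut_over()
    ```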

  • Immediate changes come last. "When we stage a deploy request, we begin by running — but not completing — all long-running changes. When we find, possibly hours later, that all long-running changes are ready to complete, we then introduce the immediate changes — like CREATE TABLE, ALTER VIEW, and similar statements. We can then apply the final cut-over for all long-running changes and the immediate changes, near-atomically, a few seconds apart." The scheduling discipline: start the expensive changes first (they define the critical path), hold them at catch-up, then apply the cheap changes just before cutting over the whole set. Canonicalised as interleaved-multi-table-migration-copy-phases.
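
    The scheduling discipline can be sketched as a controller loop (function and field names are illustrative, not the PlanetScale deploy controller's API): expensive changes start first and park at catch-up; cheap immediate changes apply only once everything expensive reports ready; then the whole set cuts over back-to-back:

    ```python
    # Hedged sketch of the stage-all-complete-together ordering.

    def plan_deployment(changes):
        long_running = [c["name"] for c in changes if c["kind"] == "long"]
        immediate = [c["name"] for c in changes if c["kind"] == "immediate"]
        plan = [("stage", n) for n in long_running]      # begin copy phases
        # ... possibly hours pass until all long-running changes are ready ...
        plan += [("apply", n) for n in immediate]        # cheap, last-minute
        plan += [("cut-over", n) for n in long_running]  # seconds apart
        return plan

    changes = [
        {"name": "alter big_t1", "kind": "long"},
        {"name": "create small", "kind": "immediate"},
        {"name": "alter big_t2", "kind": "long"},
    ]
    for step in plan_deployment(changes):
        print(step)
    ```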

  • schemadiff resolves dependencies into equivalence classes. Canonical verbatim: "When schemadiff compares two schemas and generates the diff statements, it also analyzes the dependencies between those statements. If any two diff statements affect entities with a dependency relationship in the schema(s), then schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then schemadiff needs to resolve the ordering of all three. All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs." Canonicalised as schema-diff equivalence class. The algorithm discovers the partition of the diff graph into connected components (classes), then computes a valid permutation inside each class via in-memory schema migration and validation at every step — Shlomi's exact wording: "for each equivalence class, schemadiff finds a permutation of the diffs such that if executed in order, the validity of the entire schema is preserved." Canonicalised as topological order by equivalence class and schema dependency graph.
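
    The partition the post describes is the connected-components structure over "diffs sharing a dependency relationship", which is naturally computed with a union-find; this is a hedged toy model (the real schemadiff data structures are not disclosed), where each diff is tagged with the set of schema entities it touches:

    ```python
    # Toy equivalence-class partition: two diffs land in the same class
    # if they touch related entities, transitively (union-find sketch).

    def equivalence_classes(diffs):
        """diffs: list of (name, set_of_entities). Returns classes as
        lists of diff names, largest class first."""
        parent = {name: name for name, _ in diffs}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        def union(a, b):
            parent[find(a)] = find(b)

        owner = {}  # entity -> first diff seen touching it
        for name, ents in diffs:
            for e in ents:
                if e in owner:
                    union(name, owner[e])
                else:
                    owner[e] = name

        classes = {}
        for name, _ in diffs:
            classes.setdefault(find(name), []).append(name)
        return sorted(classes.values(), key=len, reverse=True)

    diffs = [
        ("alter t1",  {"t1"}),
        ("alter v1",  {"v1", "t1"}),   # view v1 reads from t1
        ("create t2", {"t2"}),         # independent of everything else
    ]
    print(equivalence_classes(diffs))
    # → [['alter t1', 'alter v1'], ['create t2']]
    ```

    Only within each class does ordering matter; classes themselves can be deployed in any order relative to one another, which is exactly what makes the partition useful.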

  • View ↔ table dependencies are the canonical hard case. Shlomi walks two scenarios. Adding a column + adding the column to a view — straightforward: migrate t first (the view references a column that doesn't exist yet), then apply the view change. Dropping a column + removing it from the view — the naive "do the view first" intuition is wrong at the deploy-unit altitude, because the view change is immediate but the column-drop is hours. The correct sequence is: "(1) Begin the change on t. (2) Wait until the change is ready to complete. (3) Issue the immediate change on v. (4) Follow by completing (cutting-over) the change on t." The view change becomes the trigger for the long-running cut-over. "The scenarios may be more complex when multiple, nested views are involved, which are based on yet multiple tables being changed in the deployment request." Canonical worked example for schema dependency graph.
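
    The "valid permutation within a class" search the post attributes to schemadiff ("in-memory schema migration and validation at every step") can be illustrated with a toy model — the real validity checker and search strategy are not disclosed; this model only tracks which columns exist on a table and which columns a view references:

    ```python
    # Toy permutation search with per-step validation, assuming a schema
    # model of {"tables": {name: set(cols)}, "views": {name: (table, cols)}}.
    from itertools import permutations

    def valid(schema):
        # Every column a view references must exist on its base table.
        for _view, (table, cols) in schema["views"].items():
            if not cols <= schema["tables"].get(table, set()):
                return False
        return True

    def find_order(diffs, schema):
        """Return the first permutation of diffs whose every intermediate
        schema is valid, or None."""
        for order in permutations(diffs):
            s = {"tables": {t: set(c) for t, c in schema["tables"].items()},
                 "views": dict(schema["views"])}
            if all(valid(s) for d in order if (d(s) or True)):
                return order
        return None

    # Scenario: add column x to table t, and add x to view v over t.
    def add_col(s):    s["tables"]["t"].add("x")
    def alter_view(s): s["views"]["v"] = ("t", {"a", "x"})

    schema = {"tables": {"t": {"a"}}, "views": {"v": ("t", {"a"})}}
    order = find_order([alter_view, add_col], schema)
    print([f.__name__ for f in order])  # → ['add_col', 'alter_view']
    ```

    The search correctly rejects "view first" (the view would reference a column that does not exist yet) and keeps "table first, then view" — the post's first scenario. Brute-force permutation is only viable because classes are small; the post does not disclose schemadiff's actual search strategy or complexity bounds.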

  • Reverse-order revert is free by construction. "When all migrations are complete, PlanetScale then stages tentative reverts for all migrations. The user has a 30-minute window to undo those schema changes without losing data accumulated. If the user does choose to revert (say, some parts of the app appear to require still the old schema or if performance tanks due to wrong indexing), then those reverts are likewise applied near-atomically. Notably, the reverts are finalized in reverse ordering to the original deployment. There is no need for computation here: we rely on the fact that the original deployment was found to have a step-by-step valid ordering. Undoing those changes in reverse order mathematically maintains that validity." Canonicalised as reverse-order revert. Composes with instant schema revert via inverse replication (from the Guevara + Noach post) — the inverse-replication mechanism per migration gives the data-preservation property; the reverse-order traversal across migrations gives the cross-migration validity property. Together they compose into the 30-minute revert guarantee.
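
    The "no computation needed" claim is worth spelling out: if the deploy order passes through valid intermediate schemas S0 → S1 → … → Sn, then applying each step's inverse in reverse order walks back through exactly those schemas (Sn → Sn-1 → … → S0), each already known valid. As a minimal sketch (names illustrative), the revert plan is just the inverses, reversed:

    ```python
    # Reverse-order revert: no dependency re-analysis, only reversal.

    def revert_plan(deploy_steps):
        """deploy_steps: (diff, inverse_diff) pairs in deployed order.
        Returns the inverse diffs in reverse order."""
        return [inverse for _, inverse in reversed(deploy_steps)]

    steps = [
        ("begin/complete ALTER t DROP COLUMN x", "restore column t.x"),
        ("ALTER VIEW v (drop x)",                "restore VIEW v"),
    ]
    print(revert_plan(steps))
    # → ['restore VIEW v', 'restore column t.x']
    ```

    Per migration, the inverse-replication mechanism (from the Guevara + Noach post) supplies the inverse diff without data loss; across migrations, the reversal supplies validity.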

  • Cancellation is first-class for the entire staging window. "In an ideal world, we can wait out these 24 hours and call it a day (no pun intended). But in reality, we might find that our design was flawed, or perhaps there's an incident that takes priority, and we want to cancel the deployment. Has it been 10 hours? One of the changes will have been applied, the others are still pending. With traditional databases, you can't just cancel that completed schema change." The cancel-before-sealed property is load-bearing for incident response: the deployment is fully cancellable at any point up to the cut-over second. Canonicalised as cancel-before-cutover.

  • Resource bounds are real. Shlomi's closing disclaimer: "Resources are not infinite, and only so many changes can run concurrently. Altering a hundred tables in one deployment request is not feasible and possibly not the best utilization of database branching. It is possible to go too far with a branch so that the changes are logically impossible to deploy (or rather, so complex that it is not possible to determine a reliably safe path)." The near-atomic model is a finite-capacity primitive — it scales with shadow-table storage, binlog tailing cost, and schemadiff's search-space complexity. No production numbers disclosed (no max-migrations-per-deploy, no concurrency ceiling, no storage-amplification factor).

Operational numbers / concrete values

  • 8 hours — worked-example ALTER TABLE duration on a "large table."
  • 24 hours — traditional sequential-apply cost for three such changes.
  • A few seconds — near-atomic multi-change cut-over window with gated deployment.
  • 30 minutes — PlanetScale post-deployment revert window during which the inverse-replication stream is kept alive.
  • No QPS, latency, concurrent-migration-ceiling, storage-amplification-factor numbers disclosed.

Systems and concepts introduced / canonicalised

New systems (1):

  • systems/vitess-schemadiff — the Vitess subsystem / library that validates schemas, computes diffs between them, and partitions diffs into equivalence classes with valid in-order permutations inside each class. First canonical wiki disclosure of the library by name.

New concepts (7):

New patterns (4):

Extended existing pages:

Caveats

  • 2023-era post, re-fetched 2026-04-21. Architectural shape still current — gated deployment is a load-bearing PlanetScale property in 2026. The sources/2026-04-21-planetscale-announcing-vitess-21 release notes confirm schemadiff is still the active library with ongoing development in Vitess 21.
  • No code / no state machine diagrams. The post is architectural-voice pedagogy with one block diagram of equivalence-class partitioning (four-panel: given a set of diffs → group into equivalence classes → arbitrary ordering across classes → valid ordering within each class). Specific mechanisms (how the deploy controller detects "all migrations ready" — polling? signal? how resource-bounds are computed) are not disclosed.
  • schemadiff internals not disclosed at implementation-depth. The post names the library, states it runs "in-memory schema migration and validation at every step", and asserts it handles nested views — but the graph data structure, the permutation search strategy, the validity checker's semantics, and the complexity bounds are not disclosed. Links out to the Vitess schemadiff blog for details.
  • No production numbers. No max-migrations-per-deploy, no concurrent-copy-phase ceiling, no storage-amplification factor, no real-world deploy latency distribution, no revert-rate telemetry. The 8 / 24-hour numbers are hypothetical worked-example values, not production measurements.
  • Resource ceiling acknowledged but not quantified. "Altering a hundred tables in one deployment request is not feasible" — no actual ceiling stated (ten tables? 50? the logical-impossibility boundary?). Operator-facing guidance is "like code, schema changes should be made and deployed with measures in place."

Source
