PLANETSCALE 2022-09-06 Tier 3

PlanetScale — Gated Deployments: addressing the complexity of schema deployments at scale

Summary

PlanetScale post by Shlomi Noach (Vitess maintainer, creator of gh-ost, PlanetScale) published 2022-09-06, re-fetched 2026-04-21. This is the canonical product-launch post for PlanetScale's "Gated Deployments" feature, and the earlier of two Noach posts on the deployment-unit model (the successor, Deploying multiple schema changes at once from 2023-08-29, extends the framing to multi-change dependency analysis via schemadiff). Where the 2023 post canonicalises "near-atomic" as the application-facing property and walks through the equivalence-class / topological-ordering machinery, the 2022 launch post canonicalises three orthogonal dimensions the successor assumes as background:

(1) "Multi-dimensional deployments" as the axis that spans both multiple changes in one deployment and a single change over a multi-sharded database, i.e. the multi-shard schema sync problem that the gated-deployment mechanism solves by "tracking the progress of a schema deployment across all shards and holding off the final switch to the new schema until all shards are ready."

(2) The user-controllable cutover time. Gated Deployments introduces an "Auto-apply" checkbox on the deploy-request UI that lets the operator defer the final cutover "at their discretion," canonicalising the operator-scheduled cutover pattern as a first-class product primitive.

(3) The canonical weekend-at-2am motivation. "Why 30 minutes?" is the most common customer question about the revert window, and Gated Deployments answers it by moving the clock start from automatic completion to operator click-through.

Shlomi Noach is the same wiki-canonical Vitess-maintainer voice as the already-ingested throttler trilogy, the instant-schema-reverts retrospective, and the 2023 multi-change deployment post. The 2022 launch is the earliest canonical disclosure of the "Gated Deployments" product name on the wiki.

Key takeaways

  1. Multi-dimensional deployments have two axes: multiple changes, and multiple shards. Canonical verbatim framing:

"The problem begins with multi-dimensional deployments. With these, you will either have multiple schema changes in the same deployment, or have a single change deployed over a multi-sharded database, or both."

(Source: sources/2026-04-21-planetscale-gated-deployments-addressing-the-complexity-of-schema-deployments-at-scale.)

The 2023 successor post canonicalises the multiple-changes axis exhaustively; this 2022 post is the canonical disclosure of the multi-shard axis. Different shards run under different workloads, so a migration that takes 2 hours on one shard may take 4 on another, and "a multi-sharded database where different shards have different schemas can be either inconsistent in performance, or outright inconsistent in design." Gated Deployments minimises the per-shard drift window. Canonicalised as multi-shard schema sync.
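
The drift arithmetic behind the 2-hour / 4-hour example can be sketched as follows. This is a toy calculation, not anything PlanetScale publishes: the function names and shard names are hypothetical, and it assumes an ungated shard switches schema the moment its own migration finishes.

```python
from typing import Dict


def ungated_drift_hours(finish_hours: Dict[str, float]) -> float:
    """Drift window if each shard switches the moment its own migration ends."""
    return max(finish_hours.values()) - min(finish_hours.values())


def gated_switch_hour(finish_hours: Dict[str, float]) -> float:
    """Earliest hour at which a gate that waits for every shard can open."""
    return max(finish_hours.values())


# Hypothetical shard names; durations follow the 2-hour / 4-hour example.
finish = {"shard_a": 2.0, "shard_b": 4.0}
print(ungated_drift_hours(finish))  # 2.0 hours of mixed schemas without a gate
print(gated_switch_hour(finish))    # 4.0, when the gated switch can happen
```

Under the gate, the mixed-schema period shrinks from the difference between the fastest and slowest shard to however long the final switch itself takes, which the post describes only as "almost simultaneously."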

  2. Bulk copy is serialised; changelog tailing is parallelised. Canonical verbatim disclosure of the concurrency rule inside the staged-then-sealed state:

"Consider two ALTER TABLE changes over two large tables. The bulk work is copying over the existing table data, which is done sequentially per table. But tailing the changelog and applying the ongoing changes can be done in parallel. We run as much of the bulk work as possible upfront, sequentially, and then run the more lightweight work in parallel."

This names the asymmetry that makes near-atomic deployment feasible under resource constraints: the heavy work (copy) is serialised so that it does not hog the database, while the light work (binlog tail) is parallelised because it is the cheap ongoing maintenance of staged-then-sealed state. Extends the 2023 post's framing with a named scheduling rule: "as much of the bulk work as possible upfront, sequentially."
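
A minimal sketch of that scheduling rule, assuming each migration comes with a known copy-time estimate. The `Migration` type, the `plan` function, and the table names are hypothetical and are not part of Vitess or PlanetScale; the sketch only models "copy one table at a time, tail all copied tables concurrently."

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Window = Tuple[float, float]  # (start_hour, end_hour)


@dataclass
class Migration:
    table: str
    copy_hours: float  # hypothetical estimate of bulk-copy duration


def plan(migrations: List[Migration]) -> Tuple[Dict[str, Window], Dict[str, Window]]:
    """Return (copy_windows, tail_windows) for sequential copy and parallel tail."""
    copy_windows: Dict[str, Window] = {}
    clock = 0.0
    for m in migrations:
        # Bulk copies never overlap: each starts when the previous one ends.
        copy_windows[m.table] = (clock, clock + m.copy_hours)
        clock += m.copy_hours
    all_copied_at = clock
    # Each table tails the changelog from the end of its own copy; tail
    # windows overlap freely. They are shown only up to the moment the last
    # copy ends; in a real deployment tailing continues until cutover.
    tail_windows = {t: (end, all_copied_at) for t, (_, end) in copy_windows.items()}
    return copy_windows, tail_windows


copies, tails = plan([Migration("orders", 2.0), Migration("events", 4.0)])
print(copies)  # {'orders': (0.0, 2.0), 'events': (2.0, 6.0)}
print(tails)   # {'orders': (2.0, 6.0), 'events': (6.0, 6.0)}
```

In this toy run `orders` spends four hours in the cheap tailing state while `events` is still copying, which is the staged-but-not-yet-sealed condition the post describes.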

  3. Near-atomic as the application-facing property (first-party framing, 2022 edition). Canonical verbatim:

"With Gated Deployments, PlanetScale applies all changes as closely as possible, seconds apart from each other... Once all changes are in good shape, we complete the migrations as closely as possible. While not strictly atomically, the deployment can be considered more atomic; up till the final stage, no change is reflected in production. In fact, the deployment may be canceled at any point up until its completion time."

This is the earliest canonical disclosure of the near-atomic framing on the wiki — the 2023 successor post refines the terminology ("we use the term 'near-atomically'") but the 2022 launch post is where it first appears. Composes with cancel-before-cutover in the same paragraph.

  4. Multi-shard final switch held until every shard is ready. Canonical verbatim:

"Our gated deployments minimize that gap period, by tracking the progress of a schema deployment across all shards and holding off the final switch to the new schema until all shards are ready. The switch then takes place almost simultaneously (though not atomically) on all shards."

This extends the gated deployment gating primitive from the per-deploy-unit scope (many changes → single gate event) to the per-shard scope (many shards × one change each → single gate event across the fleet). The deploy controller's readiness-aggregation operates over changes × shards, not just changes. Canonicalised as the multi-shard dimension of concepts/multi-shard-schema-sync.
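
Readiness aggregation over changes × shards can be illustrated with a small sketch. The post does not disclose the actual protocol, so everything here is an assumption for illustration: the `(change, shard)` keying, the function names, and the shard and change labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (change_name, shard_name), both hypothetical labels


def lagging(ready: Dict[Key, bool]) -> List[Key]:
    """Every (change, shard) pair that has not yet reported ready."""
    return sorted(key for key, ok in ready.items() if not ok)


def gate_open(ready: Dict[Key, bool]) -> bool:
    """The gate opens only when every change is ready on every shard."""
    return len(ready) > 0 and not lagging(ready)


status = {
    ("alter_orders", "shard_a"): True,
    ("alter_orders", "shard_b"): False,  # still copying on the busier shard
    ("alter_events", "shard_a"): True,
    ("alter_events", "shard_b"): True,
}
print(gate_open(status))  # False
print(lagging(status))    # [('alter_orders', 'shard_b')]

status[("alter_orders", "shard_b")] = True
print(gate_open(status))  # True
```

A single lagging pair holds the whole deployment, which is what turns the slowest shard's finish time into the cutover time for the fleet.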

  5. The 30-minute revert window's timer is the most-asked product question. Canonical verbatim:

"The most common questions around our schema revert feature revolves about that time limit: 'why 30 minutes?', 'What happens if the deployment completes at 2:00am over the weekend, and I can't access my laptop in time?', 'Can we have better control over the timings?'"

The revert window is capped (disclosed in the 2023 schema-reverts post as pre-staged inverse replication — 30 minutes is the resource-cost / revertability trade-off point) and this is a known source of customer friction. The canonical weekend-at-2am-motivation for Gated Deployments' user-controllable cutover is this specific operational pain point: "the clock starts ticking at the wrong moment for the operator."

  6. The "Auto-apply" checkbox moves the clock start from system-driven to operator-driven. Canonical verbatim:

"By default, deployments auto-complete when ready, and this is great for most cases, and clears up the deployment queue. However, if the user so chooses, they may uncheck the 'Auto-apply' box. The deployment now stages all changes and runs all long-running tasks. When all changes are ready, the deployment awaits the user to hit the 'Apply changes' button. With no input from the user, the deployment will just keep on running in the background, always keeping up to date with data changes."

Canonicalised as operator-scheduled cutover, a first-class product primitive that composes staged-then-sealed (the mechanism) with explicit operator-click-to-seal (the UI gesture). The operator chooses the moment of the application-visible event, which is also the moment the 30-minute revert window begins. "Come Monday morning, when the developer is at their desk and fully prepared to begin their work week, they may click the 'Apply changes' button."
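
The interaction between Auto-apply, cancellation, and the revert clock can be modelled as a small state machine. This is a sketch under stated assumptions: the `Deployment` class, its state names, and its method names are hypothetical and do not mirror PlanetScale's API; only the 30-minute figure and the described behaviour come from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVERT_WINDOW = timedelta(minutes=30)  # figure quoted in the post


@dataclass
class Deployment:
    auto_apply: bool = True
    state: str = "staging"  # staging -> ready -> complete, or cancelled
    completed_at: Optional[datetime] = None

    def mark_ready(self, now: datetime) -> None:
        """All long-running work is done; complete at once if auto-apply is on."""
        if self.state != "staging":
            raise RuntimeError(f"cannot become ready from {self.state}")
        self.state = "ready"
        if self.auto_apply:
            self._complete(now)

    def apply_changes(self, now: datetime) -> None:
        """Operator clicks the button; only valid while waiting at the gate."""
        if self.state != "ready":
            raise RuntimeError(f"cannot apply from {self.state}")
        self._complete(now)

    def cancel(self) -> None:
        """Cancellation is allowed at any point before completion."""
        if self.state in ("complete", "cancelled"):
            raise RuntimeError(f"cannot cancel from {self.state}")
        self.state = "cancelled"

    def revert_deadline(self) -> Optional[datetime]:
        """The revert clock starts at completion, not at readiness."""
        if self.completed_at is None:
            return None
        return self.completed_at + REVERT_WINDOW

    def _complete(self, now: datetime) -> None:
        self.state = "complete"
        self.completed_at = now


saturday_2am = datetime(2022, 9, 10, 2, 0)
monday_9am = datetime(2022, 9, 12, 9, 0)

gated = Deployment(auto_apply=False)
gated.mark_ready(saturday_2am)
print(gated.state, gated.revert_deadline())  # ready None
gated.apply_changes(monday_9am)
print(gated.revert_deadline())               # 2022-09-12 09:30:00
```

With `auto_apply=True` the same `mark_ready` call at 2:00am Saturday would have started the revert clock immediately, which is the scenario the customer questions describe.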

  7. Indefinite staging is structurally free given the machinery exists. Canonical verbatim:

"With no input from the user, the deployment will just keep on running in the background, always keeping up to date with data changes."

Extends staged-then-sealed's framing with the explicit operational property that the staged state has no natural time-out: the binlog-tail loop is the mechanism, and the mechanism runs the same whether sealing happens in 8 hours or 3 days. The 2023 successor post's caveat ("Resources are not infinite... altering a hundred tables in one deployment request is not feasible") bounds this from above; the 2022 launch post bounds it from below by disclosing that day-long holds are explicitly supported.

  8. Gated Deployments as a step toward an integrated schema-app development flow. Canonical verbatim:

"This release of Gated Deployments brings us another step closer to our goal of a more modern and cohesive development flow, where schema changes happen alongside application development, not in isolation."

The architectural motivation is to bring schema change into the application development cycle, so that schema becomes a first-class artifact of that cycle rather than a separate DBA-mediated workflow. Gated Deployments moves schema change out of the "rigid irreversible commit that starts when you ALTER" model into a "cancellable, stage-able, operator-scheduled" model, a product-UX reification of the near-atomic property.

Systems / concepts / patterns extracted

Operational numbers

  • 30 minutes — the post-completion revert window. Canonicalised as the point where pre-staged inverse replication streams are torn down. The 2022 post frames this as fixed; customer friction with the fixedness is the motivation for Gated Deployments itself.
  • "seconds" — the application-visible cutover window under gated deployment vs "minutes to possibly hours" for per-shard drift under traditional per-shard migration.
  • "seconds apart" — the inter-change cutover window when multiple changes in one deployment unit are sealed together.
  • "hours" / "day" — canonical worked-example migration durations. "A deployment with three ALTER statements over large tables may take a day to run."
  • No production telemetry disclosed: no gate-wait distributions, no per-shard drift histograms, no auto-apply usage rates, no 30-minute-window revert-rate, no Apply-changes-button latency data.

Caveats

  • Product-launch voice, not deep-dive architecture. Shlomi Noach introduces the Gated Deployments product with three dimensions (multi-change, multi-shard, operator-scheduled) but the mechanism-level detail (schemadiff equivalence classes, reverse-order revert validity proof) is deferred to the 2023 successor post. This post is the canonical "what it does"; the 2023 post is the canonical "how it works."
  • Multi-shard readiness-aggregation protocol not disclosed. The post states "tracking the progress of a schema deployment across all shards" but does not name the aggregation protocol. In practice this is per-shard VReplication-stream ready-flag publication to a deploy-controller that awaits all-green, but the post leaves the mechanism implicit. The separate concepts/gated-schema-deployment canonical page derives this from the 2023 post; this 2022 post is where the multi-shard scope first appears.
  • "Almost simultaneously (though not atomically) on all shards" — the cross-shard cutover is not atomic in the formal distributed-transaction sense. Shards apply the cutover sequentially over a short window; between the first and last shard's cutover, the cluster has a "hybrid" schema state visible across shards. The window is seconds, not hours, but it is nonzero — and the post does not quantify it. A client routed to shard A post-cutover + shard B pre-cutover during this window may observe schema skew; the 2022 post does not disclose the operational discipline around this.
  • Auto-apply's default-on is a product-UX choice, not an architectural one. "By default, deployments auto-complete when ready": making operator scheduling opt-in (rather than opt-out) is a product decision reflecting that most customers prefer the clear-the-queue behaviour. The post does not disclose adoption rates or customer-feedback rationale.
  • Apply-changes button semantics around the post-gate revert window are implicit. Once the operator clicks Apply changes, the 30-minute revert window begins, meaning the operator also chooses the moment the revert clock starts. "The deployment then completes, and the 30 minute window for schema reverts starts ticking, all while the developer is in control of the situation." This composes user-controlled cutover with user-controlled revert-clock start, but the post does not walk through the operational implications (e.g. operator chooses a Monday-morning click-through to maximise overlap with the work week).
  • Inter-shard dependency semantics not addressed. What if different shards run different schemas for a duration that exceeds the client's read-after-write invariants? The post elides cross-shard query consistency during the seconds-long cross-shard cutover window — this is a real concern for clients that do cross-shard queries during a deployment.
  • Monolith-view of the multi-sharded database is asserted, not proven. "By design, a multi-sharded database acts as though it were a monolith." Whether the multi-shard gated-deployment actually preserves this monolith illusion during the cutover depends on client behaviour during the seconds-long switch window — the post asserts the property holds but does not walk through the mechanism guaranteeing it.
  • Ingest-overlap with the 2023 successor post. Most of the gated deployment, near-atomic, staged-then-sealed, and cancel-before-cutover framing on the existing wiki was canonicalised from the 2023 post. This 2022 post adds the multi-shard dimension and the operator-scheduled-cutover dimension; it does not contradict the 2023 post's framing. The two posts are canonical companions from the same author; this is the earlier product launch, the 2023 post is the mechanism deep-dive.

Source
