PLANETSCALE 2024-04-04 Tier 3

PlanetScale — How PlanetScale makes schema changes

Summary

Mike Coutermarsh (PlanetScale, originally 2024-04-04, re-fetched 2026-04-21) describes PlanetScale's internal dogfooding workflow for applying schema changes to the two PlanetScale-run databases that back their main Ruby-on-Rails API: one monolithic business-data database and one sharded database behind PlanetScale Insights. The workflow uses the exact customer-facing Vitess online-schema-change tooling plus a custom GitHub Actions "pull request bot" that glues the PlanetScale API to the Rails pull-request lifecycle.

The post is a practitioner-voice composition rather than a mechanism deep-dive: no new primitives are introduced, but several already-canonicalised PlanetScale primitives (online DDL, deploy requests, gated deployments, shadow-table online schema change, expand-migrate-contract) are composed into a complete end-to-end Rails-application workflow:

  • local MySQL for fast CI
  • the planetscale_rails gem for migration execution
  • the PR-bot auto-creating a PlanetScale branch and deploy request
  • the bot's class-of-change detection emitting deploy-order instructions ("remove column: deploy code before schema", "add column: deploy code after schema")
  • Vitess online migration as the production-safety substrate
  • PlanetScale's automatic per-deploy-request queue as the multi-team concurrent-change coordinator

The load-bearing architectural claim is the separation of schema-deploy from code-deploy: "The solution to this starts with severing the tie between schema migrations and code. Allowing each to go out independently of each other… engineers are now forced to think more deeply about how their code and database schema changes interact with each other."

The post is the canonical customer-voice composition of PlanetScale's already-wiki-canonicalised schema-change stack, from the inside. Mike Coutermarsh joins the PlanetScale named-voice roster as the application-tier voice (second ingest after the 2022-01-18 Rails-CI post How our Rails test suite runs in 1 minute on Buildkite).

Key takeaways

  1. Schema-deploy and code-deploy are two independent critical systems and must ship independently. Verbatim:

"The solution to this starts with severing the tie between schema migrations and code. Allowing each to go out independently of each other. There are many of benefits to doing this. The largest being: engineers are now forced to think more deeply about how their code and database schema changes interact with each other."

(Source: sources/2026-04-21-planetscale-how-planetscale-makes-schema-changes.)

This is the same load-bearing argument Taylor Barnett makes in the earlier Backward compatible database changes post canonicalising coupled vs decoupled deploy and expand-migrate-contract. Coutermarsh's post sits at the worked-example altitude: the same structural argument applied inside PlanetScale's own Rails API.

  2. The default Rails pattern (rails db:migrate in CI before code deploy) fails at scale on two axes. Verbatim:

"The majority of Rails applications in the world update their production schema on each deployment. They will run rails db:migrate as part of their CI process immediately before deploying code to production. This process works well for many teams, but tends to suffer from growing pains as both the size of the data and engineering team grows."

Two failure modes are named: (a) large-data DDL becomes dangerous and time-consuming when run directly against production; (b) team coordination: larger teams deploying frequently get blocked by each other's schema changes and suffer coordination costs. A painful migration process also carries a second-order cost: engineers avoid making schema changes, and instead design features differently or "abuse json columns instead of a proper schema design."

  3. Online schema change tooling is the database-layer half of the solution. Online-schema-change tools (pt-online-schema-change, gh-ost, Vitess's shadow-table machinery) replace rails db:migrate and "run the schema changes in a way so that production traffic to the database is not interrupted." The application-facing benefit: "application developers no longer need to keep track of which schema changes may cause a table to lock. The schema change tooling will make the change in a way that is always safe, mitigating much of the fear around schema changes." Coutermarsh points explicitly at PlanetScale's safe migrations feature built on top of Vitess online schema change.

  4. Application-layer discipline is the other half: atomic cross-system deploy is impossible. Verbatim:

"A common misconception we see among developers is that they think they can atomically deploy both their schema and application code at the same time. This is not possible. For each schema change made, the application needs to be setup to handle both the current and future schema. Without doing so, errors will ensue."

This is the same two-critical-systems-cannot-deploy-atomically argument canonicalised in concepts/coupled-vs-decoupled-database-schema-app-deploy.

  5. PlanetScale's internal Rails workflow: local MySQL for CI, PR-bot for deploy-request creation. During local development, application code is modified in a git branch and the corresponding schema changes are applied against a local instance of MySQL for speed ("Our CI runs against local MySQL for lowest latency"). Once local development is complete, the developer opens a pull request on GitHub. A custom GitHub Actions bot detects any schema changes in the PR, creates a PlanetScale branch, runs the migrations using the planetscale_rails gem, opens a deploy request (PlanetScale's method for making a schema change), and comments the result back on the PR. Canonicalised as patterns/pr-bot-auto-deploy-request.

  6. The bot emits deploy-order instructions specific to the class of change. Two rules are stated verbatim:

"When removing a column, application code must be deployed before the schema is changed"

"When adding a column, application code must be deployed after the schema is changed"

These are consequences of expand-migrate-contract expressed at the operational-instruction altitude. Canonicalised as concepts/schema-change-deploy-order.

  7. Deploy request = schema change + code change reviewed together. Verbatim:

"The bot automatically opens a deploy request for us and leaves a comment linking to the change in GitHub. This allows our team to review both the schema change as well as the code. Giving full context around why and what is being changed."

The PR + deploy request are decoupled deployment units (they can ship independently) but are reviewed together — the PR comment links to the deploy request so the reviewer sees both the code diff and the schema diff in one context. This is a non-obvious composition: decoupled execution can still have co-located review.

  8. Vitess online migration + linter-before-deploy + the per-database queue. Before execution the schema change runs through a linter "to catch any common mistakes." The deploy request then "makes the schema changes using a Vitess online migration. This protects production and also allows us to quickly revert the schema change if we notice anything going wrong." The revert link points to the instant-schema-revert-via-inverse-replication mechanism canonicalised in the Schema reverts post.

  9. When multiple team members submit concurrent changes, PlanetScale serialises them into a per-target queue + re-runs safety checks on the combined post-deploy schema. Verbatim:

"When we have multiple team members making schema changes at the same time, PlanetScale will create a queue for each change. This allows each change to be deployed automatically in the order it was added to the queue. It has safety benefits as well, PlanetScale runs safety checks not only on each schema change, but on the resulting database schema with all changes combined. This protects against mistakes when multiple people are making changes at once."

Canonicalised as concepts/schema-change-queue. This is the automatic coordinator that eliminates the "large teams getting blocked by each others' schema changes" failure mode named in §2. Combined-schema safety check is a second-order protection against the "each change is safe, but in combination they produce an invalid schema" class of error.

  10. The architectural claim: decoupled deploy forces better thinking about interaction. Verbatim:

    "Our process to deploy code and schema are purposefully separate, forcing our engineers to think through each step of their change. As well as allowing us to move quickly and never have code changes blocked behind another team members unrelated schema migration."

    The two claims compose: (a) the cognitive forcing function (engineers must consciously sequence deploys), and (b) the team-level unblocking (code changes are never gated on unrelated schema migrations). Section 3 of concepts/coupled-vs-decoupled-database-schema-app-deploy canonicalises the cognitive-forcing argument; §9 of this post canonicalises the team-unblocking half.
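
The requirement from the takeaways above, that application code must handle both the current and the future schema, can be illustrated with a minimal Ruby sketch. It is not code from the post: `writable_attributes` and the column lists are hypothetical, and in a Rails application the list of available columns would more plausibly come from something like `Model.column_names`.

```ruby
# Hypothetical helper (not from the post): keep only the attributes whose
# columns exist in the schema the running process can currently see.
def writable_attributes(attrs, available_columns)
  attrs.select { |column, _value| available_columns.include?(column.to_s) }
end

# Assumed example schemas for a users table.
current_schema = %w[id email]               # before the schema change ships
future_schema  = %w[id email display_name]  # after the column has been added

attrs = { email: "a@example.com", display_name: "Ada" }

# Against the current schema only :email survives, so the write cannot fail
# on a column that does not exist yet.
before_deploy = writable_attributes(attrs, current_schema)

# Against the future schema both attributes are written.
after_deploy = writable_attributes(attrs, future_schema)
```

Code written this way keeps working on either side of the schema deploy, which is what allows the two deploys to ship independently.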

Architectural axes

PR-bot as glue between git lifecycle and deploy-request lifecycle

The GitHub Actions bot is the load-bearing integration primitive. It:

  1. Detects schema changes — by looking at the files modified in the PR (Rails migration files under db/migrate/).
  2. Creates a PlanetScale branch — mirrors the git branch in PlanetScale's branching model, providing an isolated target for the schema change.
  3. Runs the migration against the PlanetScale branch — uses the planetscale_rails gem.
  4. Opens a deploy request — PlanetScale's first-class schema-change primitive (already canonicalised via the 2022 Gated Deployments post and the 2024 Schema Reverts post).
  5. Comments on the PR — linking the deploy request back to the PR so reviewers see both contexts.
  6. Emits class-specific deploy-order instructions — "deploy code before schema" for column drops; "deploy code after schema" for column adds. Uses the PlanetScale API to classify the change.
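
Step 1 of the list above could look roughly like the following. The post does not disclose the bot's detection heuristic, so this is only a sketch under the assumption that it is a file-pattern match on Rails migration paths; `MIGRATION_PATH` and `migration_files` are hypothetical names.

```ruby
# Assumption: a schema change is any Ruby file directly under db/migrate/
# whose name starts with a numeric timestamp (the Rails naming convention).
MIGRATION_PATH = %r{\Adb/migrate/\d+_[a-z0-9_]+\.rb\z}

# Returns the subset of changed paths that look like Rails migrations.
def migration_files(changed_paths)
  changed_paths.grep(MIGRATION_PATH)
end

# Hypothetical list of files touched by a pull request.
changed = [
  "app/models/user.rb",
  "db/migrate/20240404120000_add_display_name_to_users.rb",
  "db/schema.rb",
]

migrations = migration_files(changed)
schema_change_detected = !migrations.empty?
```

A bot built this way would skip branch and deploy-request creation entirely when `schema_change_detected` is false.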

Canonical wiki framing: patterns/pr-bot-auto-deploy-request. The pattern generalises beyond Rails / GitHub Actions / PlanetScale — the skeleton is "CI detects schema-affecting diff → creates isolated schema-change unit on the database platform → posts the link back on the PR", and the substrate variables are CI tool + database platform + bot language. Coutermarsh's final paragraph acknowledges this: "We've implemented our bot using GitHub Actions, however a similar workflow can be achieved with other CI tools as well. On the PlanetScale side, all of the API calls needed are available via the pscale CLI."
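
As a rough illustration of that skeleton, the sketch below only builds the command sequence and prints it; nothing is executed. It assumes the `pscale branch create` and `pscale deploy-request create` subcommands and assumes that the planetscale_rails gem exposes a `psdb:migrate` rake task; the database and branch names are placeholders, and a real bot would also need authentication and would have to wait for the branch to be ready.

```ruby
require "shellwords"

# Builds (but does not run) the commands for one pull request.
# The rake task name is an assumption about the planetscale_rails gem.
def deploy_request_plan(database:, branch:)
  [
    ["pscale", "branch", "create", database, branch],
    ["bundle", "exec", "rails", "psdb:migrate"],
    ["pscale", "deploy-request", "create", database, branch],
  ]
end

# Placeholder names, not taken from the post.
plan = deploy_request_plan(database: "example-db", branch: "add-display-name")

# Print each command shell-escaped, e.g. for a dry-run log in CI.
plan.each { |argv| puts Shellwords.join(argv) }
```

Keeping the plan as data makes it easy to log or dry-run in any CI tool before wiring it to real credentials.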

Local-MySQL-for-CI as a latency lever

Verbatim: "Our CI runs against local MySQL for lowest latency." Canonical wiki framing: patterns/local-mysql-ci-for-fast-tests. The choice is a tradeoff — CI runs against local MySQL (not against a PlanetScale branch) for speed, even though this means the CI environment is not topologically identical to production (no Vitess routing layer, no sharding, no read-replica splitting). The gap is closed by the deploy-request lifecycle: CI signals the Rails-level migration works; the deploy-request + Vitess-level online migration verifies it runs safely against production topology.

Deploy queue as multi-team coordinator

The per-target deploy queue is canonicalised as concepts/schema-change-queue. Two properties are load-bearing:

  • Serialisation — changes apply in submission order, eliminating race conditions from interleaved concurrent DDL.
  • Combined-schema safety — the post-deploy schema is linted as a whole, catching "mistakes when multiple people are making changes at once" (e.g. two changes each introduce a column of the same name but different types; in isolation each passes lint, combined they collide).

This is the automatic coordinator that mitigates §2's coupled-deploy failure mode ("larger teams deploying frequently get blocked by each others schema changes"). Compare gated deployments, which solve a related but different problem (multiple changes per deployment unit); the queue coordinates multiple deployment units over time.
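
The two properties can be shown with a toy model. This is not PlanetScale's implementation, which the post does not describe: `Change` and `deploy_queue` are hypothetical, the schema is a plain hash, and the only combined-schema check modelled is a column added twice.

```ruby
# Toy model only: one queued change adds one column to one table.
Change = Struct.new(:author, :table, :column, :type)

# Applies queued changes in submission order to a copy of the schema and
# returns [resulting_schema, conflicts]. A change conflicts when an earlier
# change (or the base schema) already defines the same column.
def deploy_queue(schema, queue)
  result = schema.transform_values(&:dup)
  conflicts = []
  queue.each do |change|
    columns = (result[change.table] ||= {})
    if columns.key?(change.column)
      conflicts << change
    else
      columns[change.column] = change.type
    end
  end
  [result, conflicts]
end

schema = { "users" => { "id" => "bigint" } }
queue = [
  Change.new("ana", "users", "status", "varchar(32)"),
  Change.new("ben", "users", "status", "int"),
]

result, conflicts = deploy_queue(schema, queue)
```

Each change is valid against the base schema on its own; only replaying them in order against the accumulated result exposes the collision, which is the point of checking the combined schema.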

Class-of-change detection = deploy-order instruction

The PR-bot classifies the change and emits an operational instruction to the developer. Two rules verbatim:

  • Removing a column: "application code must be deployed before the schema is changed" — code must stop referencing the column first, then the schema change can drop it safely.
  • Adding a column: "application code must be deployed after the schema is changed" — schema must have the column first, then code can start reading/writing it.

Canonicalised as concepts/schema-change-deploy-order. These rules are direct consequences of expand-migrate-contract expressed at the operational-instruction altitude: instead of asking every engineer to learn the six-step pattern, the bot encodes the implication of the pattern for the specific change type being proposed.
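
The two rules above can be sketched as a function over a column-level schema diff. The post does not disclose how the bot classifies changes, so comparing before and after column lists is an assumption, and `deploy_order_instructions` is a hypothetical name.

```ruby
# The instruction text follows the two rules quoted in the post.
INSTRUCTIONS = {
  added:   "deploy code after schema",
  removed: "deploy code before schema",
}.freeze

# Compares a table's column names before and after the proposed change and
# returns one instruction per added or removed column.
def deploy_order_instructions(before_columns, after_columns)
  added   = after_columns - before_columns
  removed = before_columns - after_columns
  added.map { |c| "add column #{c}: #{INSTRUCTIONS[:added]}" } +
    removed.map { |c| "remove column #{c}: #{INSTRUCTIONS[:removed]}" }
end

# Hypothetical change: add display_name and drop legacy_flag on one table.
instructions = deploy_order_instructions(
  %w[id email legacy_flag],
  %w[id email display_name]
)
```

A change that both adds and removes columns yields both instructions, which suggests splitting it into two deploy requests so each can follow its own ordering.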

Cross-source continuity

This is the tenth canonical PlanetScale schema-change-mechanism disclosure on the wiki; it fills the application-integration / PR-bot / deploy-coordination axis. Six already-canonical primitives compose in this post:

  1. patterns/expand-migrate-contract — the six-step discipline (from Barnett's 2024 post).
  2. concepts/coupled-vs-decoupled-database-schema-app-deploy — the two-critical-systems-cannot-deploy-atomically argument (from Barnett).
  3. patterns/shadow-table-online-schema-change — the Vitess mechanism that makes the schema side safe (from Guevara + Noach's 2024 Schema Reverts post).
  4. concepts/online-ddl — the engine-level capability matrix.
  5. concepts/gated-schema-deployment / patterns/operator-scheduled-cutover — the deploy-request-with-auto-apply UX (from Noach's 2022 Gated Deployments post).
  6. patterns/instant-schema-revert-via-inverse-replication — the revert substrate (from Guevara + Noach).

The new composition this post canonicalises is the PR-bot + local-MySQL-CI + deploy-queue outer layer that wraps these primitives for a Rails-application workflow. Four canonical new pages: two concepts (concepts/schema-change-queue and concepts/schema-change-deploy-order) and two patterns (patterns/pr-bot-auto-deploy-request and patterns/local-mysql-ci-for-fast-tests).

Companion to the 2022-01-18 How our Rails test suite runs in 1 minute on Buildkite post (same Mike Coutermarsh byline, same PlanetScale Rails application-tier dogfooding voice, different axis — test-suite speed vs schema-change workflow). Together they are the canonical PlanetScale-application-tier worked examples on the wiki.

Caveats

  • No production numbers: no deploy-request throughput rate, no typical queue depth, no time-from-PR-to-production distribution, no failure-rate or rollback-rate telemetry.
  • Bot implementation details elided — the post says "We've built a 'pull request bot' with GitHub Actions" but does not disclose the bot's code, its schema-change detection heuristic (is it a file-pattern match on db/migrate/*.rb? an AST pass?), or its class-of-change classification rules (how does it distinguish add_column from remove_column vs change_column?).
  • Multi-change coordination under the queue not specified: the post mentions the queue runs safety checks on "the resulting database schema with all changes combined" but doesn't disclose how the queue resolves conflicting changes (first-wins? block-and-prompt? auto-merge?) or how it handles the case where combined-schema lint fails.
  • Co-located review UX not shown — the post describes "review both the schema change as well as the code" but doesn't show a screenshot of what the reviewer sees or disclose whether the deploy-request-view embeds the PR diff (or vice versa).
  • No guidance on what triggers the queue flush — the post says changes deploy "automatically in the order [they were] added to the queue" but doesn't specify whether that's immediate-FIFO or requires operator approval on each gate (compare to operator-scheduled cutover which is a deliberate non-automatic gate).
  • Local-MySQL-CI fidelity gap — CI runs against local MySQL but production runs on Vitess; the post doesn't discuss how Vitess-specific features (sharding, routing rules, read splitting) are tested or what classes of bug the CI-vs-production gap produces.
  • planetscale_rails gem scope not detailed — the post names the gem but doesn't enumerate what it does (migration runner? branch creation? deploy-request integration? combination?).
  • Pedagogical / dogfooding voice — this is a "how we do it" post rather than a mechanism deep-dive; load-bearing primitives are named rather than walked through.
  • Cross-database-environment composition elided — the post mentions PlanetScale runs two databases (one business-data, one sharded for Insights) but doesn't disclose whether the PR-bot creates two deploy requests when a change affects both, or how cross-database consistency is managed.
