
PlanetScale — Backward compatible database changes

Summary

Taylor Barnett (PlanetScale) canonicalises the expand, migrate, and contract pattern for backward-compatible database schema changes — the industry-standard technique (also called "parallel change" or simply "backward compatible changes") for renaming columns / tables, changing data types, or restructuring the representation of existing data without coupling application-code deploys to database-schema deploys. The post is deliberately pattern-first, PlanetScale-second: the six-step sequence (Expand schema → Write to old + new → Backfill old → new → Read from new → Stop writing to old → Delete old) applies to any relational database, and the PlanetScale-specific machinery (branching, deploy requests, Insights query monitoring, MySQL invisible-column support, table-rename warning) is presented as convenience that makes each step safer, not a requirement. The opening argument is equally load-bearing: never ship application-code and database-schema changes in the same atomic unit, because they cannot deploy atomically — the five named reasons (doubled risk, no atomic cross-system deployment, migration-time scaling from seconds to days, pipeline blocked on schema failure, discipline forces backward compatibility) add up to a structural-not-stylistic constraint.

Key takeaways

  1. Database-and-app are two critical systems with independent deployment clocks. "It's impossible for application code and database schema changes to deploy together atomically. If they are ever dependent on each other going out simultaneously, the application will error briefly until the other catches up." The bug-window duration depends on how fast the slower system catches up — for a small schema change, seconds; for a large migration, "30 seconds to a few hours to even more than a day." Canonical new coupled-vs-decoupled database/app deploy concept. (Source: sources/2026-04-21-planetscale-backward-compatible-database-changes)

  2. Additive changes are low-risk; mutating changes are where expand-migrate-contract is mandatory. "Generally, adding a table, column, or view is low-risk and doesn't require much, other than deploying the schema change before your application code that might use the change… Things are riskier when changing or removing a column or table." The distinguishing criterion is whether the change touches schema your production application is already using. Rename column / change data type / split column / merge tables / drop column — all mandate expand-migrate-contract. Add column / add table — just "deploy schema before the code that uses it" (order matters, but the full six-step dance does not).

  3. Canonical six-step sequence. The full pattern, verbatim from the post:

  4. Step 1 — Expand: add the new column / table (nullable and/or with defaults so existing writes don't error). Deploy the schema change alone.
  5. Step 2 — Write to both: update application to write both the old and new schema on every write. Reads still come from the old. Deploy and confirm no user impact.
  6. Step 3 — Backfill: migrate historical rows from old → new via a one-shot script (for small data) or background job (for large data). Spot-check correctness.
  7. Step 4 — Read from new: update application to read from the new column / table. Last opportunity to verify data correctness and performance; still able to roll back application-code because both schemas are intact and dual-written.
  8. Step 5 — Contract (stop dual writes): update application to write only to the new schema. Old schema now read-free and write-free but still exists (safety margin).
  9. Step 6 — Delete old: drop the old column / table. Optional safety measure: make it invisible or rename it before dropping, so an un-migrated consumer errors loudly instead of reading stale data. Canonical new patterns/expand-migrate-contract pattern.

  10. Between-step rollback is the architectural property. "A nice benefit of this approach is that you can always rollback the application code after it is deployed since the schema is still in a backward compatible state." At every step through Step 4, the application-code deploy is rollback-able because the old schema is still read-complete and write-complete. After Step 5 the rollback window narrows (the old schema is stale on everything written since Step 5). After Step 6 the rollback is impossible from the database side. This is the property that makes the pattern low-risk: each deploy in isolation has a working old-state it can fall back to. Ship fast because every deploy is safe, not despite being slow.

  11. Backfill must be split by size. "If there is a lot of data to move, consider spreading it over an extended period using background jobs. This will prevent it from affecting your production database performance and users." Implicit rule: if the backfill materially affects user-query latency (IOPS contention, replication lag, lock contention on the updated rows), background-job it; otherwise a one-shot script is fine. No quantitative cut-over threshold given.

  12. Verify at the cutover point with production-performance monitoring. "For performance monitoring, you can use Insights or another application performance monitoring tool to make sure everything is working as expected." Step 4 — the read-from-new cutover — is explicitly called out as the step where production-performance monitoring is load-bearing (the new schema may index differently, may have different selectivity, may sit on a different shard). First canonical wiki instance of Insights positioned as a cutover-verification tool (complementing its prior framings as workload-telemetry substrate for index-suggestions + traffic-control).

  13. MySQL-specific safety primitive: invisible column. "If it is a column you are changing, make the column invisible in MySQL from SELECT * queries." Invisible columns were added in MySQL 8.0 (not mentioned in the post but the standard MySQL feature). The column still physically exists and participates in explicit-column queries; it's skipped by SELECT * and similar unqualified queries. Canonical deprecation-discovery pattern: if code still reads the column by name, it errors loudly (developer fixes it); if the column was only ever read through SELECT *, it's silently hidden and stops mattering. First canonical wiki citation.

  14. Table/column rename on PlanetScale produces a warning, not a silent block. "You cannot do a rename without creating a new column in PlanetScale, but PlanetScale does warn you if a table has been recently queried in a deploy request." PlanetScale implements renames as add-new-column + backfill + drop (the expand-migrate-contract pattern applied to the rename) rather than offering a direct RENAME COLUMN DDL — the rationale is that a direct rename is itself a synchronous-breaking-change and the platform opts for the safer-by-default shape. The "recently queried" warning is the query-telemetry-as-deploy-safety signal.
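The backfill-by-size rule above can be sketched as a small, idempotent batch job — here against sqlite3 with a hypothetical accounts(old_email, email) table; the batch size is illustrative, and a real background job would also pause between batches to shed load:

```python
import sqlite3

def backfill_in_batches(conn, batch_size=2):
    """Copy old_email -> email in small batches; idempotent (touches NULL rows only)."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE accounts SET email = old_email "
            "WHERE id IN (SELECT id FROM accounts WHERE email IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so the job can die and resume safely
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, old_email TEXT, email TEXT)")
conn.executemany("INSERT INTO accounts (old_email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(5)])
moved = backfill_in_batches(conn)  # 5 rows copied; a second run copies 0
```

Because the batch predicate is "new column still NULL", a crashed run restarts from wherever it stopped — the idempotency the mid-step-failure caveat later asks for.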

Worked example (GitHub-stars tracker)

The post walks a concrete schema consolidation as the narrative spine of the pattern — the star table (keyed on repo) is to be merged into the repo table so the star_count column lives next to the repo metadata. This is the change-the-data-of-an-existing-table case, not a plain column rename.

  • Initial state: repo(id, repo_name, organization, ...) and star(id, repo_name, organization, star_count). Duplication across every row of star — organization and repo_name mirror repo.
  • Step 1 (Expand): ALTER TABLE repo ADD COLUMN star_count INT; repo.star_count is nullable (no default, so NULL on existing rows).
  • Step 2 (Write to both): application writes star_count to both repo and star on every star-count write. Old reads still go to star.
  • Step 3 (Backfill): script copies star.star_count → repo.star_count keyed on repo_name. "not much data, so it is safe not to use background jobs" — the specific-size decision. Post-backfill, repo.star_count is complete.
  • Step 4 (Read from new): application reads star_count from repo only. Production-performance check via Insights or equivalent.
  • Step 5 (Contract / stop writing to old): application stops writing to star.star_count entirely. star is now frozen but still exists.
  • Step 6 (Delete old): DROP TABLE star after a few days of no-issues. The data is not gone — it's now all in repo.
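Compressed to runnable form, the six steps above look like the following — a sketch using sqlite3 in place of MySQL (the set_star_count / get_star_count helpers are hypothetical stand-ins for the application's write and read paths):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repo (id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT);
CREATE TABLE star (id INTEGER PRIMARY KEY, repo_name TEXT, organization TEXT,
                   star_count INTEGER);
INSERT INTO repo (repo_name, organization) VALUES ('vitess', 'planetscale');
INSERT INTO star (repo_name, organization, star_count)
       VALUES ('vitess', 'planetscale', 100);
""")

# Step 1 — Expand: new column is nullable, so nothing that writes repo breaks.
conn.execute("ALTER TABLE repo ADD COLUMN star_count INTEGER")

# Step 2 — Write to both; reads still come from star.
def set_star_count(repo_name, n):
    conn.execute("UPDATE star SET star_count = ? WHERE repo_name = ?", (n, repo_name))
    conn.execute("UPDATE repo SET star_count = ? WHERE repo_name = ?", (n, repo_name))
    conn.commit()

set_star_count("vitess", 101)

# Step 3 — Backfill rows the dual-write hasn't reached (idempotent: NULLs only).
conn.execute("""UPDATE repo
                SET star_count = (SELECT s.star_count FROM star s
                                  WHERE s.repo_name = repo.repo_name)
                WHERE star_count IS NULL""")
conn.commit()

# Step 4 — Read from new (spot-check correctness here; monitor performance).
def get_star_count(repo_name):
    return conn.execute("SELECT star_count FROM repo WHERE repo_name = ?",
                        (repo_name,)).fetchone()[0]

# Step 5 — Contract: redeploy set_star_count without the UPDATE against star.
# Step 6 — Delete old, after a no-issues safety window.
conn.execute("DROP TABLE star")
```

Each comment boundary is a separate deploy in the real pattern; collapsing them into one script is only safe here because there is no concurrent traffic.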

Architectural framings

The five-reason structural argument against coupled deploys

  1. Risk multiplies. Two critical systems deployed together = double the surface where something can fail.
  2. Deploy atomicity is a lie across systems. Application deploy runs on app hosts; schema change runs on the database. They will finish at different moments. If either depends on the other's new state, there's a bug window.
  3. Schema-change time scales with data. "As data size grows, migrations can take longer. It can go from 30 seconds to a few hours to even more than a day!" If the app deploy is gated on the migration, deploy-frequency collapses at scale.
  4. Pipeline blocking is a shared-resource-of-engineering-velocity failure. "If something goes wrong with the database schema change when coupled together, the deployment of the application is now blocked." One team's bad migration blocks every other team's deploy.
  5. Forced discipline yields backward compatibility as a free property. Separating the two systems forces the schema change to be backward-compatible with the running app, and forces the app change to be backward-compatible with the old schema. The pattern is not just risk-reduction; it's also a correctness-by-construction technique because decoupling makes single-step atomicity impossible, which rules out single-step breakage.

Why the six-step dance instead of a direct ALTER

Direct ALTER TABLE ... RENAME COLUMN (where supported) is synchronous with respect to application compatibility: the moment the DDL commits, every reader must know the new column name. This means either (a) the app is already deployed with the new column name (and was broken on the old schema), or (b) the app is deployed with the old name (and breaks on the new schema). Expand-migrate-contract replaces this with a six-deploy sequence where each deploy in isolation has a working rollback target, at the cost of six deploys and three weeks instead of one deploy and a few minutes. This is the canonical cost-vs-risk trade-off in the pattern.
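The synchronous-breakage half of this trade-off is easy to demonstrate — a sketch using sqlite3, which supports RENAME COLUMN since SQLite 3.25 (the users table and reader function are hypothetical): the instant the DDL commits, a reader still written against the old name errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada')")

def old_reader():
    # Application code still deployed against the old column name.
    return conn.execute("SELECT fullname FROM users").fetchone()[0]

worked_before = old_reader() == "Ada"

# The direct rename: atomic for the database, breaking for the reader.
conn.execute("ALTER TABLE users RENAME COLUMN fullname TO full_name")

try:
    old_reader()
    broke_after = False
except sqlite3.OperationalError:  # no such column: fullname
    broke_after = True
```

The expand-migrate-contract alternative never puts the database in a state where a currently-deployed reader can hit that exception.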

Where expand-migrate-contract generalises

The post is MySQL-flavoured but the pattern is database-engine-agnostic. It applies to:

  • Relational DBs with strict DDL (MySQL, Postgres, SQL Server) — the canonical substrate for the pattern.
  • Distributed KV/document stores — e.g. DynamoDB re-keying via new_key + dual-write + backfill + read-from-new + stop-write + GC old keys.
  • Search-index schema evolution — Elasticsearch mapping changes often follow this pattern via index-aliasing, though there the backfill is a full reindex.
  • Kafka topic shape changes — new-topic + dual-publish + consumer-cutover + old-topic-drain + old-topic-delete is the same six-step sequence.
  • API schema changes — publish new schema + dual-write + migrate consumers + drop old (the lineage that concepts/schema-evolution tracks).

The common invariant is: add a new representation without breaking the old → make both representations track the truth simultaneously → flip the authoritative source → retire the old representation. The six-step sequence is the operationalisation of that invariant.
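To make the invariant concrete outside SQL, the KV re-keying case can be sketched with plain dicts standing in for a store (all names are hypothetical):

```python
# Old representation keys records as "user:<id>"; new representation as "account:<id>".
store = {"user:1": {"name": "Ada"}, "user:2": {"name": "Lin"}}

def new_key(old_key):
    return old_key.replace("user:", "account:", 1)

# Expand + dual-write: every write lands under both representations.
def put(old_key, record):
    store[old_key] = record
    store[new_key(old_key)] = record

put("user:1", {"name": "Ada Lovelace"})

# Backfill: copy anything the dual-write hasn't reached yet (idempotent).
for k in [k for k in store if k.startswith("user:") and new_key(k) not in store]:
    store[new_key(k)] = store[k]

# Flip the authoritative source: reads now resolve through the new keys.
def get(old_key):
    return store[new_key(old_key)]

# Retire: garbage-collect the old representation.
for k in [k for k in store if k.startswith("user:")]:
    del store[k]
```

Same four moves — add, dual-track, flip, retire — with dict keys playing the role of schema.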

Caveats

  • No production numbers. The post is a teaching walkthrough; there's no "we've shipped N such changes per quarter" / "average migration lifetime is X weeks" / "error rate during backfill is Y" datum. Tier-3 pedagogical-voice post, not a retrospective.
  • Backfill-size threshold is qualitative. "If there is a lot of data" → background jobs; otherwise one-shot script. No IOPS / row-count / duration cut-over number disclosed.
  • Concurrent-write semantics during dual-write not deeply discussed. The post treats old and new writes as if they're free to compose, but real applications may see race conditions between the old-write and the new-write within a single transaction, especially across different transaction boundaries if the dual-write straddles them. The standard mitigation — put both writes in the same transaction — is implicit but not called out.
  • Rollback window between Step 5 and Step 6 is not quantified. The post says "a few days of no issues" — acceptable for the GitHub-stars example, but insufficient for data where drift is hard to detect (e.g. a column whose downstream use is a quarterly batch job that runs outside the "few days" window). The correct rollback window is "longer than the longest feedback loop that touches this column" — unstated.
  • Pattern does not cover multi-region or cross-service consistency. A column read by another service via API or event stream needs the pattern applied at that service's boundary too (see concepts/schema-evolution for the CDC sibling). The post is single-database, single-application.
  • PlanetScale branching / deploy-request features are presented as convenience, not as load-bearing. The pattern works on any database. "While PlanetScale can help make safe schema changes alongside the pattern, the pattern can apply to any relational database schema changes."
  • Decrement / backward-migration case not covered. The pattern-direction is additive-expand → subtractive-contract. What about the reverse direction (consolidating two columns back into one, or un-splitting a schema change)? Not discussed — presumably run the pattern mirrored, but the complications of "old system has values for columns the new system doesn't" are not walked through.
  • No guidance on what happens mid-step under failure. If a backfill script dies halfway through Step 3, the pattern relies on backfill-idempotency. If a Step 4 deploy fails mid-rollout and only some instances have cutover, reads are split across old + new schemas. The post assumes clean cutovers.
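The mitigation the dual-write caveat leaves implicit — both halves inside one transaction — can be sketched like this (sqlite3; the table names, and the CHECK constraint standing in for "the second write fails", are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_counts (k TEXT PRIMARY KEY, n INTEGER);
CREATE TABLE new_counts (k TEXT PRIMARY KEY, n INTEGER CHECK (n >= 0));
INSERT INTO old_counts VALUES ('vitess', 100);
INSERT INTO new_counts VALUES ('vitess', 100);
""")

def dual_write(k, n):
    """Both writes commit together or not at all."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            conn.execute("UPDATE old_counts SET n = ? WHERE k = ?", (n, k))
            conn.execute("UPDATE new_counts SET n = ? WHERE k = ?", (n, k))
        return True
    except sqlite3.IntegrityError:
        return False

ok = dual_write("vitess", 101)   # both representations move to 101 together
bad = dual_write("vitess", -1)   # CHECK rejects the new write; the old write rolls back
```

Because the failed attempt rolls back both UPDATEs, the two representations never drift — which is exactly the property the between-step rollback guarantee depends on.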

Systems, concepts, and patterns introduced

Systems (extended): MySQL (invisible-column 8.0 feature; expand-migrate-contract mitigation substrate), PlanetScale (branching + deploy-request + Insights + rename-warning as pattern-enabling machinery), PlanetScale Insights (Step 4 production-monitoring aide).

Concepts introduced:

Concepts extended:

  • concepts/backward-compatibility — post establishes the schema-DML variant (schema-level backward-compat discipline achieved through step-wise dual-state coexistence) alongside the existing bookmarked-URL and schema-evolution variants.
  • concepts/schema-evolution — extends the async-CDC framing with the synchronous-application variant where schema + app deploy are decoupled via the pattern.

Patterns introduced:

Patterns extended:

  • patterns/dual-write-migration — expand-migrate-contract's Step 2 (write to both) is the classical dual-write pattern specialised to within-a-single-database; complements the cross-system instance from Airbnb's StatsD→OTLP migration.
