Expand-migrate-contract (parallel change)¶
Expand-migrate-contract (also known as parallel change, backward-compatible changes, or colloquially the expand-contract pattern) is the canonical six-step discipline for changing an existing database-schema element — renaming a column/table, changing a data type, splitting or merging columns, restructuring how data is represented — without coupling the application-code deploy to the schema deploy. Each step is independently deployable and independently rollback-able; the cost is that what would be a single synchronous breaking change becomes a six-step sequence running over days to weeks.
When to use it¶
Use expand-migrate-contract when:
- The schema element being changed is already in use by the production application (reads or writes).
- Downtime is unacceptable (OLTP path, customer-visible queries, or cross-service contracts).
- The schema change is structural (rename / type change / split / merge / reshape) rather than additive.
Do NOT use it for purely additive changes — adding a new table / column / view. For additive changes, deploying the schema change first (before the application code that uses it) is sufficient; the six-step dance is cost-without-benefit. The distinguishing test is "does this change touch schema my production application is already using?" — if no, just deploy schema-then-app.
(Source: sources/2026-04-21-planetscale-backward-compatible-database-changes)
The six steps¶
Step 1 — Expand¶
Add the new schema element alongside the old one. Make
the addition minimally-disruptive to existing writes: a new
column must be nullable or carry a safe default so
existing INSERT statements that don't mention it still
succeed.
Deploy and verify: the application is unchanged, writes still go to the old schema, reads still come from the old schema. The new column / table exists but is empty / unreferenced.
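The expand step's safety property can be sketched with SQLite standing in for the production database (a minimal illustration, not MySQL-specific DDL; the `star`/`repo` names are borrowed from the worked example later on this page):

```python
import sqlite3

# Hypothetical stand-in for the production schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.execute("INSERT INTO star VALUES ('planetscale/docs', 120)")
conn.execute("CREATE TABLE repo (repo_name TEXT PRIMARY KEY)")

# Step 1 - Expand: the new column is nullable by default, so existing
# INSERT statements that don't mention it still succeed.
conn.execute("ALTER TABLE repo ADD COLUMN star_count INTEGER")

# An existing write path that knows nothing about the new column still works.
conn.execute("INSERT INTO repo (repo_name) VALUES ('planetscale/docs')")

# The new column exists but is empty / unreferenced.
row = conn.execute(
    "SELECT star_count FROM repo WHERE repo_name = 'planetscale/docs'"
).fetchone()
assert row[0] is None
```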
Step 2 — Dual-write¶
Update the application to write the new representation to the new schema on every write, in addition to writing the old representation to the old schema. Reads still come from the old schema.
```python
# Before
session.execute("INSERT INTO star (repo_name, star_count) VALUES (%s, %s)",
                (repo_name, count))

# After
session.execute("INSERT INTO star (repo_name, star_count) VALUES (%s, %s)",
                (repo_name, count))
session.execute("UPDATE repo SET star_count = %s WHERE repo_name = %s",
                (count, repo_name))
```
Deploy and verify: the application's behaviour for users is unchanged (reads still hit the old schema); new writes now populate both schemas. Confirm no regression. If the new-schema write fails, the old-schema write still succeeds and the app is unaffected — the dual-write is structurally safer than a pure switch.
Transaction discipline: in relational databases, put both writes in the same transaction so neither-or-both is the guaranteed outcome. Cross-store dual-write (e.g. relational + search index) is beyond the scope of this pattern and needs a CDC-shaped approach (concepts/change-data-capture).
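The transaction discipline can be sketched with SQLite as a stand-in (the `record_star_count` helper is hypothetical; `sqlite3`'s connection context manager commits on success and rolls back on exception, giving the neither-or-both guarantee):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.execute("CREATE TABLE repo (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.execute("INSERT INTO repo (repo_name) VALUES ('planetscale/docs')")

def record_star_count(repo_name: str, count: int) -> None:
    """Dual-write: old and new schema updated in one transaction,
    so a failure in either write rolls back both."""
    with conn:  # commit on success, rollback on exception
        conn.execute(
            "INSERT OR REPLACE INTO star (repo_name, star_count) VALUES (?, ?)",
            (repo_name, count),
        )
        conn.execute(
            "UPDATE repo SET star_count = ? WHERE repo_name = ?",
            (count, repo_name),
        )

record_star_count("planetscale/docs", 121)
```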
Step 3 — Backfill¶
Copy historical data from the old schema to the new schema. Two shapes:
- One-shot script: for small data ("there's not much data, so it is safe not to use background jobs"). Run once, verify, move on.
- Background job: for large data. Spread the backfill over hours / days to avoid IOPS contention, replication lag, or lock contention on the active tables.
Include any data transformations in the backfill — if the semantic shape is changing (e.g. splitting a string column into two), the transformation lives in the backfill script, not in the live dual-write (the dual-write should already be emitting the new shape from Step 2).
Spot-check after the backfill completes. The pattern's correctness depends on backfilled rows + dual-written rows = complete new-schema state, so any gap is a silent data-correctness bug.
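A sketch of an idempotent backfill under the same assumed SQLite schema; the `WHERE star_count IS NULL` guard makes crash-and-restart safe and keeps the backfill from clobbering rows the live dual-write has already populated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.execute("CREATE TABLE repo (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.executemany("INSERT INTO star VALUES (?, ?)",
                 [("a/a", 1), ("b/b", 2), ("c/c", 3)])
conn.executemany("INSERT INTO repo (repo_name) VALUES (?)",
                 [("a/a",), ("b/b",), ("c/c",)])
# One row was already dual-written; the backfill must not overwrite it.
conn.execute("UPDATE repo SET star_count = 99 WHERE repo_name = 'c/c'")

def backfill() -> None:
    # The NULL guard makes this idempotent: a restart re-copies only rows
    # the previous run didn't reach, and never touches dual-written rows.
    conn.execute("""
        UPDATE repo
        SET star_count = (SELECT star_count FROM star
                          WHERE star.repo_name = repo.repo_name)
        WHERE star_count IS NULL
    """)

backfill()
backfill()  # safe to re-run

# Spot-check: backfilled rows + dual-written rows = complete new-schema state.
gap = conn.execute("SELECT COUNT(*) FROM repo WHERE star_count IS NULL").fetchone()[0]
assert gap == 0
```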
Step 4 — Read from new¶
Update the application to read from the new schema. This is the most sensitive step — the first deploy where user-visible behaviour depends on the new schema. Production performance monitoring is load-bearing: the new column may index differently, sit on a different shard, have different selectivity, or have unexpected null distributions.
Tools: application performance monitoring, database query profiling (e.g. PlanetScale Insights, Datadog APM, New Relic, per-query EXPLAIN comparisons).
Rollback property: at this step, the application can still roll back to reading from the old schema because the old schema is still fully read-complete (thanks to Step 2's dual-writes). This is the last step at which a pure application-code rollback fully restores correctness.
Step 5 — Contract (stop writing to old)¶
Update the application to write only to the new schema, removing the dual-write. The old schema is now read-free and write-free but still exists as a safety margin.
Deploy and verify: watch for any late-arriving reader (a cron job, a separate service, a reporting pipeline) that's still reading the old schema and will now see stale data from this point forward. The existence of such a reader is itself a signal that the pattern wasn't applied at every consumer — go apply it there before proceeding.
Rollback window narrows: reverting the application restores writes to the old schema but the old schema is now stale on everything written since this deploy. The safe rollback window is bounded by how quickly you can detect a problem — typically minutes to hours.
Step 6 — Delete old¶
After a waiting period (the post suggests "a few days of no issues"), drop the old column / table. At this point the old schema is gone; rollback from this point requires restore-from-backup.
Optional deprecation-discovery step: before the final `DROP`, make the old column / table discoverable-broken:
- MySQL 8.0: mark the column invisible (invisible column). It's hidden from `SELECT *` but still referenceable by name, so code that reads it via `SELECT *` loses the column (and usually fails visibly downstream) while explicitly-named reads keep working. The invisible window surfaces `SELECT *` dependencies before the final drop.
- Table rename: rename the table to something like `star_DEPRECATED_2026_04`. Any code still referencing the old name errors loudly; after the agreed-upon window, drop the renamed table.
The waiting period is "longer than the longest feedback loop that touches this column" — for a column read by a daily cron job, a day is enough; for a column read by a quarterly batch report, a quarter is the right window.
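The table-rename variant can be sketched in SQLite (a minimal stand-in; the deprecated name follows the `star_DEPRECATED_2026_04` convention above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (repo_name TEXT PRIMARY KEY, star_count INTEGER)")
conn.execute("INSERT INTO star VALUES ('planetscale/docs', 121)")

# Deprecation-discovery: rename rather than drop. Data survives under the new
# name, but any late reader of the old name now fails loudly instead of
# silently reading stale data.
conn.execute("ALTER TABLE star RENAME TO star_DEPRECATED_2026_04")

try:
    conn.execute("SELECT * FROM star")
    late_reader_broken = False
except sqlite3.OperationalError:  # "no such table: star"
    late_reader_broken = True
assert late_reader_broken
```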
Why the pattern works¶
Each step has a working rollback target¶
The core property: at every step, rolling back the most recent deploy restores a fully-working system.
| After step | Old schema | New schema | Rollback target |
|---|---|---|---|
| 1 Expand | read+write complete | empty | redeploy app without new column |
| 2 Dual-write | read+write complete | write-complete from now | remove new-write from app |
| 3 Backfill | read+write complete | read-complete | drop backfilled data, repeat later |
| 4 Read from new | read-complete (dual-written), write-complete | read+write complete | revert app to reading old |
| 5 Stop old writes | read-complete (frozen) | read+write complete | restore dual-writes in app |
| 6 Delete old | gone | read+write complete | restore-from-backup only |
Each deploy is a safe-by-construction transition between
adjacent safe states. Contrast with a direct RENAME COLUMN
+ simultaneous app deploy: the system is not safe during
the brief window where the DDL has committed and some app
instances have the new code while others have the old.
Decoupling the deploy clocks¶
The pattern operationalises the principle that application-code and database-schema have independent deployment clocks that cannot synchronise. Canonicalised at concepts/coupled-vs-decoupled-database-schema-app-deploy. The six-step pattern is the concrete technique that makes decoupled deploys viable; without it, every schema change requires a maintenance window.
Forcing backward compatibility as a free property¶
Because the pattern requires each step to work with both the old and new code running simultaneously (during rollout), backward compatibility falls out as a construction property, not a design goal. The schema during each step is compatible with the previous step's code, and the code for each step is compatible with the previous step's schema.
Worked example: consolidating a star table into repo¶
(From the PlanetScale post: source tracks GitHub star counts
in a separate star table; wants to merge into repo.)
| Step | Action | SQL / code |
|---|---|---|
| 1 Expand | Add `repo.star_count` | `ALTER TABLE repo ADD COLUMN star_count INT;` |
| 2 Dual-write | App writes to both tables | `UPDATE repo SET star_count = ?; UPDATE star SET star_count = ?;` |
| 3 Backfill | One-shot script (small data) | `UPDATE repo SET star_count = (SELECT star_count FROM star WHERE ...);` |
| 4 Read new | App reads `repo.star_count`; verify via APM | (app code change only) |
| 5 Contract | App stops writing `star.star_count` | (single deploy) |
| 6 Delete | Drop old table after waiting period | `DROP TABLE star;` |
Variants at other substrates¶
The six-step shape generalises beyond relational DDL:
- DynamoDB re-keying: new `new_key` attribute + dual-write + backfill via `Scan` + read-from-new + stop-write + GC old keys.
- Elasticsearch index reshape: new index + index alias + dual-write (new + old) + reindex (backfill) + swap alias (read-from-new) + stop-writing-old + delete-old.
- Kafka topic reshape: new-topic + dual-publish + consumer-cutover + old-topic-drain + old-topic-delete.
- Protobuf schema change: add new field tag + dual-populate + migrate consumers + stop-populating-old + remove-old-field. (Note proto3 semantics already encourage this via explicit-optional and `oneof` over enum + field.)
- API endpoint migration: new endpoint + dual-emit (old + new clients hit matching endpoints) + migrate clients + retire old endpoint.
All variants share the invariant: "add new representation without breaking old → make both representations track the truth → flip authoritative source → retire old representation."
Caveats and failure modes¶
Mid-step failure recovery¶
The pattern assumes clean cutovers between steps. Failure modes that are not addressed by the pattern alone:
- Mid-rollout deploy split (Step 4): if the new-read deploy only reaches half the fleet before a failure, reads are split across old + new for the unrolled-out duration. Mitigation: feature-flag the read-from-new behaviour so the split can be quickly reversed without waiting for the deploy pipeline.
- Mid-backfill crash (Step 3): the backfill script must be idempotent (use `WHERE new_col IS NULL` or similar) so restart is safe.
- Dual-write race conditions (Step 2): if the old-write and new-write span different transactions, a concurrent update can interleave and leave the two schemas disagreeing. Mitigation: same transaction for both writes.
Cost¶
- Six deploys and N weeks instead of one deploy and minutes for a direct `RENAME`. The cost scales with how many application-code changes the pattern requires: for a simple column rename touching 10 call-sites in one service, a week is typical; for a schema reshape touching dozens of services, months.
- 2× storage and write load during Steps 2–5 while both schemas are populated.
- Discipline overhead: every consumer of the schema (including cron jobs, batch reports, external services) must be identified and migrated. Missing one produces a silent data-drift bug after Step 5.
Where the pattern is insufficient¶
- Multi-service / cross-boundary schema — the pattern is single-database / single-application. If a column is read via API or event stream by another service, the pattern must be applied at each service boundary. See concepts/schema-evolution for the CDC sibling of this discipline.
- Irreversible data transformations — e.g. hashing a column before storage. The pattern assumes the transformation is one-way reproducible in Step 3; if it isn't, pick a transformation that is, or accept that Step 6 is truly irreversible.
- Schema with physical-layout implications — changing a column's type from `INT` to `BIGINT` via expand-migrate-contract is fine; changing it from `VARCHAR` to `TEXT` on a table with millions of rows may not be expressible without a full table rewrite, which the pattern doesn't accelerate (but still de-risks by making the app-code side independent of the DDL).
Seen in¶
- sources/2026-04-21-planetscale-backward-compatible-database-changes
— canonical definitional reference on the wiki.
Taylor Barnett (PlanetScale) walks the full six-step
sequence with MySQL-specific safety primitives (MySQL 8.0
invisible-column as the Step-6 deprecation-discovery
tool, PlanetScale branching + Insights as the
convenience substrate) and a worked example merging a
`star` table into a `repo` table. The post's thesis sentence: "This ensures that at any step of the process, you can rollback without data loss or significant disruptions to users. This greatly reduces the risk and allows you to move faster and with confidence."
Related¶
- concepts/coupled-vs-decoupled-database-schema-app-deploy — the first-principles framing the pattern operates on.
- concepts/backward-compatibility — the general property; this pattern is its schema-DML operationalisation.
- concepts/schema-evolution — sibling discipline for async-CDC pipelines where the pattern is the producer side of the evolution.
- concepts/mysql-invisible-column — MySQL 8.0 primitive making Step 6 deprecation-discoverable.
- patterns/dual-write-migration — Step 2's generalisation across different substrates (protocol migration, observability platform migration).
- patterns/shadow-migration — sibling risk-reduction migration pattern; shadow-migration runs both systems simultaneously and reconciles, while expand-migrate-contract stages the swap through a shared database.
- systems/planetscale-insights — Step-4 production-performance-monitoring aid.
- systems/mysql — canonical substrate; invisible-column is MySQL-specific.