
PATTERN

Blue/green database deployment

Problem

Applying schema changes, version upgrades, or instance-class changes to a stateful database without downtime is harder than the analogous application-tier problem. The database holds state that must remain consistent; a naive "restart on new version" drops in-flight transactions; and running a schema change in place on a large table under load can produce unacceptable replication lag or lock contention.

Solution

Blue/green database deployment clones the database cluster into a parallel environment ("green"), applies changes there, and switches traffic over via a scripted switchover. The pattern is the database-tier analogue of application-tier blue/green and inherits its "two environments, flip at cutover" shape.

Four mechanism ingredients compose the pattern:

  1. Copy-on-write storage clone that makes green-environment creation cheap and near-instantaneous. Storage pages are shared until one side writes; divergent pages are copied on demand.

  2. Binlog replication between blue and green for ongoing sync of committed transactions during the green environment's lifetime. Green receives all writes that land on blue, plus any operator-initiated schema or configuration changes applied directly to green.

  3. Scripted switchover lifecycle with guardrails: (a) compatibility and long-running-operation checks; (b) stop new writes on blue and drop all connections; (c) wait for final writes and replication catch-up; (d) switch blue to read-only; (e) rename resources so green adopts the original endpoint names.

  4. Post-switchover blue retention — the blue environment remains running (read-only, renamed) after cutover, accruing cost until explicitly torn down. It is not usable as a rollback target, because no writes are synced back to blue after cutover.
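The guardrailed switchover lifecycle in ingredient 3 can be sketched as follows. This is a minimal illustration: the `Cluster` class, its attributes, and the `switchover` helper are hypothetical stand-ins, not any vendor's API.

```python
import time

class Cluster:
    """Illustrative stand-in for a database cluster; not a real client API."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.accepting_writes = True
        self.read_only = False
        self.replication_lag = 0.0   # seconds behind the source
        self.long_running_ops = []   # e.g. in-flight DDL or backups

def switchover(blue, green, lag_budget=0.5, timeout=300.0):
    """Cut over from blue to green, aborting rather than proceeding
    whenever a guardrail precondition fails."""
    # (a) compatibility and long-running-operation checks
    if blue.long_running_ops:
        raise RuntimeError("long-running operations in flight; aborting")
    # (b) stop new writes on blue and drop all connections
    blue.accepting_writes = False
    # (c) wait for final writes and replication catch-up
    deadline = time.monotonic() + timeout
    while green.replication_lag > lag_budget:
        if time.monotonic() > deadline:
            raise TimeoutError("green never caught up; blue stays authoritative")
        time.sleep(0.01)
    # (d) switch blue to read-only
    blue.read_only = True
    # (e) rename resources so green adopts the original endpoint names
    green.endpoint, blue.endpoint = blue.endpoint, blue.endpoint + "-old"
```

Note the ordering: blue stops accepting writes before the catch-up wait, so the lag check converges, and the rename happens last so clients never see the original endpoint pointing at a half-switched cluster.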

Canonical implementation

Amazon RDS's blue/green deployments on RDS and Aurora clusters. Composition: the Aurora cluster storage layer's copy-on-write primitive is directly exposed as the blue/green storage-fork substrate; binlog replication is AWS-managed; the switchover lifecycle is driven by AWS's blue/green orchestration service.

Use cases called out in the canonical Morrison II (2024) ingest:

  • Minor schema changes — add columns at end of table, create indexes, drop indexes. Bounded by what binlog replication can round-trip.
  • MySQL version upgrades — test the new version in green under production-like traffic before cutting over.
  • Instance-class scaling — reduce downtime of the boot-new-compute-node operation.

Trade-offs

Accepts:

  • 2× compute cost during green-side lifetime (plus ongoing post-switchover cost until blue is torn down).
  • Switchover disruption — all connections dropped, happy-path window "less than a minute" but can extend for long-running operations.
  • Schema-change envelope bounded by binlog replication compatibility — mid-table column adds, column renames, type narrowing, and some charset changes are out of scope.
  • No revert path — post-switchover blue is stale; reverting means another blue/green deployment in reverse.
  • Two-side-writeable risk — copy-on-write storage permits writes on both sides; operator must prevent concurrent modification of the same rows (AWS provides no automatic reconciliation).
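The replication-compatibility bound in the first trade-off can be checked mechanically before the green environment is even created. A hedged sketch: the change labels, the two sets, and the `check_envelope` helper are invented for illustration and mirror only the categories named above, not any vendor's actual compatibility matrix.

```python
# Labels mirror the envelope described in the text; the sets are
# illustrative, not an exhaustive or vendor-defined compatibility list.
SAFE = {"add_column_at_end", "create_index", "drop_index"}
OUT_OF_SCOPE = {"add_column_mid_table", "rename_column", "narrow_type",
                "change_charset"}

def check_envelope(changes):
    """Split proposed green-side schema changes into those known to break
    binlog replication and those needing manual review; anything in SAFE
    falls through both lists."""
    blocked = [c for c in changes if c in OUT_OF_SCOPE]
    unknown = [c for c in changes if c not in SAFE | OUT_OF_SCOPE]
    return blocked, unknown
```

Running such a check as the first guardrail turns an out-of-envelope change from a mid-lifecycle replication failure into a pre-flight rejection.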

Gains:

  • Connection-drop cutover instead of minutes of downtime for many operation classes.
  • Test new version under production-like traffic before committing.
  • Minimal initial storage cost via copy-on-write.

Contrast: rolling-upgrade alternative

patterns/rolling-instance-upgrade is the architectural alternative at the database tier: no coordinated switchover, no fleet doubling, and per-unit connection drain instead of a fleet-wide drop. It applies to stateful fleets where the unit of replacement (a Vitess tablet, a Kubernetes pod) can be drained and replaced individually under a proxy tier that routes around the in-flight replacement.
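The contrast can be made concrete with a sketch of the per-unit loop. As above, `Unit` and `Proxy` are illustrative stand-ins, not a real orchestrator API: the point is that at most one unit is out of rotation at a time, versus the fleet-wide connection drop of the switchover pattern.

```python
class Unit:
    """Illustrative replaceable unit (e.g. a tablet or pod)."""
    def __init__(self, name, version):
        self.name, self.version = name, version

    def drain_connections(self):
        pass  # placeholder: wait for in-flight work on this unit to finish

    def replace(self, new_version):
        self.version = new_version  # boot a replacement on the new version

class Proxy:
    """Illustrative routing tier that steers traffic around replacements."""
    def __init__(self, units):
        self.in_rotation = {u.name for u in units}

    def stop_routing_to(self, unit):
        self.in_rotation.discard(unit.name)

    def resume_routing_to(self, unit):
        self.in_rotation.add(unit.name)

def rolling_upgrade(units, proxy, new_version):
    # One unit at a time: the rest of the fleet keeps serving throughout,
    # so there is no coordinated switchover and no fleet doubling.
    for unit in units:
        proxy.stop_routing_to(unit)   # route around the in-flight replacement
        unit.drain_connections()      # per-unit drain, not fleet-wide drop
        unit.replace(new_version)
        proxy.resume_routing_to(unit) # back in rotation before the next unit
```

Compared with blue/green, this trades the single disruptive cutover for a longer window in which the fleet runs mixed versions, which the upgrade path must tolerate.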
