PATTERN
Rolling instance upgrade¶
Problem¶
Upgrading a stateful database fleet (new version, new instance class, new kernel) without downtime + without the 2× cost penalty of blue/green.
Solution¶
Replace fleet units one at a time under a proxy tier that routes around the in-flight replacement. Each unit is drained, replaced with the new version, brought back into service, then the next unit begins. At any moment only a small fraction of the fleet is mid-upgrade, so availability is preserved without fleet duplication.
Three ingredients compose the pattern at the database tier:
- Unit-of-replacement granularity — a tablet (a Kubernetes pod running `mysqld` plus a `vttablet` sidecar) is small enough that replacing one at a time is acceptable but large enough that the fleet-wide upgrade completes in reasonable time.
- Proxy tier routing around unavailable units — `vtgate` routes traffic away from tablets that are draining, replacing, or warming, so clients never see the individual replacements.
- Automatic tablet replacement on failure — "If a tablet goes down for any reason, our systems automatically reroute traffic to a functional tablet and allocate another tablet to replace the downed instance" (Morrison II, 2024). The same substrate that handles unplanned failures handles planned upgrades.
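The three ingredients compose into a simple serial loop; a minimal sketch, where `drain`, `replace`, and `is_healthy` are hypothetical callbacks standing in for the proxy-tier drain, the pod replacement, and the warm-up check — none of these names are Vitess or Kubernetes APIs:

```python
# Sketch of the one-at-a-time replacement loop. At any moment at most
# one unit is mid-upgrade, so availability holds without fleet duplication.
def rolling_upgrade(tablets, drain, replace, is_healthy):
    """Replace each tablet in turn; return the upgraded fleet."""
    upgraded = []
    for tablet in tablets:
        drain(tablet)                  # proxy routes traffic away first
        replacement = replace(tablet)  # e.g. pod deleted, new version scheduled
        while not is_healthy(replacement):
            pass                       # wait for warm-up before moving on
        upgraded.append(replacement)   # back in service; next unit begins
    return upgraded

# Toy run: a three-tablet fleet upgraded to "-v2".
result = rolling_upgrade(
    ["tablet-1", "tablet-2", "tablet-3"],
    drain=lambda t: None,
    replace=lambda t: t + "-v2",
    is_healthy=lambda t: True,
)
```

The same loop body is what the failure-handling substrate already runs when a tablet dies unplanned; the planned upgrade just iterates it deliberately over the whole fleet.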
Canonical implementation¶
PlanetScale on [[systems/vitess|Vitess]] on Kubernetes. Use cases:
- Instance-class resizing — customer picks new instance type, backend rolls through fleet. "This allows your applications to continue to operate without being taken offline."
- MySQL version upgrades — validated centrally, rolled through fleet; no maintenance window.
- Kernel / OS upgrades — tablets are Kubernetes pods; pod replacement is the upgrade primitive.
Trade-offs¶
Accepts:
- Mixed-version fleet state during upgrade — requires upgrades be backward-compatible across all tablets in flight simultaneously. This constrains what version-upgrade shapes are safe under rolling.
- Per-unit connection drain — clients on the replaced tablet see their connection closed; must reconnect (proxy handles routing).
- Fleet-wide completion time is longer than a blue/green cutover — rolling through 32 tablets one at a time takes longer than one coordinated switchover.
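The completion-time cost is straightforward serial arithmetic; a back-of-envelope sketch using the 32-tablet example above, with a per-tablet duration that is an illustrative assumption, not a figure from the source:

```python
# Rolling upgrade is serial: total wall-clock time scales with fleet size.
tablets = 32
minutes_per_tablet = 10  # assumed drain + replace + warm-up time per unit

rolling_wall_clock = tablets * minutes_per_tablet  # 320 minutes end to end
```

A blue/green cutover, by contrast, pays its wall-clock time building the green fleet in parallel and then switches in a single coordinated step, which is why it finishes faster at the cost of duplication.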
Gains:
- ~1× fleet cost (vs 2× for blue/green).
- No coordinated fleet-wide connection drop — only the one draining tablet's connections.
- No maintenance window — upgrades run continuously under normal traffic.
- Unplanned-failure + planned-upgrade path unified — same substrate handles both.
Contrast: blue/green alternative¶
patterns/blue-green-database-deployment is the coordinated-switchover alternative: 2× fleet cost during upgrade, single coordinated cutover with all-connections drop, but no mixed-version state during upgrade. Blue/green wins on isolation; rolling wins on cost + cutover smoothness.
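The cost side of the contrast falls out of the peak capacity each strategy holds mid-upgrade; a sketch under the assumption that rolling keeps exactly one replacement unit in flight (per the pattern) while blue/green runs a full duplicate fleet:

```python
# Peak capacity held during the upgrade, as a multiple of steady-state
# fleet size, for a fleet of n units.
def peak_cost_multiplier(n: int, strategy: str) -> float:
    if strategy == "rolling":
        return (n + 1) / n  # one extra replacement unit in flight
    if strategy == "blue_green":
        return 2.0          # full green fleet alongside the blue one
    raise ValueError(f"unknown strategy: {strategy}")

rolling = peak_cost_multiplier(32, "rolling")        # ~1.03x for 32 tablets
blue_green = peak_cost_multiplier(32, "blue_green")  # 2x regardless of size
```

The rolling multiplier approaches 1× as the fleet grows, which is the "~1× fleet cost" claim above; blue/green stays at 2× no matter the fleet size.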
Seen in¶
- sources/2026-04-21-planetscale-planetscale-branching-vs-amazon-aurora-bluegreen-deployments — Brian Morrison II (PlanetScale, 2024-02-02). Canonical wiki pattern disclosure of rolling upgrades at the database tier under Vitess + Kubernetes. Contrasted against Aurora blue/green on cost, cutover disruption, and revert path.
Related¶
- concepts/rolling-upgrade — the concept page.
- concepts/blue-green-deployment — the alternative.
- patterns/blue-green-database-deployment — canonical alternative pattern.
- systems/vitess — the proxy + tablet substrate.
- systems/kubernetes — the orchestration substrate.