Multi-AZ Vitess cluster¶
Problem¶
A Vitess cluster (tablets + VTGates + vtctld) on a managed cloud is exposed to availability-zone failures. Single-AZ deployments lose the entire cluster when the AZ goes offline — unacceptable for paid production databases. The failure domain has to be spread across AZs at both tiers: the data plane (MySQL tablets holding durable state) and the proxy plane (VTGate instances doing query routing).
Solution¶
Provision the Vitess cluster across a minimum of 3 AZs in a cloud region:
- Tablets spread across all 3 AZs (typically a primary-replica topology, with replicas placed in AZs other than the primary's).
- VTGate instances spread across all 3 AZs, behind a region-level load balancer.
- vtctld + topology server distributed across 3 AZs so the control plane survives any single-AZ failure.
The Vitess Operator (or equivalent orchestration) detects AZ failures (observe → diff → act) and reconciles — starting replacement pods in other AZs, redirecting traffic via VTGate, demoting or promoting tablets as needed.
From Brian Morrison II:
"Using Base plan databases as an example, we automatically provision a Vitess cluster across three availability zones in a given cloud region. This includes the tablets that serve up data for your database, as well as the VTGate instances that route queries to the proper tablets. This means that if an AZ gets knocked offline, the Vitess Operator will automatically detect the outage and apply the necessary infrastructure changes to keep our databases online. In fact, we don't even support cloud regions with less than three availability zones!" (Source: sources/2026-04-21-planetscale-scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes)
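The placement rule described above can be sketched in a few lines. This is a toy illustration of even AZ spread, not the Vitess Operator's actual scheduling logic; the pod and AZ names are hypothetical.

```python
# Sketch: spread pods round-robin across 3 AZs so that any single-AZ
# outage removes at most ceil(n / 3) pods of a given tier.

AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]  # hypothetical AZ names

def spread_across_azs(pod_names, azs=AZS):
    """Assign each pod to an AZ round-robin, returning {pod: az}."""
    return {pod: azs[i % len(azs)] for i, pod in enumerate(pod_names)}

tablets = ["tablet-primary", "tablet-replica-1", "tablet-replica-2"]
placement = spread_across_azs(tablets)
# The three tablets land in three distinct AZs, so a single-AZ outage
# always leaves either the primary or at least one replica reachable.
assert len(set(placement.values())) == 3
```

In real deployments the same effect is achieved declaratively (e.g. Kubernetes topology spread constraints or anti-affinity on the zone label) rather than by hand-rolled assignment.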
Why 3 AZs minimum, not 2¶
Two AZs is insufficient because:
- Quorum — leader-election and semi-sync replication protocols typically need majority agreement; with only 2 nodes, a majority is 2, so losing either node destroys quorum.
- Split-brain risk — with 2 AZs and a network partition between them, both sides can think the other is down and try to promote themselves. 3 AZs give you a tiebreaker.
- Rolling-update safety — if one AZ is under maintenance and another fails, 3-AZ deployments still have capacity. 2-AZ deployments lose everything.
Three AZs is the minimum workable topology for any serious durable-state system. PlanetScale enforces it as a hard constraint: "we don't even support cloud regions with less than three availability zones!"
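The quorum argument is just standard majority arithmetic; this is not PlanetScale-specific code, only the math behind the 3-AZ minimum.

```python
# Majority-quorum arithmetic: with n voters, a majority is n//2 + 1,
# so the number of failures the system tolerates is n - majority(n).

def majority(n):
    return n // 2 + 1

def tolerated_failures(n):
    return n - majority(n)

assert tolerated_failures(2) == 0  # 2 AZs: losing either one kills quorum
assert tolerated_failures(3) == 1  # 3 AZs: survives any single-AZ outage
```

Note that going from 3 to 4 AZs buys nothing for quorum (still tolerates 1 failure); the next step up is 5.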
Operator-driven AZ-failure reconciliation¶
The AZ-failure-handling story relies on the Kubernetes Operator pattern:
- Observe — operator watches pod status; pods in a failed AZ become unreachable or are marked NodeLost/Unschedulable.
- Diff — operator compares current state (pods in 2 remaining AZs) to desired state (pods in 3 AZs); notes the gap.
- Act — operator takes reconciliation actions:
- Start replacement tablet pods in the surviving AZs if capacity allows.
- If the primary was in the failed AZ, orchestrate failover to a replica in a surviving AZ (VTGate routes queries to the new primary via topology updates).
- Replace VTGate pods in the failed AZ with pods in surviving AZs.
The customer sees brief write-path interruption (during primary failover) but the cluster stays online.
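The observe → diff → act loop above can be sketched as follows. The real Vitess Operator is a Go controller driven by the Kubernetes API; this Python toy only illustrates the reconciliation shape, and every name in it is hypothetical.

```python
# Toy reconciler: compare live pods against the desired 3-AZ spread and
# emit the actions a controller would take (replace pods, promote primary).

DESIRED_AZS = {"az-1", "az-2", "az-3"}

def reconcile(live_pods):
    """live_pods: {pod_name: az} for currently reachable pods.
    Returns the ordered list of reconciliation actions."""
    actions = []
    covered = set(live_pods.values())                 # observe
    for missing_az in sorted(DESIRED_AZS - covered):  # diff
        surviving = sorted(DESIRED_AZS & covered)
        actions.append(f"replace pods from {missing_az} in {surviving[0]}")  # act
    if "tablet-primary" not in live_pods:
        actions.append("promote a replica in a surviving AZ to primary")
    return actions

# az-2 has gone offline, taking the primary with it:
acts = reconcile({"tablet-replica-1": "az-1", "tablet-replica-2": "az-3"})
```

A real controller would re-run this loop on every watch event until the diff is empty, which is what makes the recovery behavior level-triggered rather than a one-shot script.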
Trade-offs¶
- Cost. 3-AZ deployment means 3× data-plane infrastructure (pods + PVCs) compared to single-AZ. Cross-AZ network traffic has egress cost on most cloud providers.
- Latency. Primary-replica replication across AZs adds some latency vs single-AZ. Usually acceptable (single-digit ms in major clouds) but not free.
- Regional limitation. Regions with fewer than 3 AZs (some smaller cloud regions) are excluded from paid-tier support.
- Partial-AZ failures. The pattern handles full AZ outages cleanly. Partial failures (some services in an AZ down, others up) are harder — depends on operator health-check granularity.
Similar patterns elsewhere¶
- AZ failure drill — the shape of testing is the inverse: intentionally simulate AZ failures to validate the multi-AZ topology actually works. Netflix Chaos Kong / Chaos Gorilla pattern.
- Aurora, RDS multi-AZ, Spanner, CockroachDB, FoundationDB all impose similar multi-AZ (or multi-region) constraints for their durability story. The shape is standard for distributed databases at this tier.
Seen in¶
- sources/2026-04-21-planetscale-the-principles-of-extreme-fault-tolerance — Max Englander, 2025-07-03. States the minimum of two replicas per cluster verbatim: "a primary instance and a minimum of two replicas. Each instance is composed of a VM and storage residing in the data plane. Instances evenly distributed across three availability zones. Automatic failovers from primaries to healthy replicas in response to failures." Frames the multi-AZ topology as the concrete architectural embodiment of the isolation + redundancy + static stability principle trio, and as the substrate the weekly failover drill rides on.
- sources/2026-04-21-planetscale-scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes — Brian Morrison II, 2023-09-27. Canonical statement of the 3-AZ-minimum placement constraint for paid PlanetScale databases, and of the operator's role in AZ-failure reconciliation. Post doesn't provide MTTR numbers for AZ failover.
Related¶
- systems/vitess
- systems/vitess-operator
- systems/vtgate
- systems/vttablet
- systems/planetscale
- systems/kubernetes
- concepts/availability-zone-failure-drill
- concepts/blast-radius
- concepts/isolation-as-fault-tolerance-principle
- concepts/static-stability
- patterns/custom-operator-over-statefulset
- patterns/always-be-failing-over-drill