PATTERN

Multi-AZ Vitess cluster¶

Problem¶

A Vitess cluster (tablets + VTGates + vtctld) on a managed cloud is exposed to availability-zone failures. A single-AZ deployment loses the entire cluster when its AZ goes offline, which is unacceptable for paid production databases. The failure domain has to be spread across AZs at both tiers: the data plane (MySQL tablets holding durable state) and the proxy plane (VTGate instances doing query routing).

Solution¶

Provision the Vitess cluster across a minimum of 3 AZs in a cloud region:

Tablets spread across all 3 AZs (typically a primary + replica topology, with replicas placed in other AZs).
VTGate instances spread across all 3 AZs, behind a region-level load balancer.
vtctld + topology server distributed across 3 AZs so the control plane survives any single-AZ failure.
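
The placement rule above can be sketched as a simple round-robin assignment. This is a minimal illustration, not how the Vitess Operator or the Kubernetes scheduler actually places pods; the function name `spread_across_zones`, the zone names, and the component names are all hypothetical.

```python
from itertools import cycle


def spread_across_zones(components, zones):
    """Assign components to zones round-robin.

    Per-zone counts differ by at most one. Refuses to place anything
    when fewer than 3 zones are offered, mirroring the 3-AZ minimum.
    """
    if len(zones) < 3:
        raise ValueError("this pattern requires at least 3 availability zones")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in components}


# Hypothetical zone and component names, for illustration only.
zones = ["az-a", "az-b", "az-c"]
tablets = ["primary", "replica-1", "replica-2"]
vtgates = ["vtgate-1", "vtgate-2", "vtgate-3"]

tablet_placement = spread_across_zones(tablets, zones)
vtgate_placement = spread_across_zones(vtgates, zones)
```

With three tablets and three zones, each zone ends up holding exactly one tablet and one VTGate, so losing any one zone removes at most one member of each tier.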

The Vitess Operator (or equivalent orchestration) detects AZ failures and reconciles through an observe → diff → act loop: starting replacement pods in other AZs, redirecting traffic via VTGate, and demoting or promoting tablets as needed.

From Brian Morrison II:

"Using Base plan databases as an example, we automatically provision a Vitess cluster across three availability zones in a given cloud region. This includes the tablets that serve up data for your database, as well as the VTGate instances that route queries to the proper tablets. This means that if an AZ gets knocked offline, the Vitess Operator will automatically detect the outage and apply the necessary infrastructure changes to keep our databases online. In fact, we don't even support cloud regions with less than three availability zones!" (Source: )

Why 3 AZs minimum, not 2¶

Two AZs is insufficient because:

Quorum: leader-election and semi-sync replication protocols typically need majority agreement. With 2 voters the majority is 2, so losing either one loses quorum; with 3 voters, any single loss still leaves a majority of 2.
Split-brain risk: with 2 AZs and a network partition between them, each side can conclude the other is down and try to promote its own primary. A third AZ acts as a tiebreaker.
Rolling-update safety: if one AZ is under maintenance and another fails, a 3-AZ deployment still has one AZ serving. A 2-AZ deployment loses everything.
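
The quorum point is plain majority arithmetic, shown here as a short sketch. It assumes one voter per AZ and a strict-majority rule; the helper names are illustrative and not taken from any Vitess component.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that is strictly more than half."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority is still reachable."""
    return voters - majority(voters)


for n in (2, 3):
    print(f"{n} AZs: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
# 2 AZs: majority=2, tolerated failures=0
# 3 AZs: majority=2, tolerated failures=1
```

Going from 2 to 3 AZs is the first step that buys any failure tolerance at all, which is why 3 is the floor.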

Three AZs is the minimum workable topology for any serious durable-state system. PlanetScale enforces it as a hard constraint: "we don't even support cloud regions with less than three availability zones!"

Operator-driven AZ-failure reconciliation¶

The AZ-failure-handling story relies on the Kubernetes Operator pattern:

Observe — operator watches pod status; pods in a failed AZ become unreachable or NodeLost / Unschedulable.
Diff — operator compares current state (pods in 2 remaining AZs) to desired state (pods in 3 AZs); notes the gap.
Act — operator takes reconciliation actions:
Start replacement tablet pods in the surviving AZs if capacity allows.
If the primary was in the failed AZ, orchestrate failover to a replica in a surviving AZ (VTGate routes queries to the new primary via topology updates).
Replace VTGate pods in the failed AZ with pods in surviving AZs.

The customer sees a brief write-path interruption (during primary failover), but the cluster stays online.
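
The observe → diff → act steps above can be sketched as three small functions. This is a toy model under stated assumptions, not the Vitess Operator's code: a real operator is a Kubernetes controller reacting to API events, and the `Pod` type, the function names, the action strings, and the "least-loaded surviving zone" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    role: str  # "primary", "replica", or "vtgate"
    zone: str


def observe(pods, failed_zones):
    """Observe: keep only pods whose zone is still reachable."""
    return [pod for pod in pods if pod.zone not in failed_zones]


def diff(desired, current):
    """Diff: pods that are desired but no longer running."""
    running = {pod.name for pod in current}
    return [pod for pod in desired if pod.name not in running]


def act(missing, current, healthy_zones):
    """Act: return the reconciliation actions as human-readable strings."""
    if not healthy_zones:
        return []
    actions = []
    # If the primary was lost, fail over to a surviving replica first.
    if any(pod.role == "primary" for pod in missing):
        candidates = [pod for pod in current if pod.role == "replica"]
        if candidates:
            actions.append(f"promote {candidates[0].name} in {candidates[0].zone}")
    # Start each replacement in the least-loaded surviving zone.
    load = {zone: sum(pod.zone == zone for pod in current) for zone in healthy_zones}
    for pod in missing:
        zone = min(sorted(load), key=load.get)
        load[zone] += 1
        # A lost primary is replaced by a new replica; the promotion above covers writes.
        role = "replica" if pod.role == "primary" else pod.role
        actions.append(f"start {role} in {zone}")
    return actions


def reconcile(desired, failed_zones, zones):
    current = observe(desired, failed_zones)
    missing = diff(desired, current)
    healthy = [zone for zone in zones if zone not in failed_zones]
    return act(missing, current, healthy)


zones = ["az-a", "az-b", "az-c"]
desired = [
    Pod("tablet-a", "primary", "az-a"),
    Pod("tablet-b", "replica", "az-b"),
    Pod("tablet-c", "replica", "az-c"),
    Pod("vtgate-a", "vtgate", "az-a"),
    Pod("vtgate-b", "vtgate", "az-b"),
    Pod("vtgate-c", "vtgate", "az-c"),
]
actions = reconcile(desired, {"az-a"}, zones)
# ['promote tablet-b in az-b', 'start replica in az-b', 'start vtgate in az-c']
```

The promotion is ordered first because it is the step that restores the write path; replacement pods only restore redundancy.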

Trade-offs¶

Cost. A 3-AZ deployment means up to 3× the data-plane infrastructure (pods + PVCs) of a single-instance, single-AZ deployment. Cross-AZ network traffic incurs data-transfer charges on most cloud providers.
Latency. Primary-replica replication across AZs adds some latency vs single-AZ. Usually acceptable (single-digit ms in major clouds) but not free.
Regional limitation. Regions with fewer than 3 AZs (some smaller cloud regions) are excluded from paid-tier support.
Partial-AZ failures. The pattern handles full AZ outages cleanly. Partial failures (some services in an AZ down, others up) are harder — depends on operator health-check granularity.

Similar patterns elsewhere¶

AZ failure drill: the testing counterpart of this pattern. Intentionally simulate AZ failures to validate that the multi-AZ topology actually works. Netflix's Chaos Gorilla (AZ-level outage) is the well-known example; Chaos Kong applies the same idea to a whole region.
Aurora, RDS multi-AZ, Spanner, CockroachDB, FoundationDB all impose similar multi-AZ (or multi-region) constraints for their durability story. The shape is standard for distributed databases at this tier.
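
The AZ failure drill idea can be illustrated offline as a check over a placement map: fail each zone in turn and confirm both tiers still have a survivor. This is a sketch only; a real drill takes down live infrastructure, and the function name and the placement format here are hypothetical.

```python
def survives_single_zone_loss(placement):
    """Simulate losing each zone in turn.

    `placement` maps component name -> (role, zone), where role is
    "tablet" or "vtgate". Returns zone -> True if at least one tablet
    and one VTGate remain after that zone is lost.
    """
    zones = {zone for _, zone in placement.values()}
    results = {}
    for failed in sorted(zones):
        surviving_roles = {role for role, zone in placement.values() if zone != failed}
        results[failed] = {"tablet", "vtgate"} <= surviving_roles
    return results


# Hypothetical placements, for illustration only.
multi_az = {
    "tablet-1": ("tablet", "az-a"),
    "tablet-2": ("tablet", "az-b"),
    "tablet-3": ("tablet", "az-c"),
    "vtgate-1": ("vtgate", "az-a"),
    "vtgate-2": ("vtgate", "az-b"),
    "vtgate-3": ("vtgate", "az-c"),
}
single_az = {
    "tablet-1": ("tablet", "az-a"),
    "vtgate-1": ("vtgate", "az-a"),
}

multi_result = survives_single_zone_loss(multi_az)
single_result = survives_single_zone_loss(single_az)
```

The check only models full-zone loss. It says nothing about the partial-partition case, where zones stay up but cannot all reach each other.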

Seen in¶

— Max Englander, 2025-07-03. Canonicalises the minimum 2 replicas per cluster datum verbatim: "a primary instance and a minimum of two replicas. Each instance is composed of a VM and storage residing in the data plane. Instances evenly distributed across three availability zones. Automatic failovers from primaries to healthy replicas in response to failures." Frames the multi-AZ topology as the concrete architectural embodiment of the isolation + redundancy + static stability principle trio, and as the substrate the weekly failover drill rides on.
— Brian Morrison II, 2023-09-27. Canonical statement of the 3-AZ-minimum placement constraint for paid PlanetScale databases, and of the operator's role in AZ-failure reconciliation. Post doesn't provide MTTR numbers for AZ failover.
— Real-world stress-test of the 3-AZ minimum topology against a partial partition. Phase 2 of the 2025-10-20 AWS us-east-1 incident (14:30–19:30 UTC window) surfaced partial network partitions where cross-AZ connectivity failed asymmetrically — exactly the failure mode the 3-AZ-minimum argument doesn't cover (the argument tolerates one AZ down only when the remaining two stay connected). Operator response was manual zonal reparenting of primaries to AZs colocated with the customer's application or with fewer observed partition symptoms. PlanetScale's forward-looking remediation commitment: exploiting us-east-1's six AZs (most AWS regions have three) to make the placement more resilient to both zonal outages and the partial-partition failure mode between them. Canonical case study for when the 3-AZ-minimum topology holds (phase 1 — control-plane dependency chain died but the multi-AZ data plane was untouched) and when it's at the edge of its design envelope (phase 2 partial partitions).