PlanetScale — Scaling hundreds of thousands of database clusters on Kubernetes¶
Brian Morrison II (2023-09-27) explains how PlanetScale runs "hundreds of thousands of databases all over the world" as Vitess clusters on Kubernetes, and why their architecture deviates from the Kubernetes-community-recommended StatefulSet pattern in specific, load-bearing ways. The post introduces the PlanetScale Vitess Operator as the custom control plane that orchestrates Vitess clusters (VTGate, VTTablets, vtctld), and it canonicalises the operator+PVC-over-StatefulSet design choice (patterns/custom-operator-over-statefulset): PlanetScale adopted it because a stateless proxy (VTGate) already does the pod-addressing work that StatefulSets would otherwise provide.
Summary¶
The post is a pedagogical walk through three layers: (1) Kubernetes basics — pods, nodes, the Control Loop (observe → diff → act), YAML config; (2) the default database-on-K8s best practice — StatefulSets for stable pod identity + persistent storage, with three unsolved operational questions left on top (replication + node-failure handling, source-of-truth determination, backup/restore semantics); (3) PlanetScale's actual shape — Vitess (stateless VTGate proxy + topology server + VTTablet sidecar) managed by the PlanetScale Vitess Operator (a Kubernetes Operator that extends the Control Loop with custom resources), which chooses plain pods with direct cloud PVCs over StatefulSets because VTGate handles routing. This is paired with auto-expanding cloud storage, a dedicated backup tablet via patterns/validated-backup-via-restore-replay, and three-AZ minimum deployment (patterns/multi-az-vitess-cluster — "we don't even support cloud regions with less than three availability zones").
Key takeaways¶
- Stateful-vs-stateless is the core database-on-K8s question. Applications deployed to Kubernetes are built on the assumption that container-local data is ephemeral — "If a container within a pod crashes, any data being stored within the container is essentially lost." Databases violate that assumption, which is why special considerations are required. This is the on-ramp for the StatefulSets recommendation.
- StatefulSets are the Kubernetes-community-recommended default — "According to the official docs, StatefulSets are the recommended way to run a database on Kubernetes." StatefulSets provide three guarantees: (a) persistent storage attached to the pod; (b) stable pod name + network address across restarts; (c) ordered pod startup. Compare Deployments, which "aim to keep a specific number of a given pod online but don't care in what order they come up."
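The three guarantees map directly onto StatefulSet YAML fields. A minimal sketch (image, names, and sizes are illustrative, not from the post):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # headless Service -> stable DNS per pod (mysql-0, mysql-1, ...)
  replicas: 3                 # pods start in order by default (OrderedReady)
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:       # each pod gets its own PVC that survives restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```

`volumeClaimTemplates` covers guarantee (a), the stable `mysql-0`/`mysql-1` names plus the headless Service cover (b), and the default `OrderedReady` pod-management policy covers (c).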
- StatefulSets alone don't solve the three operational hard problems of running MySQL at scale: (a) replication + node-failure handling — if a MySQL pod's node goes offline, what happens to replication? (b) source-of-truth determination — "how do you know which of the pods has the most up-to-date data, and how should your application determine which pod to query from?" (c) backups + restores — where do backups come from, how to validate completeness, how to restore. These are the gaps the Vitess topology + operator stack addresses.
- Vitess is the MySQL-scaling substrate. Vitess layers a stateless proxy (VTGate) and a topology server on top of MySQL. Each MySQL pod is a "tablet" — a pod running MySQL plus a vttablet sidecar — managed by the vtctld control plane. VTGate + topology determine "how many MySQL instances exist, how to access them, and (in a horizontally sharded configuration) on which pod the requested data lives." Vitess is the answer to the source-of-truth and routing questions that StatefulSets don't solve.
- The PlanetScale Vitess Operator is a custom K8s operator that extends the Control Loop. "Operators allow developers to extend Kubernetes by adding custom resources that add to the Control Loop." The operator consumes a PlanetScale-defined CRD (custom resource definition), diffs current vs desired state via the Kubernetes API, and reconciles — creating VTGate deployments, VTTablet pods, vtctld instances, PVCs, and network config — to run and operate a full Vitess cluster per customer database. This is the Operator pattern (patterns/custom-operator-over-statefulset, instantiated for a data system).
- The deploy-new-database data flow goes: API → PlanetScale's custom orchestration layer → creates a CRD → Vitess Operator detects desired-vs-current-state diff via the Control Loop → Operator creates the Vitess cluster resources → orchestrator notifies the rest of the system the database is ready. This is the same pattern as Kubernetes' built-in controllers (Deployment, StatefulSet), just extended with custom-resource and custom-reconciler code.
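The CRD-driven flow can be made concrete with a sketch of the kind of custom resource the operator watches. The apiVersion/kind below match the open-source vitess-operator; the field values are illustrative and not taken from the post, and PlanetScale's internal schema may differ:

```yaml
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: example-db      # hypothetical: one resource per customer database
spec:
  cells:                # failure domains, typically one per availability zone
    - name: zone-a
    - name: zone-b
    - name: zone-c
  keyspaces:
    - name: main        # from this desired state the operator reconciles
                        # VTGate deployments, vttablet pods, vtctld, and PVCs
```

Creating or mutating this object is the only write the orchestration layer makes; everything downstream is the operator closing the observe → diff → act loop.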
- Explicit deviation from the StatefulSet default: plain pods + direct cloud PVCs. "The recommended best practice is to use StatefulSets to run databases… We actually don't do this and opt instead to use the logic built into the Vitess Operator to spin up pods that attach directly to cloud storage using a persistent volume claim (PVC). Because we already have a routing mechanism in place (VTGate), we don't need to be concerned about the name or address of a given pod." The two pillars that StatefulSets provide — stable identity and persistent storage — are both handled by other components in the stack: identity by VTGate + the topology server (systems/vtgate + concepts/vitess-topo-server), storage by cloud-native PVCs tied to the pod via the operator. See patterns/custom-operator-over-statefulset.
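A minimal sketch of the plain-pod-plus-direct-PVC shape — no StatefulSet, no `volumeClaimTemplates`; the operator owns both objects (all names, sizes, and the storage class are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tablet-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3        # hypothetical cloud-backed class (e.g. AWS EBS)
  resources:
    requests:
      storage: 200Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: vttablet-example       # pod name carries no routing meaning; VTGate +
spec:                          # the topology server handle addressing
  containers:
    - name: mysql
      image: mysql:8.0
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: tablet-data
```

The trade: the operator must itself re-bind the PVC to a replacement pod on failure, work a StatefulSet would do for free, in exchange for full control over that lifecycle.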
- Auto-expanding cloud storage via provider APIs. "We have monitoring mechanisms in place to detect when provisioned cloud storage that serves a database starts nearing capacity. When this occurs, our internal systems will use the cloud providers' APIs to automatically allocate additional space so the databases that are being served by that storage do not stop from capacity issues." This is the automated-volume-expansion shape — the operator doesn't just provision storage, it continuously monitors utilisation and grows the PVC via cloud APIs without user intervention.
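In stock Kubernetes terms, the mechanism this rhymes with is PVC expansion: a StorageClass with `allowVolumeExpansion: true` lets a controller grow a volume by patching the claim's requested size upward. A sketch (names hypothetical; the post doesn't say whether PlanetScale uses CSI expansion or calls the cloud APIs directly):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd         # hypothetical name
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver; GCP Persistent Disk is analogous
allowVolumeExpansion: true     # permits in-place growth of bound PVCs
```

With that in place, the monitoring loop's "act" step reduces to a patch like `kubectl patch pvc tablet-data -p '{"spec":{"resources":{"requests":{"storage":"300Gi"}}}}'` — shrinking is not supported, so the loop only ever grows.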
- Dedicated backup tablet via restore-then-backup semantics. "We actually utilize Vitess to create a special type of tablet that's ONLY used for backing data up. To back up a database on PlanetScale, our system will restore the latest version of the backup to this tablet, replicate all of the changes that have occurred since the backup was taken, and then create a brand new backup based on that data." Two wins: (a) no production-MySQL performance impact from the backup itself, (b) backups are self-validating — because every new backup starts from a successful restore of the previous one, a broken backup can't silently propagate. See patterns/validated-backup-via-restore-replay (this same pattern is canonical in the PlanetScale backup story across multiple sources) and patterns/dedicated-backup-instance-with-catchup-replication.
- Three-AZ minimum deployment for paid production databases. "Using Base plan databases as an example, we automatically provision a Vitess cluster across three availability zones in a given cloud region. This includes the tablets that serve up data for your database, as well as the VTGate instances that route queries to the proper tablets. This means that if an AZ gets knocked offline, the Vitess Operator will automatically detect the outage and apply the necessary infrastructure changes to keep our databases online. In fact, we don't even support cloud regions with less than three availability zones!" See patterns/multi-az-vitess-cluster. The operator's AZ-failure reconciliation behaviour is a direct application of the standard Kubernetes Control Loop (observe → diff → act) extended for the Vitess topology.
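The generic Kubernetes mechanism for this kind of zone spreading is a pod-spec fragment like the one below. In practice the Vitess Operator places pods per Vitess "cell" rather than relying on the scheduler alone; this sketch (labels hypothetical) only illustrates the underlying primitive:

```yaml
# fragment of a pod spec: keep replicas evenly spread across zones
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # standard well-known node label
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: vttablet
```

With three zones and `maxSkew: 1`, losing a zone leaves two populated zones while the reconciler replaces the missing pods.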
Architectural numbers¶
- Scale: "hundreds of thousands of databases all over the world", each a full Vitess cluster.
- AZ redundancy: minimum 3 AZs per cloud region; regions with fewer than 3 AZs are not supported for paid production databases.
- Cloud substrate: "Most databases in PlanetScale operate on either AWS or GCP" (both supported).
Systems, concepts, patterns introduced¶
- PlanetScale Vitess Operator — new page, the Kubernetes Operator that manages Vitess clusters on Kubernetes, handles pod/PVC/storage lifecycle, AZ-aware reconciliation.
- concepts/kubernetes-operator-pattern — new page, the general Kubernetes Operator design pattern (CRD + custom reconciler extending the Control Loop) that the Vitess Operator instantiates.
- concepts/statefulset-for-databases — new page, the Kubernetes-community-recommended default pattern for running stateful workloads on K8s, and the three guarantees (storage, identity, order) it provides.
- patterns/custom-operator-over-statefulset — new page, the architectural alternative PlanetScale adopted: custom operator + plain pods + direct cloud PVC, because the stateless proxy (VTGate) already handles the pod-identity problem. Trade-off: engineering effort to build the operator, but deeper control over failover, storage, and lifecycle than StatefulSets alone provide.
- patterns/multi-az-vitess-cluster — new page, the specific deployment topology PlanetScale enforces: Vitess cluster (tablets + VTGates + vtctld) spread across minimum 3 AZs per region, with operator-driven AZ-failure reconciliation.
Caveats¶
- PlanetScale-specific architecture. The operator-over-StatefulSet choice is viable because VTGate provides routing — a team running vanilla MySQL replication on Kubernetes without a VTGate-equivalent routing layer would still need StatefulSets' stable pod-identity guarantee. The post doesn't claim the operator-over-StatefulSet pattern is universally correct — only that it's what PlanetScale does given their stack.
- Conceptual-depth post. The article is pitched at "intro to databases on K8s" level. It does not show the CRD schema, the operator's reconciliation loop internals, specific failover timings, or concrete incident case studies. It's an architectural sketch, not an implementation deep-dive.
- No numbers on operator-level reconciliation performance. No data on how long AZ-outage detection takes, how long reconcile-to-healthy takes, or any MTTR data.
- Cloud-managed-service-specific. The auto-expanding storage approach relies on cloud-provider APIs (AWS EBS, GCP Persistent Disk). Running on bare-metal or non-cloud K8s would not have this affordance out of the box.
- Dated 2023-09-27. Pre-dates the PlanetScale Metal launch (2025-03-11, sources/2026-04-21-planetscale-planetscale-metal-theres-no-replacement-for-displacement) which introduced direct-attached NVMe — a different storage architecture than the cloud-PVC model described here.
Source¶
- Original: https://planetscale.com/blog/scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes
- Raw markdown: raw/planetscale/2026-04-21-scaling-hundreds-of-thousands-of-database-clusters-on-kubern-26d3ef36.md
Related¶
- systems/kubernetes
- systems/vitess
- systems/vitess-operator
- systems/vtgate
- systems/vttablet
- systems/planetscale
- systems/mysql
- concepts/kubernetes-operator-pattern
- concepts/statefulset-for-databases
- concepts/control-plane-data-plane-separation
- concepts/vitess-topo-server
- concepts/compute-storage-separation
- patterns/custom-operator-over-statefulset
- patterns/multi-az-vitess-cluster
- patterns/validated-backup-via-restore-replay
- patterns/dedicated-backup-instance-with-catchup-replication
- patterns/automated-volume-health-monitoring
- companies/planetscale