
CONCEPT Cited by 1 source

StatefulSet for databases

Definition

A StatefulSet is the Kubernetes workload resource designed for stateful applications: applications whose identity and persistent data must survive pod restarts and rescheduling. Per the official Kubernetes documentation, it is the recommended way to run a database on Kubernetes.

"According to the official docs, StatefulSets are the recommended way to run a database on Kubernetes." (Source: sources/2026-04-21-planetscale-scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes)

What a StatefulSet provides

StatefulSets offer three guarantees that databases need and that regular Deployments do not provide:

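  1. Persistent storage per pod: each pod gets its own PersistentVolumeClaim (PVC), created from the StatefulSet's volumeClaimTemplates and bound to a durable volume (cloud block storage, NFS, local disk). The volume survives pod restarts and rescheduling.
  2. Stable pod name and network identity: pods are named <set-name>-0, <set-name>-1, and so on, and keep those names across restarts. Through the StatefulSet's governing headless Service, each pod also gets a stable DNS hostname (<pod-name>.<service-name>), even though its IP address may change. Applications can therefore address a specific pod by name.
  3. Ordered startup and shutdown: under the default OrderedReady policy, pods are created in order (pod-0, then pod-1, then pod-2), each only after the previous one is Running and Ready, and scale-down removes them in reverse order. This matters for primary-replica topologies, where the primary must be up before replicas can start replicating from it.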
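The three guarantees can be made concrete with a small model of the identity Kubernetes derives for each pod. This is a plain-Python sketch, not a Kubernetes API client: the set name `mysql`, the headless Service `mysql`, the namespace `default`, and the claim template `data` are illustrative assumptions, and the DNS names assume the default `cluster.local` cluster domain and the default `OrderedReady` pod management policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PodIdentity:
    name: str  # <set-name>-<ordinal>
    dns: str   # stable hostname published via the governing headless Service
    pvc: str   # <claim-template>-<set-name>-<ordinal>


def statefulset_identities(set_name: str, service: str, namespace: str,
                           claim_template: str, replicas: int) -> List[PodIdentity]:
    """Derive the identity each StatefulSet pod gets, indexed by ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{set_name}-{ordinal}"
        identities.append(PodIdentity(
            name=pod,
            # Assumes the default cluster domain, cluster.local.
            dns=f"{pod}.{service}.{namespace}.svc.cluster.local",
            pvc=f"{claim_template}-{set_name}-{ordinal}",
        ))
    return identities


def startup_order(identities: List[PodIdentity]) -> List[str]:
    # OrderedReady: pod N+1 is only created once pod N is Running and Ready.
    return [i.name for i in identities]


def scale_down_order(identities: List[PodIdentity]) -> List[str]:
    # Scale-down removes the highest ordinal first.
    return [i.name for i in reversed(identities)]


pods = statefulset_identities("mysql", "mysql", "default", "data", replicas=3)
for p in pods:
    print(p.name, p.dns, p.pvc)
print(startup_order(pods))     # ['mysql-0', 'mysql-1', 'mysql-2']
print(scale_down_order(pods))  # ['mysql-2', 'mysql-1', 'mysql-0']
```

Because every field is a pure function of the set name and the ordinal, a recreated `mysql-1` ends up with exactly the same name, hostname, and claim as the pod it replaces.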

From Morrison II:

"StatefulSets in Kubernetes allow you to define a set of pods that maintain the state of the data within a pod regardless of its online status. Kubernetes does this by attaching persistent storage to the pod for it to read and write data to, as well as ensuring that when pods come online, they do so in the same order, with the same name and network address every time."

Contrast with Deployment

A Deployment "aims to keep a specific number of a given pod online but doesn't care in what order they come up or what their names are." For a stateless web service this is exactly what you want: pods are fungible. For a database, fungibility breaks down. If pod-0 is the primary and pod-1 and pod-2 are replicas, silently swapping their identities would corrupt replication.
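The naming difference is easy to see in a small simulation. This sketch assumes Deployment pods follow the `<deployment>-<pod-template-hash>-<random suffix>` pattern; the hash `5d8f7c9b6` is made up, and the suffix alphabet is simplified (real suffixes use a restricted character set).

```python
import random
import string
from typing import List

# Simplified alphabet; real Kubernetes suffixes use a restricted character set.
ALPHABET = string.ascii_lowercase + string.digits


def deployment_pod_names(deployment: str, template_hash: str, replicas: int,
                         rng: random.Random) -> List[str]:
    # Deployment pods: <deployment>-<pod-template-hash>-<random suffix>.
    # Every replacement pod draws a fresh suffix.
    return [f"{deployment}-{template_hash}-{''.join(rng.choices(ALPHABET, k=5))}"
            for _ in range(replicas)]


def statefulset_pod_names(statefulset: str, replicas: int) -> List[str]:
    # StatefulSet pods: <statefulset>-<ordinal>, identical on every recreation.
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


rng = random.Random(7)
before = deployment_pod_names("mysql", "5d8f7c9b6", 3, rng)
after = deployment_pod_names("mysql", "5d8f7c9b6", 3, rng)  # all pods replaced
print(before)
print(after)                              # new, unrelated names
print(statefulset_pod_names("mysql", 3))  # ['mysql-0', 'mysql-1', 'mysql-2'] every time
```

Nothing in a Deployment pod's name says which replica held the primary role, so replicas cannot be pointed at "the primary" by name the way they can with `mysql-0`.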

Why databases on K8s need StatefulSet-like guarantees

The root problem:

"If a container within a pod crashes, any data being stored within the container is essentially lost. With databases, the state of the data within the database is kind of important, which is why special considerations need to be taken when deploying the database to Kubernetes."

Container-local storage is ephemeral. When a container crashes, Kubernetes restarts it with a fresh filesystem, and when a pod is deleted or its node fails, a replacement pod is created from scratch. Either way, anything written to the container's local filesystem is gone. For stateless apps this is fine. For databases it is catastrophic.

A StatefulSet with per-pod PVCs breaks this coupling: the pod can die and be rescheduled, but the PVC (and the underlying volume) persists, and the replacement pod, which keeps the same name, reattaches to the same PVC. Data survives.
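The decoupling can be sketched with an in-memory model. This is a hypothetical illustration, not how the kubelet mounts volumes: a per-pod dict stands in for the container filesystem, a cluster-level dict keyed by claim name stands in for persistent volumes, and the pod and claim names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Pod:
    name: str
    pvc: Optional[str]  # None means the pod only has container-local storage
    local: Dict[str, str] = field(default_factory=dict)  # container filesystem


class Cluster:
    def __init__(self) -> None:
        # PVC name -> contents; stands in for durable volumes that outlive pods.
        self.volumes: Dict[str, Dict[str, str]] = {}

    def start_pod(self, name: str, pvc: Optional[str] = None) -> Pod:
        if pvc is not None:
            # Rebind to an existing claim, or provision a new empty volume.
            self.volumes.setdefault(pvc, {})
        return Pod(name, pvc)

    def _store(self, pod: Pod) -> Dict[str, str]:
        return self.volumes[pod.pvc] if pod.pvc else pod.local

    def write(self, pod: Pod, key: str, value: str) -> None:
        self._store(pod)[key] = value

    def read(self, pod: Pod, key: str) -> Optional[str]:
        return self._store(pod).get(key)


cluster = Cluster()
ephemeral = cluster.start_pod("web-7f9c4-x2k8p")           # no PVC
durable = cluster.start_pod("mysql-0", pvc="data-mysql-0")
cluster.write(ephemeral, "row:1", "alice")
cluster.write(durable, "row:1", "alice")

# Crash: both pod objects are discarded and replacements are started.
ephemeral = cluster.start_pod("web-7f9c4-m4q7z")
durable = cluster.start_pod("mysql-0", pvc="data-mysql-0")  # same name, same claim
print(cluster.read(ephemeral, "row:1"))  # None
print(cluster.read(durable, "row:1"))    # alice
```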

What StatefulSets don't solve

StatefulSets give you storage, pod identity, and ordering. They don't address the domain-specific operational concerns of a real database cluster, which Morrison II enumerates:

  1. Replication and node-failure handling: "what happens when a node experiences an outage?" Who decides to fail over, who promotes the new primary, and how is split-brain prevented?
  2. Source-of-truth determination: "how do you know which of the pods has the most up-to-date data, and how should your application determine which pod to query from?" An application connecting to mysql-0 doesn't know whether mysql-0 is still the primary or has been demoted.
  3. Backups and restores: "where do the backups come from, how do you know they are complete, and how should you restore them?" StatefulSets don't model backup state at all.
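To show the kind of decision that lives outside the StatefulSet, here is a hypothetical failover rule covering the first two concerns. It is not the logic of any particular operator: `applied_position` is an assumed monotonically increasing replication position, and majority quorum is used as one common split-brain guard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    pod: str
    reachable: bool
    applied_position: int  # hypothetical monotonically increasing replication position


def has_quorum(view: List[Member]) -> bool:
    # Only act on a majority view, so a partitioned minority cannot
    # promote a second primary (one common split-brain guard).
    reachable = sum(1 for m in view if m.reachable)
    return reachable > len(view) // 2


def choose_new_primary(view: List[Member]) -> Optional[str]:
    """Pick the reachable member with the most up-to-date data."""
    if not has_quorum(view):
        return None  # refuse to fail over rather than risk split-brain
    candidates = [m for m in view if m.reachable]
    # max() returns the first maximal element, so ties go to the earlier pod.
    return max(candidates, key=lambda m: m.applied_position).pod


view = [
    Member("mysql-0", reachable=False, applied_position=1050),  # failed primary
    Member("mysql-1", reachable=True, applied_position=1042),
    Member("mysql-2", reachable=True, applied_position=1048),
]
print(choose_new_primary(view))  # mysql-2
```

In this model mysql-0 held transactions up to position 1050 that the promoted replica never received. Deciding what happens to them, and fencing mysql-0 if it comes back, is exactly the kind of policy a StatefulSet has no way to express.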

These are the gaps that a Kubernetes Operator fills by encoding domain-specific logic (see PlanetScale Vitess Operator as the canonical example).

Alternative: operator + plain pods + PVC

Teams that build a Kubernetes Operator for their database can sometimes skip StatefulSets entirely and use plain pods with directly attached PVCs, provided their stack already supplies the identity and ordering guarantees from another layer. PlanetScale does exactly this (patterns/custom-operator-over-statefulset): VTGate handles query routing, so pod names don't need to be stable from the application's perspective, and the operator enforces startup order in its reconcile logic. The StatefulSet becomes redundant.
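Here is a hypothetical sketch of what enforcing startup order in reconcile logic can look like when an operator manages plain pods and PVCs itself. Everything is an assumption for illustration: the in-memory `ObservedState` stands in for what a real operator would read from the Kubernetes API, the `data-<pod>` claim naming is made up, and this is not PlanetScale's operator code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ObservedState:
    pvcs: Set[str] = field(default_factory=set)
    pods: Dict[str, bool] = field(default_factory=dict)  # pod name -> ready?


def reconcile(name: str, replicas: int, state: ObservedState) -> str:
    """One pass of a hypothetical operator reconcile loop.

    Walks ordinals in order, creates at most one missing pod per pass, and
    does not move past a pod that is not yet ready.
    """
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        pvc = f"data-{pod}"
        state.pvcs.add(pvc)  # ensure the claim exists; reused if already present
        if pod not in state.pods:
            state.pods[pod] = False  # plain pod created, not ready yet
            return f"created {pod}"
        if not state.pods[pod]:
            return f"waiting for {pod}"
    return "in sync"


state = ObservedState()
actions: List[str] = []
actions.append(reconcile("db", 2, state))  # created db-0
actions.append(reconcile("db", 2, state))  # waiting for db-0
state.pods["db-0"] = True                  # pretend db-0 passed its readiness probe
actions.append(reconcile("db", 2, state))  # created db-1
state.pods["db-1"] = True
del state.pods["db-0"]                     # db-0 lost; its PVC is untouched
actions.append(reconcile("db", 2, state))  # created db-0 (reuses data-db-0)
print(actions)
print(sorted(state.pvcs))                  # ['data-db-0', 'data-db-1']
```

Returning after a single action and relying on the next pass to continue is one common way to structure operators: each pass recomputes from observed state, so a lost pod is simply recreated against its surviving claim.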

This trade-off only works when there's a proxy layer (like VTGate) or a similar service-discovery abstraction in front of the database pods. A team running vanilla MySQL replication without such a layer should still use StatefulSets.
