PATTERN

Custom operator over StatefulSet

Problem

Running a stateful workload (database, broker, stream processor) on Kubernetes normally uses a StatefulSet — the Kubernetes-documented default for stateful workloads. StatefulSets provide three guarantees: stable pod identity (name + address), persistent storage per pod via PVC, and ordered startup/shutdown.

But those guarantees are generic. A database cluster has domain-specific operational concerns — replication + failover, source-of-truth determination, backup/restore, topology invariants, AZ-failure reconciliation — that StatefulSets don't address. Teams building a database-as-a-service have to layer those concerns somewhere, and once they have the logic written as a Kubernetes Operator, the StatefulSet's specific guarantees start looking redundant or even restrictive.

Solution

Build a custom Kubernetes Operator that replaces StatefulSet with plain pods + direct PVC. Use the stack's existing components to substitute for what StatefulSets would provide:

Each StatefulSet guarantee maps to a substitute already in the stack:

  • Stable pod name + network address → a stateless proxy layer with service discovery (VTGate + topo-server in the Vitess case)
  • Persistent storage per pod → a PVC bound directly to cloud block storage, its lifecycle managed by the operator
  • Ordered startup / shutdown → the operator's reconcile logic codifies the domain-specific startup order (e.g. primary first, then replicas)

The operator handles the rest: failover, topology invariants, backup workflows, auto-expanding storage, multi-AZ placement. See PlanetScale Vitess Operator for the canonical example.
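The core mechanic can be sketched as a reconcile step that diffs desired plain pods + PVCs against what exists and creates whatever is missing, rather than delegating to a StatefulSet controller. This is a minimal illustrative sketch, not real operator code: the spec fields, naming scheme, and helper functions are all hypothetical.

```python
# Illustrative reconcile sketch: derive desired pods + PVCs from a cluster
# spec, then emit create actions for anything not yet observed. A real
# operator would issue these against the Kubernetes API.

def desired_state(spec):
    """Derive the pods and PVCs a cluster of spec['replicas'] tablets needs."""
    pods, pvcs = {}, {}
    for i in range(spec["replicas"]):
        name = f"{spec['cluster']}-tablet-{i}"
        pods[name] = {"image": spec["image"], "pvc": f"{name}-data"}
        pvcs[f"{name}-data"] = {"size": spec["disk_size"]}
    return pods, pvcs

def reconcile(spec, observed_pods, observed_pvcs):
    """Return the create actions needed to converge observed -> desired."""
    pods, pvcs = desired_state(spec)
    actions = []
    # PVCs first: a pod should never start without its backing volume.
    for name, pvc in pvcs.items():
        if name not in observed_pvcs:
            actions.append(("create_pvc", name, pvc))
    for name, pod in pods.items():
        if name not in observed_pods:
            actions.append(("create_pod", name, pod))
    return actions

spec = {"cluster": "shard0", "replicas": 2, "image": "mysql:8", "disk_size": "100Gi"}
print(reconcile(spec, observed_pods={"shard0-tablet-0": {}}, observed_pvcs={}))
```

The point of the diff shape is idempotence: rerunning reconcile against a converged cluster yields no actions, which is what lets the operator also own repair after AZ failure.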

When this pattern fits

Fits well when:

  • You have a stateless proxy or service-discovery layer in front of the database pods. The proxy abstracts pod identity away from applications — applications address the proxy, not specific pods. VTGate is the canonical example.
  • You already need a custom operator for other reasons (failover, topology management, backup orchestration). Once the operator exists, StatefulSet becomes additional machinery to manage with limited incremental benefit.
  • You run many clusters (one per tenant) — the engineering cost of the operator amortises across thousands of deployments.
  • You want finer control over pod lifecycle than StatefulSet's ordering semantics provide (e.g. parallel replica startup after primary is up, or custom pre-stop hooks that drain connections before pod termination).
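The last point — ordering semantics StatefulSets cannot express — can be sketched as a two-phase startup: the primary alone, then all replicas concurrently. `start_pod` is a stand-in for the real pod-creation call; the names are hypothetical.

```python
# Sketch of a domain-specific startup order: primary first, then replicas
# in parallel. StatefulSet's OrderedReady policy starts pods strictly one
# ordinal at a time; an operator can codify this looser, faster order.

from concurrent.futures import ThreadPoolExecutor

def start_pod(name, started):
    started.append(name)  # stand-in for creating the pod and awaiting readiness
    return name

def startup_order(primary, replicas):
    started = []
    start_pod(primary, started)            # phase 1: primary alone
    with ThreadPoolExecutor() as pool:     # phase 2: replicas concurrently
        list(pool.map(lambda r: start_pod(r, started), replicas))
    return started

print(startup_order("shard0-primary", ["shard0-replica-0", "shard0-replica-1"]))
```

The guarantee that matters (primary before any replica) survives, while the guarantee that does not (replica-by-replica sequencing) is dropped.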

Does NOT fit when:

  • You run vanilla MySQL / Postgres replication with applications connecting directly to database pods. Without a proxy layer, applications need stable pod names — StatefulSet is the right tool.
  • You run few clusters and can't justify operator engineering cost.
  • You want to use off-the-shelf Helm charts that expect StatefulSet underneath.

Why PlanetScale adopted this pattern

From Brian Morrison II:

"The recommended best practice is to use StatefulSets to run databases since the state is automatically tracked by Kubernetes. We actually don't do this and opt instead to use the logic built into the Vitess Operator to spin up pods that attach directly to cloud storage using a persistent volume claim (PVC). Because we already have a routing mechanism in place (VTGate), we don't need to be concerned about the name or address of a given pod." (Source: sources/2026-04-21-planetscale-scaling-hundreds-of-thousands-of-database-clusters-on-kubernetes)

The key insight: VTGate already does the pod-addressing work StatefulSets would provide. Applications connect to VTGate (via the edge LB), and VTGate consults the topo-server to route queries to the right tablet. Applications never address tablet pods by name, so the StatefulSet guarantee of stable pod names provides something nobody needs.

Once that guarantee is not load-bearing, StatefulSets add complexity (e.g. ordered startup semantics that don't match Vitess's primary-first-then-replicas order) without commensurate benefit. Plain pods + PVC + operator-managed lifecycle is leaner.
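Why stable pod names stop being load-bearing can be shown with a toy routing layer: clients ask the proxy for a keyspace/shard, the proxy consults a topology map, and the backing pod address is an internal detail that can change on failover. The class and method names here are illustrative, not the real VTGate API.

```python
# Toy proxy + topo-server: applications route by (keyspace, shard,
# tablet_type) and never see pod identity, so the pod behind a route can
# be replaced freely (e.g. after failover) with no client-visible change.

class TopoServer:
    """Maps (keyspace, shard, tablet_type) to whichever pod currently serves it."""
    def __init__(self):
        self.routes = {}

    def register(self, keyspace, shard, tablet_type, pod_addr):
        self.routes[(keyspace, shard, tablet_type)] = pod_addr

class Proxy:
    """Applications connect here; pod names/addresses stay internal."""
    def __init__(self, topo):
        self.topo = topo

    def route(self, keyspace, shard, tablet_type="primary"):
        return self.topo.routes[(keyspace, shard, tablet_type)]

topo = TopoServer()
topo.register("commerce", "-80", "primary", "10.0.3.7:3306")
proxy = Proxy(topo)

before = proxy.route("commerce", "-80")   # app asks by shard, not pod name
topo.register("commerce", "-80", "primary", "10.0.9.2:3306")  # failover swaps the pod
after = proxy.route("commerce", "-80")    # same query, new backend
print(before, after)
```

Because the route key never changes, the operator can delete and recreate pods with arbitrary names and the application-facing contract is untouched.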

Trade-offs

  • Engineering cost upfront — building an operator that handles storage + failover + topology + backups + AZ-failure reconciliation is substantial engineering effort. StatefulSets get you some of this for free.
  • Lose out-of-the-box K8s tooling — tools that inspect StatefulSets (Helm charts, dashboards, upgrade operators) don't apply. You're responsible for the equivalent tooling on your CRD.
  • Gain domain-specific control — the operator encodes your exact failover, placement, and lifecycle policies. StatefulSets force you into their generic model.
  • Operator becomes a SPOF if bugs accumulate — a broken reconcile loop can break your entire fleet. Mature operator engineering practices (testing, staged rollout, kill switches) are mandatory.

Alternative approaches

  • StatefulSet + sidecar automation. Use StatefulSets but add external controllers (Redis Operator, KubeDB, etc.) that manage replication + failover on top of the StatefulSet. Gets you K8s-native storage and pod identity, with operator logic layered on. Many open-source database operators use this shape.
  • StatefulSet only. Works for simple setups — single primary + replicas with manual failover. Doesn't scale to fleet-size operational complexity.
  • Custom operator + StatefulSet. The operator manages the CRD but delegates to StatefulSets for pod/storage lifecycle. Combines operator flexibility with StatefulSet integration. This is the shape adopted by many K8s-native databases.
