CONCEPT Cited by 1 source
Control-plane impact without data-plane impact¶
Definition¶
Control-plane impact without data-plane impact is the specific operational outcome-shape in which a system's configuration / orchestration path is fully broken but the data path — the thing customers actually depend on at request time — keeps serving traffic with no degradation. It is the operational proof-of-success for control plane / data plane separation as a design principle: the design worked only to the extent that the control plane can fail without pulling the data plane down with it.
Canonical example: PlanetScale, 2025-10-20¶
Phase 1 of the 2025-10-20 AWS us-east-1 incident took down PlanetScale's control plane (branch creation / resize / config via an internal secret-distribution service → S3 → STS → DynamoDB) for ~2 h 17 min. In the post-mortem's words:
Throughout this period, no database branches lost capacity or connectivity.
Three reasons this was possible:
- The control plane was not on the request path. Running MySQL and Postgres primaries, VTTablet sidecars, and VTGate routers served queries from their already-loaded state; nothing in the hot path called back into the branch-creation or secret-distribution service.
- Credentials and routing state were already cached. Whatever the data plane had read from S3 or resolved via STS before the outage, it continued to use. The outage blocked new reads, not the use of material that was already cached.
- The unavailable operations were all "create / change" verbs: new database branches, resizes, and config changes. These queued until the control plane recovered.
(Source: sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20)
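The three reasons above can be sketched in a few lines. This is an illustrative toy, not PlanetScale's implementation: the `ControlPlane`, `DataPlane`, and `submit_change` names, the routing table, and the `create-branch` operation are all hypothetical. It assumes the data plane loads its state once, off the request path, and that change requests are parked in a local queue while the control plane is unreachable.

```python
from collections import deque


class ControlPlaneDown(Exception):
    """Raised when the (hypothetical) control plane cannot be reached."""


class ControlPlane:
    """Stand-in for the configuration / orchestration path."""

    def __init__(self):
        self.available = True

    def fetch_routing(self):
        if not self.available:
            raise ControlPlaneDown("control plane unreachable")
        return {"orders": "primary-1", "users": "primary-2"}

    def apply(self, operation):
        if not self.available:
            raise ControlPlaneDown("control plane unreachable")
        return f"applied {operation}"


class DataPlane:
    """Serves queries from state loaded before any outage."""

    def __init__(self, control_plane):
        # The only control-plane call happens here, off the request path.
        self.routing = control_plane.fetch_routing()

    def query(self, table):
        # Hot path: local lookup only, no call back into the control plane.
        return f"{table} served by {self.routing[table]}"


def submit_change(control_plane, pending, operation):
    """Create/change verbs wait in a queue while the control plane is down."""
    try:
        return control_plane.apply(operation)
    except ControlPlaneDown:
        pending.append(operation)
        return "queued"


control_plane = ControlPlane()
data_plane = DataPlane(control_plane)
pending = deque()

control_plane.available = False                 # outage begins
during_outage = data_plane.query("orders")      # still answered locally
change_status = submit_change(control_plane, pending, "create-branch")

control_plane.available = True                  # recovery
drained = [control_plane.apply(pending.popleft()) for _ in range(len(pending))]
```

The property to notice is that `DataPlane.query` never touches the `ControlPlane` object, so toggling `available` cannot change its result; only the create/change path observes the outage.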
The test every control-plane-separated system eventually takes¶
A system that claims control-plane / data-plane separation hasn't proved it until the control plane actually goes dark. Common failure modes of the claim:
- The data plane phones home to the control plane on the hot path (e.g. every request fetches a config snapshot without caching it locally). Any control-plane outage becomes a data-plane outage.
- The data plane's local cache expires during the outage and the refresh path hits the dead control plane. The outage is tolerated up to the TTL, then blows up.
- The control plane holds a lease the data plane needs to keep writing. Lease expiry during the outage causes the data plane to fence itself.
The PlanetScale incident is a successful test because none of those three failure modes triggered: running databases kept serving with no lease expiry, no stale-config blowup, and no hot-path control-plane call.
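The second failure mode (tolerated up to the TTL, then blows up) is easy to reproduce with a fake clock. The sketch below is a hypothetical `CachedConfig` class, not any real library's API; the 60-second TTL, the `serve_stale` flag, and the config contents are assumptions chosen for illustration. It compares a cache that treats refresh failure as fatal with one that keeps serving the last good value.

```python
class RefreshFailed(Exception):
    """Raised when the (hypothetical) control plane cannot serve a refresh."""


class CachedConfig:
    """Local copy of a control-plane artefact with a TTL-driven refresh."""

    def __init__(self, fetch, ttl_seconds, now, serve_stale):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._now = now
        self._serve_stale = serve_stale
        self._value = fetch()          # initial load, before any outage
        self._loaded_at = now()

    def get(self):
        if self._now() - self._loaded_at < self._ttl:
            return self._value         # still fresh, no remote call
        try:
            self._value = self._fetch()
            self._loaded_at = self._now()
        except RefreshFailed:
            if not self._serve_stale:
                raise                  # outage now reaches the request path
        return self._value             # fresh value, or last good value


# Fake clock and a control plane that can be switched off.
clock = {"t": 0.0}
control_plane = {"up": True}


def now():
    return clock["t"]


def fetch():
    if not control_plane["up"]:
        raise RefreshFailed("control plane unreachable")
    return {"route": "primary-1"}


strict = CachedConfig(fetch, ttl_seconds=60, now=now, serve_stale=False)
lenient = CachedConfig(fetch, ttl_seconds=60, now=now, serve_stale=True)

control_plane["up"] = False   # outage begins at t=0
clock["t"] = 30.0
within_ttl = strict.get()     # inside the TTL: the outage is invisible

clock["t"] = 90.0             # TTL has expired while the control plane is down
try:
    strict.get()
    strict_survived = True
except RefreshFailed:
    strict_survived = False
lenient_value = lenient.get()
```

Both caches behave identically until the TTL runs out; only then does the choice of refresh-failure behaviour decide whether the control-plane outage becomes a data-plane outage.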
What this outcome-shape does not mean¶
- Not every customer-facing function survived. The 2025-10-20 post-mortem is explicit that the dashboard was intermittent, SSO logins were broken for anyone not already logged in, and the status page was itself unavailable for >30 minutes. These are control-plane / auxiliary-plane concerns. "Data-plane-unaffected" means database queries kept working — the minimum viable contract.
- Not every incident splits cleanly. Phase 2 of the same incident (EC2-launch failures + partial network partitions) did affect some customer databases. The phase-1 shape — control plane down, data plane up — is the clean case; phase-2 was the messy case where the data plane itself was hit.
- "Not affected" is not "fully static." Normal auto-failover, replication, and health-check plumbing were still running on the data plane during phase 1; they just didn't depend on the broken control-plane chain.
Related shape: control plane survives, data plane degraded¶
The opposite shape (control plane OK, data plane degraded) is roughly what a typical single-shard primary failure looks like: the API still responds to CREATE BRANCH, but reads/writes against the affected shard fail. Most on-call runbooks are written for this shape, not the Oct-20 shape, because it's vastly more common. The Oct-20 shape is rare specifically because it requires a dependency chain to break upstream of both the data plane and the control plane, with the data plane's caches holding.
Implications for design reviews¶
When reviewing a system's claim to control-plane / data-plane separation, ask:
- List every IPC the data plane makes on the request path. Does any of them terminate in the control plane? If yes, the separation is nominal.
- List every control-plane-provided artefact the data plane caches locally (config, certs, routing tables, feature flags). What is the refresh cadence, and what happens on refresh failure — fail-open (keep serving stale), fail-closed (refuse), or crash?
- What leases does the data plane hold, and from where? Control-plane-issued leases mean control-plane outages are time-bombs.
- Does disaster-recovery tooling (backups, restores, snapshots) live on the same dependency surface as the control plane, or does it run on an independent path? In the PlanetScale incident, backup was entangled because it needed to launch a new EC2 instance, which is a capacity operation rather than purely a data-plane read.
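The first review question, whether any request-path call terminates in the control plane, amounts to a reachability check over the call graph. The sketch below assumes the call graph has already been written down as a plain dictionary; the component names (`query-router`, `feature-flag-client`, `config-service`, and so on) are hypothetical and do not describe any real system.

```python
def reaches_control_plane(calls, start, control_plane):
    """Return one call path from `start` into the control plane, or None.

    `calls` maps each component to the components it calls synchronously;
    `control_plane` is the set of components owned by the control plane.
    """
    stack = [[start]]
    seen = {start}
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in control_plane:
            return path
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(path + [callee])
    return None


# Hypothetical call graph: edges are synchronous calls made while serving a request.
calls = {
    "query-router": ["local-routing-cache", "database-primary"],
    "database-primary": ["local-disk"],
    "leaky-router": ["local-routing-cache", "feature-flag-client"],
    "feature-flag-client": ["config-service"],
}
control_plane = {"config-service"}

clean = reaches_control_plane(calls, "query-router", control_plane)
leaky = reaches_control_plane(calls, "leaky-router", control_plane)
```

A `None` result means no synchronous path was found in the graph as recorded. The check is only as good as the graph: background refreshers and lease renewals need their own entries, or the cache and lease questions go unexamined.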
Related¶
- concepts/control-plane-data-plane-separation — the design principle of which this is the successful-outcome shape.
- concepts/blast-radius — what separation bounds.
- concepts/runtime-dependency-on-saas-provider — the companion concept for what caused the 2025-10-20 control-plane outage in the first place.
- concepts/ec2-launch-failure-mode — the phase-2 variant where the data plane was partially affected.
- sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20 — canonical source.