CONCEPT Cited by 1 source
Control plane is the new data plane¶
Definition¶
"Control plane is the new data plane" is the architectural claim that under agentic, serverless, scale-to-zero workloads, the part of the control plane that starts databases (or compute, or any on-demand resource) is structurally on the critical path of every customer request — making its reliability requirements equivalent to a traditional data plane's, not a traditional control plane's.
The reframe inverts the canonical control / data plane split: in the monolithic-cloud-database era the data plane is the critical 99.99%+ availability surface, and the control plane is "only" critical for management operations. Under agentic / scale-to-zero, that distinction collapses for the start-database verb specifically — every auto-resume on incoming connection is a control-plane operation, and "if the agent goes to sleep, why pay for provisioned infrastructure?" makes the cold path the common case rather than the exception.
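A minimal sketch of why auto-resume puts the start verb on the request path. All names here (`ControlPlane`, `Proxy`, `handle_connection`, `auto_suspend`) are hypothetical illustrations, not Lakebase or Neon APIs; the only thing taken from the text is the shape of the flow: a connection to a suspended database must wait on a control-plane start call.

```python
from dataclasses import dataclass, field


class StartFailed(Exception):
    """Raised when the (hypothetical) control plane cannot start a compute."""


@dataclass
class ControlPlane:
    # Hypothetical stand-in for the service that starts databases.
    healthy: bool = True
    starts: int = 0

    def start_database(self, db_id: str) -> None:
        if not self.healthy:
            raise StartFailed(f"cannot start {db_id}")
        self.starts += 1


@dataclass
class Proxy:
    # Hypothetical connection front end that tracks which databases are running.
    control_plane: ControlPlane
    running: set = field(default_factory=set)

    def handle_connection(self, db_id: str) -> str:
        # Cold path: the database is suspended, so the start call sits
        # synchronously between the client and its first query.
        if db_id not in self.running:
            self.control_plane.start_database(db_id)
            self.running.add(db_id)
        return f"connected to {db_id}"

    def auto_suspend(self, db_id: str) -> None:
        # Scale-to-zero: the next connection will take the cold path again.
        self.running.discard(db_id)
```

In this sketch a warm connection never touches `ControlPlane`, but every connection that follows `auto_suspend` does, and a control-plane failure at that moment surfaces to the client as a failed connection rather than a failed management operation.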
Canonical framing¶
Verbatim from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"In monolithic cloud database service architecture, the data plane is the critical part of the service. It's designed for 99.99+% availability and static stability. The control plane matters 'only' for management operations. With agentic and on-demand workloads, the part of the control plane that starts databases is effectively the data plane. This has changed how we think about our architecture. Currently, our control plane handles everything from starting databases to billing. The former is clearly more critical. We've had outages where background maintenance operations resource-starved on-demand database startups - that's clearly not ok."
The forcing-function workload shape¶
Three workload trends collide to produce this reframe (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
- Higher throughput of control-plane operations. Agents programmatically create and manage databases / storage / compute at rates much higher than humans. "In Databricks Lakebase, agents create 4× as many databases as humans do."
- More demand for on-demand. Serverless, autoscaling, and auto-suspend infrastructure is the new default — "if the agent goes to sleep, why pay for provisioned infrastructure?"
- Capacity crunch. "The notion of cloud having 'infinite' capacity is showing cracks" — allocating new cloud capacity won't always succeed.
The empirical signal for the steady-state load on the start path: 90% of compute sessions for auto-suspending databases in Neon last less than 10 minutes. Every auto-resume is a control-plane RPC, and sessions are short enough that the start RPC happens far more often than once per "workload": it happens once per idle-then-active window.
The architectural reply¶
The reframe has two structural consequences (verbatim, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
- Decompose the control plane into a hot-path data-plane controller and a cold-path management plane. "We're currently hard at work separating the critical parts of the control plane into a data plane controller service that handles only hot-path operations (start/suspend). This service has less business logic, a strict, minimal set of external dependencies, and is engineered from the ground up with resilience, graceful degradation, and defense-in-depth top of mind." See patterns/separate-data-plane-controller-for-hot-path.
- Aggressively minimise the start-path's external-dependency chain. Provisioning a Postgres compute conventionally chains through cloud-provider compute / block / network control planes plus optionally Kubernetes; each chain link is a 99.99%-tier dependency, and the links' availabilities multiply into the start-path's effective availability. Pre-allocating a bare-metal pool with a provisioning buffer, and running an in-house vertical-autoscaling virtualization layer on top of it, collapses the chain.
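The dependency-chain arithmetic can be made concrete with a short calculation. This is a sketch under two stated assumptions that are not in the source: each link is exactly 99.99% available, and link failures are independent, so a serial chain's availability is the product of its links. The four links mirror the compute, block, network, and Kubernetes control planes named above.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def chain_availability(link_availabilities):
    """Availability of a serial chain: every link must be up.

    Assumes independent failures, which real cloud dependencies only approximate.
    """
    return math.prod(link_availabilities)


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Conventional start path: compute, block, network, and Kubernetes control planes.
conventional = chain_availability([0.9999] * 4)

# Collapsed start path: a single 99.99%-tier dependency (illustrative only).
collapsed = chain_availability([0.9999])

print(f"conventional: {conventional:.6f}, "
      f"{downtime_minutes_per_year(conventional):.1f} min/year")
print(f"collapsed:    {collapsed:.6f}, "
      f"{downtime_minutes_per_year(collapsed):.1f} min/year")
```

Under these assumptions four chained 99.99% links yield roughly 99.96% for the start path, about four times the expected downtime of a single link, which is why removing links matters more than hardening any one of them.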
Why this is distinct from "control plane / data plane separation"¶
The parent concept control plane / data plane separation frames the split architecturally — different code, different scaling, different blast radius. Within that frame, control plane = "less critical" is the canonical default (PlanetScale's Max Englander frames it explicitly: "the data plane is the most critical plane, with fewer dependencies than the control plane").
This concept canonicalises the inverse-ranking corner case: when the workload is agentic / scale-to-zero, the traditional control plane / data plane labels still apply structurally — "control plane decides, data plane delivers" — but the availability ranking inverts for one specific verb (start). The start verb is on the synchronous request path of a connection arrival, even though it lives on the architectural control plane.
The structural fix is not to merge the planes — that would re-introduce the scaling and isolation problems the split solves. The fix is to carve out a hot-path subset of the control plane with data-plane availability discipline, and leave the rest of the control plane on its traditional cadence.
Operational implication¶
Anything that runs on the same control plane as the start verb is now on the request path of every cold start. The classic "background maintenance ran in the control plane" shape becomes a failure mode; in the post's own words: "We've had outages where background maintenance operations resource-starved on-demand database startups - that's clearly not ok." The fix is structural separation rather than just better SLA classes; see patterns/separate-data-plane-controller-for-hot-path.
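The starvation failure mode, and why structural separation addresses it, can be shown with a deterministic toy model. `SlotPool` and `admit` are hypothetical names, and the capacities are arbitrary; this illustrates the bulkhead idea only and does not describe how the Lakebase data-plane controller is actually built.

```python
class SlotPool:
    """A fixed number of worker slots; a hypothetical stand-in for controller capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def try_acquire(self) -> bool:
        # Refuse work once every slot is taken.
        if self.in_use >= self.capacity:
            return False
        self.in_use += 1
        return True

    def release(self) -> None:
        if self.in_use > 0:
            self.in_use -= 1


def admit(pool: SlotPool, jobs: int) -> int:
    """Try to admit `jobs` units of work; return how many got a slot."""
    return sum(1 for _ in range(jobs) if pool.try_acquire())


# Shared control plane: a burst of maintenance arrives first and takes every slot.
shared = SlotPool(capacity=8)
maintenance_admitted = admit(shared, jobs=20)
start_admitted_shared = shared.try_acquire()  # the database start is starved

# Separated planes: maintenance can only exhaust the management pool.
hot_path = SlotPool(capacity=4)
management = SlotPool(capacity=4)
admit(management, jobs=20)
start_admitted_isolated = hot_path.try_acquire()  # the start still gets a slot
```

With a shared pool the start request is rejected even though total capacity is larger; with separate pools the same maintenance burst cannot reach the hot path at all, which is the property a priority or SLA class on a shared pool does not guarantee by construction.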
Seen in¶
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical wiki framing. systems/lakebase / systems/neon reliability roadmap. The 90%-of-sessions-under-10-minutes Neon empirical chart is the load-bearing data point. The data-plane controller separation is "currently hard at work" (in flight, not landed) per the post.
Related¶
- concepts/control-plane-data-plane-separation — architectural parent
- concepts/stateless-compute — Postgres compute is stateless, which makes start cheap (no crash recovery / WAL replay) — the architectural enabler for tens-of-millions-of-starts-per-day at reasonable latency
- concepts/scale-to-zero — the property that makes the start path the common path rather than the rare path
- concepts/static-stability — the discipline that the data-plane controller must adopt; buffers, last-known-good state, dependency minimisation
- concepts/critical-path-dependency-minimization — companion concept — once start is on the critical path, its dependency chain becomes load-bearing
- concepts/agent-provisioned-database — agentic-workload primitive that drives this reframing
- systems/lakebase / systems/neon — canonical instances
- patterns/separate-data-plane-controller-for-hot-path — architectural pattern for the decomposition