PATTERN Cited by 1 source
Pre-allocated bare-metal pool with vertical-autoscaling virtualization¶
The pattern¶
Pre-provision a pool of large (often bare-metal) cloud instances with provisioning headroom, then run a purpose-built vertical-autoscaling virtualisation layer that schedules the service's per-tenant compute (e.g. one Postgres database compute per tenant) onto the pool. Customer-tenant compute is started by the in-house virtualisation layer, bypassing the cloud-provider VM control plane on the per-tenant hot path.
The pattern is a concrete realisation of critical-path dependency minimisation applied to the start verb under agentic / scale-to-zero workloads — see concepts/control-plane-as-the-new-data-plane for the workload-shape forcing function.
The three components¶
- Bare-metal pool with provisioning buffer. Allocate fewer, larger instances rather than many small ones. Carry capacity buffer beyond steady-state demand to absorb cloud-provider provisioning outages — "We carry buffers to sustain cloud provider provisioning outages." (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures)
- Vertical-autoscaling virtualisation layer. "Schedules multiple Postgres instances onto those cloud instances"; autoscales a tenant's allocation up and down based on observed load. Vertical (vs horizontal) because each tenant gets a variable slice of one host's resources: densification rather than spread.
- Own zone-resilient storage substrate. "We don't rely on cloud block store devices, but instead store data in our own zone-resilient storage that is ultimately backed in object stores." See concepts/zone-redundant-storage + systems/pageserver-safekeeper.
The three components together replace five cloud-provider control-plane dependencies (compute / VM-capacity-policy / block / network / Kubernetes-system-services) with a single in-house data path.
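To make the "variable slice of one host's resources" idea concrete, here is a minimal sketch of a single vertical-resize decision. It is not the Lakebase / Neon algorithm (the source does not describe one); `TenantSlice`, `resize`, the 0.7 utilisation target, and the CPU limits are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TenantSlice:
    """One tenant's compute slice on a host (hypothetical model)."""
    name: str
    allocated_cpu: float   # vCPUs currently granted to the tenant
    used_cpu: float        # vCPUs observed in use
    min_cpu: float = 0.25  # assumed floor, illustrative only
    max_cpu: float = 8.0   # assumed ceiling, illustrative only


def resize(slice_: TenantSlice, host_free_cpu: float,
           target_utilisation: float = 0.7) -> float:
    """Return a new allocation for one tenant on its current host.

    The allocation moves toward used_cpu / target_utilisation, clamped to
    the tenant's limits and to what the same host can still give. The tenant
    is never moved to another host, which is what makes this vertical.
    """
    desired = slice_.used_cpu / target_utilisation
    desired = max(slice_.min_cpu, min(slice_.max_cpu, desired))
    # Growth is bounded by the free capacity left on the same host.
    ceiling = slice_.allocated_cpu + host_free_cpu
    return min(desired, ceiling)
```

Under these assumptions, a tenant using 1.4 of its 1.0 allocated vCPUs grows to about 2.0 when the host has room, an idle tenant using 0.35 of 4.0 shrinks to about 0.5, and a tenant that wants more than the host has free is capped at its current allocation plus the free capacity. A real layer would also need memory and IO dimensions, hysteresis, and a migration path when a host is full; none of that is modelled here.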
Lakebase / Neon canonical instance¶
Verbatim (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"In Lakebase, we take a different approach that drastically reduces the amount of control plane machinery involved in critical database flows:
- We allocate a pool of big (often bare metal) instances from the cloud provider. We carry buffers to sustain cloud provider provisioning outages.
- We built our own vertically autoscaling virtualization layer that schedules multiple Postgres instances onto those cloud instances.
- We don't rely on cloud block store devices, but instead store data in our own zone-resilient storage that is ultimately backed in object stores like S3 or Azure Blob storage."
Why this pattern shape vs alternatives¶
| Alternative | Trade-off |
|---|---|
| Cloud-provider per-tenant VMs | Simpler, but every tenant start chains through cloud-provider compute / block / network control planes |
| Kubernetes-managed pods on shared nodes | Adds Kubernetes system-services dependency on the start path |
| Single-tenant Postgres on dedicated VM (always-on) | Forfeits scale-to-zero economics and density |
| Pre-allocated bare-metal pool + in-house virtualisation | Highest reliability + highest density + biggest engineering investment |
The pattern is the right shape when:
- Tenant count >> instance count — densification is needed economically.
- Cold-start frequency is high — start path is on the request path of every connection arrival under scale-to-zero.
- Reliability target is 99.99%+: the five-link cloud-provider control-plane chain alone consumes the error budget.
- Engineering capacity exists to build + operate the virtualisation layer — not free; this is the cost of the pattern.
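The reliability condition above rests on availability multiplying across serial dependencies. The sketch below works through that arithmetic under two loudly stated assumptions that are not from the source: each of the five links is independently 99.99% available, and the start path needs all of them. The function names are illustrative.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(link_availabilities):
    """Availability of a path that needs every link up (independent failures assumed)."""
    return math.prod(link_availabilities)


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per (non-leap) year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Assumption for illustration: five links, each 99.99% available.
chain = serial_availability([0.9999] * 5)
budget = downtime_minutes_per_year(0.9999)      # what a 99.99% target allows
chain_cost = downtime_minutes_per_year(chain)   # what the chain alone spends
```

With those assumed inputs the chain comes out at roughly 99.95%, which spends about five times the downtime a 99.99% target allows before the service's own failures are counted. Real link availabilities differ and failures are often correlated, so treat the numbers as a shape argument rather than a measurement.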
Composability with cell-based architecture¶
The pattern composes naturally with cell-based architecture: each cell carries its own bare-metal pool with its own buffer; cell-level capacity exhaustion is contained to that cell. Cross-cell overflow is a separate design choice (typically not done — cells are independent on purpose).
Static-stability framing¶
The pattern is a statically stable instantiation: the pool with buffer is the "absorb failure without fetching new resources" primitive — when the cloud provider's compute control plane has an outage, the pool already has the instances, and the virtualisation layer can keep starting tenant compute from the buffer without a hot-path call to the cloud-provider control plane.
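A minimal sketch of that separation, assuming a simplified slot model: the hot path (`start_tenant`) only touches capacity the pool already owns, and the cloud control plane is reached only from a background `replenish` call. `Pool`, `CloudControlPlaneDown`, and `failing_provision` are hypothetical names for illustration, not Lakebase or cloud-provider APIs.

```python
class CloudControlPlaneDown(Exception):
    """Raised by the hypothetical provisioning call during a provider outage."""


class Pool:
    def __init__(self, hosts: int, slots_per_host: int):
        self.slots_per_host = slots_per_host
        self.free_slots = hosts * slots_per_host

    def start_tenant(self) -> bool:
        """Hot path: place a tenant on already-owned capacity. No cloud call."""
        if self.free_slots == 0:
            return False  # buffer exhausted
        self.free_slots -= 1
        return True

    def replenish(self, provision_host) -> None:
        """Background path: the only place the cloud control plane is called."""
        try:
            provision_host()
        except CloudControlPlaneDown:
            return  # keep serving from the buffer; a later attempt may succeed
        self.free_slots += self.slots_per_host


def failing_provision():
    """Stand-in for a provisioning request made during a provider outage."""
    raise CloudControlPlaneDown()
```

In this model, a pool of two hosts with three slots each keeps accepting starts while `replenish(failing_provision)` fails in the background, and only refuses the seventh start once the buffer is gone. That is the static-stability property in miniature: the outage shows up as a shrinking buffer, not as failed starts.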
Caveats¶
- Build cost. Owning a vertical-autoscaling virtualisation layer is non-trivial engineering — kernel-level isolation, fair scheduling across noisy neighbours, density-vs-isolation trade-offs.
- Buffer-sizing is a live calibration problem. Too small bleeds cloud-provider outages through to customers; too large is wasted capex. Sizing depends on assumed outage-duration distribution.
- Failure-mode novelty. A bug in the in-house virtualisation layer is a new failure surface that doesn't exist with cloud-provider VMs. Quality bar must be high.
- Cross-tenant noisy-neighbour. Multiple Postgres on one bare-metal instance must isolate IO, CPU, and memory. The isolation primitives are load-bearing.
- Bare-metal instance churn. When a bare-metal instance fails, N tenants are affected at once vs 1 with per-tenant VMs. The cell-level orchestration must handle this rebalancing efficiently.
- Cloud-provider primitives still on the replenishment path. The pool is replenished via the cloud-provider compute control plane — sustained cloud-provider outages will eventually deplete the buffer.
- Specific virtualisation primitives not detailed in source. The Lakebase post links separately to the Neon autoscaling-architecture docs but does not detail the kernel-level isolation mechanisms, scheduler design, or noisy-neighbour mitigations.
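The buffer-sizing caveat can be framed as a back-of-envelope calculation: headroom must cover net new demand for as long as replenishment is assumed to be unavailable. The sketch below is one hypothetical way to express that; the function name, the slot model, and the example inputs are assumptions, and the source gives no sizing formula.

```python
import math


def buffer_hosts(peak_net_new_slots_per_hour: float,
                 assumed_outage_hours: float,
                 slots_per_host: int) -> int:
    """Hosts of headroom needed to keep starting tenants through an outage
    of the assumed length, with no replenishment during that window."""
    slots_needed = peak_net_new_slots_per_hour * assumed_outage_hours
    return math.ceil(slots_needed / slots_per_host)
```

For example, with an assumed 40 net new tenant slots per hour, a 6-hour outage assumption, and 64 slots per host, the function returns 4 hosts of buffer. The result is only as good as the assumed outage-duration distribution: sizing for the median outage leaves the tail exposed, while sizing for the worst case observed pays for idle capacity most of the time.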
Seen in¶
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical Lakebase / Neon framing. The five-link cloud-provider control-plane chain is enumerated verbatim; the three-component architectural reply is enumerated verbatim. The Neon autoscaling-architecture page is linked as the architecture-detail reference.
Related¶
- concepts/critical-path-dependency-minimization — parent concept; the discipline this pattern operationalises
- concepts/control-plane-as-the-new-data-plane — the workload-shape forcing function
- concepts/static-stability — the buffer-of-pool primitive is a static-stability instantiation
- concepts/availability-multiplication-of-dependencies — the mathematical framing of why dependency-chain length matters
- concepts/database-startup-time-sli — the SLI the pattern optimises for
- systems/lakebase / systems/neon — canonical instances
- systems/aws-ec2 — the cloud-provider compute primitive being buffered against
- patterns/separate-data-plane-controller-for-hot-path — companion pattern; the hot-path controller drives the virtualisation layer