CONCEPT Cited by 1 source
Critical-path dependency minimization
Definition
Critical-path dependency minimization is the reliability discipline of reducing the count and depth of external dependencies on the synchronous request path of a load-bearing operation. In a chain of N dependencies, each with availability A and failing independently, the operation's effective availability is A^N, a direct application of availability multiplication. Removing a dependency from the critical path is mathematically equivalent to making it 100% available (its factor in the product becomes 1). The practical implication is that the fewer external services on the hot path, the higher the achievable availability ceiling.
The discipline is most valuable when:
- The operation is on a synchronous user-visible path (its outage is observed as customer impact, not as queue depth).
- The dependencies are themselves complex services with their own reliability budgets (cloud-provider control planes, Kubernetes system services, IAM, DNS).
- The operation runs at high frequency (the customer-impact rate is roughly the operation rate multiplied by the chain's composite failure probability, 1 − A^N).
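To make the frequency point concrete, here is a back-of-the-envelope sketch. It assumes independent dependency failures and operations spread uniformly over time, so an operation fails with probability 1 − (product of availabilities). The rate and availabilities in the example are hypothetical.

```python
from math import prod

def expected_failed_ops_per_day(ops_per_second, availabilities):
    """Expected number of operations per day that hit an unavailable dependency.

    Assumes independent dependency failures and that an operation fails
    exactly when at least one dependency on its synchronous path is down.
    """
    failure_probability = 1.0 - prod(availabilities)
    return ops_per_second * 86_400 * failure_probability

# Hypothetical: 50 database starts per second over five 99.99% dependencies.
print(round(expected_failed_ops_per_day(50, [0.9999] * 5)))
```

The same composite failure probability that looks negligible per operation turns into thousands of customer-visible failures per day at this rate.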
Canonical Lakebase framing
Verbatim from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"Serving agentic workloads means creating and resuming databases must be highly reliable. Reliability is strongly correlated with the dependency chain and the amount of machinery involved in the flow. In a traditional setup with Postgres in cloud provider VMs, this goes well beyond the data plane:
- Cloud provider's compute control plane to provision VMs
- Available VM capacity (where the cloud provider controls the policy of who gets it)
- Cloud provider's block store control plane to provision local storage
- Cloud provider's networking control plane to allocate IPs, configure firewalls and network routes to the new VM
- If using Kubernetes (K8s) - an additional dependency on the K8s system services."
That's a 5-link chain on the cold-start critical path of a single Postgres database — and each link is itself a multi-component service with its own outage budget and capacity policies.
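The chain can be simulated as a sequential cold start that fails at the first unavailable link. This Monte Carlo sketch is illustrative only: the link names paraphrase the quoted list, and the per-link availabilities passed in are hypothetical. With independent links, the estimate converges to the product of the availabilities.

```python
import random

# The five links from the quoted list, in call order.
COLD_START_CHAIN = [
    "compute control plane",
    "VM capacity",
    "block store control plane",
    "networking control plane",
    "Kubernetes system services",
]

def cold_start(link_availability, rng):
    """Call each link in order; the start fails at the first unavailable link."""
    for link in COLD_START_CHAIN:
        if rng.random() >= link_availability[link]:
            return False, link          # failed, and which link caused it
    return True, None

def simulate(link_availability, trials, seed=0):
    """Fraction of simulated cold starts that succeed."""
    rng = random.Random(seed)
    successes = sum(cold_start(link_availability, rng)[0] for _ in range(trials))
    return successes / trials

availability = {link: 0.99 for link in COLD_START_CHAIN}   # hypothetical 99% per link
print(simulate(availability, trials=100_000), 0.99 ** 5)
```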
The architectural reply
The Lakebase architectural answer is to collapse the chain by pre-completing the dependencies rather than calling them on the hot path (verbatim, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"In Lakebase, we take a different approach that drastically reduces the amount of control plane machinery involved in critical database flows:
- We allocate a pool of big (often bare metal) instances from the cloud provider. We carry buffers to sustain cloud provider provisioning outages.
- We built our own vertically autoscaling virtualization layer that schedules multiple Postgres instances onto those cloud instances.
- We don't rely on cloud block store devices, but instead store data in our own zone-resilient storage that is ultimately backed in object stores like S3 or Azure Blob storage."
The structural transformations:
| Original critical-path dependency | Lakebase replacement |
|---|---|
| Cloud-provider compute control plane | Pre-allocated bare-metal pool with provisioning buffer |
| Cloud-provider VM capacity | Buffered headroom in already-allocated pool |
| Cloud-provider block-store control plane | In-house zone-resilient storage on object stores |
| Cloud-provider networking control plane | Fewer per-Postgres IPs to allocate (multiple Postgres on one host) |
| Kubernetes system services for per-Postgres scheduling | In-house vertical-autoscaling virtualization layer |
Each replacement either moves the dependency off the hot path (pre-completion) or replaces a complex external service with a purpose-built simpler one tuned for this specific workload.
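The table can be read as a change in which factors remain in the availability product. In this sketch every availability is a hypothetical placeholder (the source gives no figures) and failures are assumed independent. Pre-completed dependencies drop out of the product, while in-house replacements stay in it with their own availability, so the gain depends on those replacements being more reliable than what they replace.

```python
from math import prod

def hot_path_availability(dependencies, off_hot_path=frozenset()):
    """Composite availability of the dependencies still called synchronously.

    dependencies: mapping of dependency name -> availability.
    off_hot_path: names pre-completed ahead of time (for example by keeping an
        allocated pool), which therefore contribute a factor of 1.0.
    Assumes independent failures.
    """
    return prod(a for name, a in dependencies.items() if name not in off_hot_path)

# All availabilities below are hypothetical placeholders.
traditional = {
    "compute_control_plane": 0.9995,
    "vm_capacity": 0.9995,
    "block_store_control_plane": 0.9995,
    "network_control_plane": 0.9995,
    "k8s_system_services": 0.9995,
}

lakebase_shaped = {
    "compute_control_plane": 0.9995,    # now only used to replenish the pool
    "vm_capacity": 0.9995,              # absorbed by buffered headroom
    "network_control_plane": 0.9995,    # modeled as still on the path, for fewer IPs
    "in_house_storage": 0.9999,         # hypothetical purpose-built replacement
    "in_house_virtualization": 0.9999,  # hypothetical purpose-built replacement
}
pre_completed = {"compute_control_plane", "vm_capacity"}

before = hot_path_availability(traditional)
after = hot_path_availability(lakebase_shaped, pre_completed)
print(f"traditional hot path:     {before:.5f}")
print(f"lakebase-shaped hot path: {after:.5f}")
```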
The buffer-of-bare-metal-instances primitive
The pool-with-buffer primitive is the statically stable realisation of critical-path dependency minimisation: keep enough headroom that the cloud-provider control plane is not on the critical path of any individual Postgres start.
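A minimal sketch of the pool-with-buffer shape, not Lakebase's implementation. `HostPool`, `provision_host`, and `fake_provision_host` are hypothetical names. The hot path (`start_database`) only pops a pre-allocated slot. The only code that calls the provider is `replenish`, which is meant to run in the background and simply gives up during an outage.

```python
import itertools
from collections import deque

class HostPool:
    """Pre-allocated host slots plus a provisioning buffer (illustrative sketch)."""

    def __init__(self, provision_host, target_free_slots):
        self._provision_host = provision_host   # hypothetical cloud compute API call
        self._target = target_free_slots
        self._free = deque()

    def replenish(self):
        """Off the hot path: top the buffer up to the target.

        Returns False if the provider is unavailable or returns no capacity;
        slots already in the buffer keep serving starts either way.
        """
        while len(self._free) < self._target:
            try:
                slots = self._provision_host()
            except ConnectionError:
                return False
            if not slots:
                return False
            self._free.extend(slots)
        return True

    def start_database(self, name):
        """Hot path: place a database on an already-allocated slot, no provider call."""
        if not self._free:
            raise RuntimeError("buffer exhausted: provider outage outlasted the headroom")
        slot = self._free.popleft()
        return f"{name}@{slot}"

    @property
    def free_slots(self):
        return len(self._free)


host_ids = itertools.count()
provider_up = True

def fake_provision_host():
    """Hypothetical stand-in for the cloud compute API: one big host, four slots."""
    if not provider_up:
        raise ConnectionError("simulated control-plane outage")
    host = next(host_ids)
    return [f"host{host}/slot{i}" for i in range(4)]


pool = HostPool(fake_provision_host, target_free_slots=8)
pool.replenish()                     # allocates two hosts while the provider is healthy
provider_up = False                  # cloud-provider control plane goes down
print(pool.start_database("db-a"))   # db-a@host0/slot0, served from the buffer
print(pool.replenish())              # False: the refill fails, the start above did not block
print(pool.free_slots)               # 7
```

Packing several slots per host also mirrors the "multiple Postgres on one host" idea: one provider call yields capacity for many starts.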
Three properties make it statically stable:
- Buffer size > start rate × expected provisioning-outage duration. Sized so that even during a cloud-provider compute-control-plane outage, Lakebase can keep starting Postgres instances from already-allocated headroom for the duration of the outage.
- Buffer is replenished off the hot path. Replenishment uses the cloud-provider control plane but does not block any user request — at worst it depletes faster than usual until the cloud provider recovers.
- Buffer is shared across customers. Per-customer pre-allocation would scale linearly with tenancy; shared buffer amortises across the fleet.
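The amortisation in the third property can be illustrated with hypothetical demand series (this is not Lakebase's sizing method). Per-customer buffers must each cover that customer's own peak. A shared buffer only has to cover the peak of the aggregate, which is never larger and is much smaller when bursts do not line up.

```python
def per_customer_headroom(demand_by_customer):
    """Headroom if every customer gets its own reserved buffer:
    each one must be provisioned for its own peak."""
    return sum(max(series) for series in demand_by_customer.values())

def shared_headroom(demand_by_customer):
    """Headroom for one fleet-wide buffer: only the peak of the aggregate
    demand across all customers matters."""
    aggregate = [sum(step) for step in zip(*demand_by_customer.values())]
    return max(aggregate)

# Hypothetical starts per interval for three customers whose bursts don't align.
demand = {
    "a": [5, 0, 0, 1],
    "b": [0, 6, 0, 1],
    "c": [0, 0, 4, 1],
}
print(per_customer_headroom(demand))  # 15
print(shared_headroom(demand))        # 6
```

If all customers burst at the same time the two numbers coincide, so the savings depend on how correlated tenant demand is.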
See patterns/preallocated-bare-metal-pool-with-virtualization for the operational pattern.
When this discipline is load-bearing
The discipline applies to any operation on a critical path with a multi-dependency chain. It is load-bearing specifically when:
- The operation has agentic / on-demand / scale-to-zero shape — see concepts/control-plane-as-the-new-data-plane for the workload-shape forcing function.
- The dependency chain's composite availability falls to or below the operation's target availability. (E.g. five 99.99% dependencies compose to about 99.95%, which alone consumes the entire error budget of a 99.95% target SLO.)
- The dependencies have correlated failure modes — a cloud-provider control-plane regional outage takes out compute / block / network control planes simultaneously. See concepts/blast-radius for the correlation framing.
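Correlation changes the arithmetic. This sketch uses a simple common-cause model, an illustrative assumption not taken from the source. Each dependency has its own independent failures. Those that share regional fate are also exposed to one shared regional event. Because that event takes down every fate-sharing dependency at once, it contributes a single factor however many of them remain. Removing only one of several correlated control planes therefore barely helps; the regional term disappears only when all of them are off the hot path.

```python
from math import prod

def availability_with_regional_fate(deps, p_regional):
    """Hot-path availability under a simple common-cause model.

    deps: list of (availability_from_own_causes, shares_regional_fate).
    p_regional: probability of the shared regional event (hypothetical).
    The regional event contributes one factor (1 - p_regional) as long as at
    least one fate-sharing dependency remains on the hot path.
    """
    own = prod(a for a, _ in deps)
    exposed = any(shared for _, shared in deps)
    return own * (1.0 - p_regional) if exposed else own

# Hypothetical: compute, block-store and network control planes in one region.
control_planes = [(0.9999, True)] * 3
print(availability_with_regional_fate(control_planes, 0.0005))
print(availability_with_regional_fate(control_planes[:2], 0.0005))  # one removed: still exposed
print(availability_with_regional_fate([], 0.0005))                  # all off the hot path
```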
Generalises beyond databases
The same pattern shape recurs whenever a service runs on a cloud provider but cannot afford the full cloud-provider-control-plane dependency chain on the request path:
- Container platforms that pre-allocate node pools with capacity buffer, then schedule containers onto them locally — same shape, different verticalisation.
- Serverless function platforms (AWS Lambda): pre-warmed execution-environment pools on already-provisioned hosts, so a cold start does not wait on VM provisioning.
- Bursty-workload databases generally — see Lakebase / Neon / PlanetScale for database-tier instances.
Caveats
- Not free. Bare-metal pool + buffer + virtualisation layer are capital-intensive; they pay off when fleet scale + reliability target both demand it. A small fleet on a fat dependency chain may be better off accepting the chain.
- Buffer sizing is a calibration problem. Too small → cloud-provider outages bleed through; too large → wasted capacity. Sizing depends on the assumed outage-duration distribution and the provisioning-request rate.
- In-house virtualisation tax. The vertical-autoscaling virtualisation layer is engineering effort that cloud-provider VMs would otherwise have provided for free. It is justified only by the reliability + density payoff.
- Off-hot-path is not no-path. The dependency chain is still on the replenishment path; a sustained cloud-provider outage will eventually deplete the buffer.
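The sizing and depletion caveats reduce to simple rate arithmetic. This is a minimal sketch assuming a constant average start rate; `safety_factor` is a hypothetical padding knob, not a documented Lakebase parameter. The buffer drains at the start rate minus whatever replenishment still succeeds, so any outage longer than buffer / net drain exhausts it.

```python
import math

def required_buffer(starts_per_minute, outage_minutes, safety_factor=1.5):
    """Slots needed to keep serving starts through a provisioning outage.

    The buffer is consumed at the start rate for the whole outage, so the
    minimum is rate * duration; safety_factor (hypothetical) pads for bursts
    above the average rate and outages longer than assumed.
    """
    return math.ceil(starts_per_minute * outage_minutes * safety_factor)

def minutes_until_depleted(buffer_slots, starts_per_minute, replenish_per_minute=0.0):
    """How long the buffer lasts once provider-side replenishment stops or slows."""
    net_drain = starts_per_minute - replenish_per_minute
    if net_drain <= 0:
        return math.inf        # replenishment keeps up, so the buffer never empties
    return buffer_slots / net_drain

# Hypothetical: 20 starts/minute, sized for a 60-minute outage with 1.5x padding.
slots = required_buffer(20, 60)
print(slots)                              # 1800
print(minutes_until_depleted(slots, 20))  # 90.0
```

Under these assumptions, an outage lasting longer than 90 minutes would start failing requests, which is the "off-hot-path is not no-path" caveat in numbers.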
Seen in
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical wiki framing on serverless-Postgres-startup. The five-link cloud-provider-control-plane chain enumerated verbatim; the bare-metal-pool-plus-virtualisation-layer architectural reply enumerated verbatim.
Related
- concepts/availability-multiplication-of-dependencies — the mathematical framing of why dependency-chain length matters
- concepts/control-plane-as-the-new-data-plane — the workload-shape forcing function that makes start-path reliability load-bearing
- concepts/control-plane-data-plane-separation — the architectural parent
- concepts/static-stability — the buffer-of-pool primitive is a statically-stable realisation
- concepts/scale-to-zero — workload property that puts start on the critical path in the first place
- concepts/availability-dependency — neighbouring concept
- systems/lakebase / systems/neon — canonical instances
- systems/aws-ec2 — the cloud-provider compute primitive being buffered against
- systems/kubernetes — alternative dependency that the pattern also bypasses for the hot path
- patterns/preallocated-bare-metal-pool-with-virtualization — the operational pattern