Database startup time SLI
Definition
Database startup time is a Service Level Indicator (SLI) specific to serverless / scale-to-zero database services: how long it takes a suspended database to become available to serve queries on connection arrival, and how long it takes a brand-new database to come up from creation.
The SLI does not exist in the traditional always-on database SLO menu — a monolithic Postgres is up or it is in an outage. Under scale-to-zero the database is deliberately down between connections, and the time to come back up is on the request path of every cold connection.
The Lakebase framing
Verbatim from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"Database Startup Time: How quickly a suspended database becomes available when you connect, or how quickly a brand new database is starts up."
The SLI comes in two flavours:
- Auto-resume startup time: time from an incoming connection on a suspended database to the database serving the first query. This is the hot path under the "control plane is the new data plane" shift (see concepts/control-plane-as-the-new-data-plane), because every auto-resume pays it.
- New-database startup time: time from the creation API call to the database accepting connections. This is the hot path for agentic workloads, where databases are created programmatically (see concepts/agent-provisioned-database). The 2026-04-29 Stripe Projects ingest disclosed <350 ms as the agent-provisioning latency for Lakebase / Neon through Stripe Projects. This is the first concrete operational data point.
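The sketch below computes the two flavours as intervals over per-event timestamps. The `StartupEvents` record, its field names and the `kind` labels are hypothetical. The source does not describe how Lakebase records these events.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartupEvents:
    """Hypothetical per-database startup record; timestamps in seconds on a monotonic clock."""
    kind: str                 # "auto_resume" or "new_database" (illustrative labels)
    start_ts: float           # connection arrival (auto-resume) or create API call (new database)
    ready_ts: Optional[float] # first query served (auto-resume) or accepting connections (new database)


def startup_time_ms(ev: StartupEvents) -> Optional[float]:
    """Startup interval in milliseconds, or None if the database never became ready."""
    if ev.kind not in ("auto_resume", "new_database"):
        raise ValueError(f"unknown startup kind: {ev.kind!r}")
    if ev.ready_ts is None:
        # Still starting or failed: the SLI pipeline must decide how to count these.
        return None
    if ev.ready_ts < ev.start_ts:
        raise ValueError("ready_ts precedes start_ts")
    return (ev.ready_ts - ev.start_ts) * 1000.0
```

For example, `startup_time_ms(StartupEvents("auto_resume", start_ts=10.0, ready_ts=10.25))` returns `250.0`. Both flavours share one interval definition and differ only in which events bound it.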
Why it matters under agentic workloads
The empirical signal that makes startup-time load-bearing: in Neon, 90% of compute sessions for auto-suspending databases are less than 10 minutes (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures).
Implications:
- Startup time is on the request path of the median connection, not a rare event. Suppose a session lasts 10 minutes and startup adds 100 ms. That is only ≈0.017% of the session's wall-clock time, yet the full 100 ms lands on the first query of every cold connection. The ratio also grows as sessions get shorter.
- Tail startup time drives tail customer experience. Consider a p99 startup time of 30 s during a cloud-provider control-plane hiccup. It means roughly 1 in 100 cold starts takes 30 s or longer, and the customer experiences each of those as a 30 s outage.
- Startup-time variance correlates with capacity-headroom health. When the pre-allocated bare-metal pool runs hot or has to wait for cloud-provider replenishment, startup time stretches; the SLI is a leading indicator of capacity exhaustion.
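The overhead arithmetic in the first bullet can be written as a small helper. It assumes one startup per session and expresses the startup wait as a fraction of session length. `startup_overhead_fraction` is an illustrative name, not something from the source.

```python
def startup_overhead_fraction(startup_ms: float, session_s: float) -> float:
    """Startup wait as a fraction of session wall-clock time (assumes one startup per session)."""
    if session_s <= 0:
        raise ValueError("session length must be positive")
    return (startup_ms / 1000.0) / session_s


if __name__ == "__main__":
    # Same 100 ms startup penalty, progressively shorter sessions.
    for session_s in (600, 60, 10):
        pct = 100 * startup_overhead_fraction(100, session_s)
        print(f"{session_s:>3} s session: {pct:.3f}%")
    # 600 s session: 0.017%
    #  60 s session: 0.167%
    #  10 s session: 1.000%
```

The fraction stays small for long sessions. The absolute 100 ms on the first query does not shrink, which is why the bullet treats startup as a request-path cost rather than an amortised one.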
Composability with attainment
Like database availability attainment, startup-time can be measured in attainment-shape: "X% of databases met the startup-time goal this month". Per the source's SLI menu, attainment-of-startup-time is a peer SLI to attainment-of-availability — both are measured per-database and aggregated across the fleet.
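The sketch below shows attainment shape. The source does not say what "met the startup-time goal" means for a single database, so it assumes a hypothetical rule: a database meets the goal if at least `target_ratio` of its startups finished within `goal_ms`. Fleet attainment is then the fraction of databases that met it.

```python
from typing import Mapping, Sequence


def database_meets_goal(startups_ms: Sequence[float], goal_ms: float, target_ratio: float) -> bool:
    """Assumed per-database rule: enough of this database's startups finished within goal_ms."""
    if not startups_ms:
        # No cold starts this period. Counting this as "met" is a policy choice;
        # excluding such databases from the denominator is the other option.
        return True
    within = sum(1 for t in startups_ms if t <= goal_ms)
    return within / len(startups_ms) >= target_ratio


def fleet_attainment(per_db: Mapping[str, Sequence[float]], goal_ms: float, target_ratio: float) -> float:
    """Fraction of databases in the fleet that met the startup-time goal."""
    if not per_db:
        raise ValueError("empty fleet")
    met = sum(1 for samples in per_db.values() if database_meets_goal(samples, goal_ms, target_ratio))
    return met / len(per_db)
```

Take a made-up fleet `{"db-a": [120, 180, 240], "db-b": [150, 4000], "db-c": []}` with `goal_ms=1000` and `target_ratio=0.99`. The result is 2/3, because `db-b` misses on one of its two startups.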
Architectural enablers
Lakebase's path to a fast startup-time SLI is structural:
- Stateless Postgres compute — no crash recovery, no WAL replay; new compute attaches to existing storage and starts serving immediately.
- Zone-redundant storage — the durable bytes are already there; nothing to load.
- Pre-allocated bare-metal pool — the host VM is already up; only the in-host Postgres start is on the critical path.
- In-house vertical-autoscaling virtualisation layer — schedules multiple Postgres instances onto pool hosts; per-Postgres start avoids cloud-provider VM provisioning.
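The following toy model encodes the list above. It counts which steps remain on the cold-start critical path, not how long they take. The step names and the enabler-to-step mapping are this sketch's own simplification of the four bullets, not a description of Lakebase internals.

```python
from typing import Iterable, List

# Simplified steps a cold start might otherwise wait on (labels are this sketch's own).
COLD_START_STEPS: List[str] = [
    "provision cloud VM",
    "crash recovery / WAL replay",
    "load durable data onto the host",
    "start Postgres on the host",
]

# Step each enabler takes off the critical path, following the bullets above.
REMOVED_BY = {
    "stateless Postgres compute": "crash recovery / WAL replay",
    "zone-redundant storage": "load durable data onto the host",
    "pre-allocated bare-metal pool": "provision cloud VM",
    "in-house virtualisation layer": "provision cloud VM",
}


def critical_path(enablers: Iterable[str]) -> List[str]:
    """Cold-start steps still on the critical path given a set of enablers."""
    removed = {REMOVED_BY[e] for e in enablers}  # raises KeyError on an unknown enabler name
    return [step for step in COLD_START_STEPS if step not in removed]


if __name__ == "__main__":
    print(critical_path([]))          # all four steps
    print(critical_path(REMOVED_BY))  # ['start Postgres on the host']
```

With every enabler applied, only the in-host Postgres start remains, matching the pre-allocated-pool bullet.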
SLI measurement gotchas
- Connection-pooler effects. With a PgBouncer-style pooler in front, the first connection to a suspended database measures the full cold start, while subsequent connections within the pooler's TTL see no cold start. Reporting must distinguish the two.
- What counts as "available"? Accepting TCP, accepting TLS, accepting authentication, executing a no-op SELECT, executing a SELECT against a hot table — each is a different latency milestone.
- Cold vs warm cache. Storage-tier NVMe SSD caches may not be warm immediately on cold start; first-query latency may exceed startup-time. Some operators define a "first business query" metric in addition to "first SELECT 1".
- Aggregation method. Mean / median / p95 / p99 / max of startup-time; p99 is the customer-visible tail.
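The sketch below addresses two of these gotchas. It separates cold connections from warm pooled ones before aggregating, and it reports several statistics rather than one. `ConnectionSample` and its `cold` flag are hypothetical. Deriving that flag (for example from proxy logs) is the hard part and is not shown. Nearest-rank is only one of several percentile definitions.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ConnectionSample:
    # Time from connection arrival to the chosen "available" milestone
    # (TCP accept, auth, SELECT 1, first business query...); pick one and report which.
    startup_ms: float
    # True if this connection actually woke a suspended database; False if it was
    # served warm, e.g. by a pooler within its TTL. How this flag is derived is assumed.
    cold: bool


def nearest_rank_percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100], of an ascending non-empty list."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_values) / 100.0)
    return sorted_values[rank - 1]


def cold_start_summary(samples: Iterable[ConnectionSample]) -> Dict[str, float]:
    # Keep only cold connections so warm pooled connections do not dilute the SLI.
    cold = sorted(s.startup_ms for s in samples if s.cold)
    if not cold:
        return {"count": 0}
    return {
        "count": len(cold),
        "mean": sum(cold) / len(cold),
        "p50": nearest_rank_percentile(cold, 50),
        "p95": nearest_rank_percentile(cold, 95),
        "p99": nearest_rank_percentile(cold, 99),
        "max": cold[-1],
    }
```

With fewer than 100 cold starts in a window, nearest-rank p99 equals the max. Per-database tails therefore need enough samples to be meaningful.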
Caveats
- No specific Lakebase startup-time SLO disclosed. The reliability post names the SLI but does not publish the target. The Stripe Projects ingest separately disclosed <350 ms as the agent-provisioning latency for new databases — but that's a product disclosure, not the per-database SLI's target.
- Source post does not detail measurement methodology. It does not say whether startup time is measured at the proxy, the control plane or Postgres. Nor does it say whether p50, p99 or max is the published number, or whether the SLI is per-region or fleet-wide.
Seen in
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical wiki framing as one of five disclosed Lakebase SLIs. The two-flavour framing (auto-resume + brand-new) is from the source.
Related
- concepts/control-plane-as-the-new-data-plane — workload-shape forcing function that makes startup-time load-bearing
- concepts/scale-to-zero — the database property that makes cold-start the default
- concepts/database-availability-attainment — sibling SLI; attainment shape composes
- concepts/stateless-compute — architectural enabler for fast startup
- concepts/critical-path-dependency-minimization — companion discipline; reducing the start-path's dependency chain reduces startup-time tail
- concepts/operation-based-slo — sibling per-customer-metric shape at the user-journey altitude
- systems/lakebase / systems/neon — canonical instances
- patterns/per-database-availability-attainment — attainment-shape SLO measurement pattern