CONCEPT Cited by 1 source
Database availability attainment¶
Definition¶
Database availability attainment is a reliability metric that tracks availability per individual database rather than as a fleet aggregate, and reports the percentage of databases in the fleet that met a given availability bar (e.g. 99.95%, 99.99%) during a specified window (typically monthly).
The metric is computed in two layers:
- Per-database availability — for each database in the fleet, compute % of time it was available during the window. This produces a distribution, not a scalar.
- Attainment — % of databases in the fleet whose computed per-database availability ≥ a target bar.
The aggregate that operators report is "X% of databases met the 99.99% bar this month", not "the fleet was 99.99% available this month".
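The two layers above can be sketched in a few lines of Python. This is a minimal illustration, not the Lakebase implementation: the database names, the downtime figures, and the 30-day window are all assumptions made for the example.

```python
from typing import Mapping


def availability(up_minutes: int, window_minutes: int) -> float:
    """Layer 1: fraction of the window during which one database was available."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return up_minutes / window_minutes


def attainment(per_db_availability: Mapping[str, float], bar: float) -> float:
    """Layer 2: fraction of databases whose availability is at or above the bar."""
    if not per_db_availability:
        raise ValueError("fleet is empty")
    met = sum(1 for a in per_db_availability.values() if a >= bar)
    return met / len(per_db_availability)


WINDOW = 30 * 24 * 60  # minutes in a 30-day window (assumption)

# Hypothetical fleet: minutes of downtime per database during the window.
downtime = {"db-a": 0, "db-b": 3, "db-c": 10, "db-d": 60}
fleet = {name: availability(WINDOW - down, WINDOW) for name, down in downtime.items()}

attain_9999 = attainment(fleet, 0.9999)  # share of databases at or above 99.99%
attain_9995 = attainment(fleet, 0.9995)  # share of databases at or above 99.95%
print(f"met 99.99%: {attain_9999:.0%}, met 99.95%: {attain_9995:.0%}")
```

The per-database layer yields one number per database (a distribution); only the second layer collapses it to a single reported figure per bar.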
Why attainment ≠ fleet availability¶
Verbatim framing from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"Database Availability: How many percent of the time every individual database is available. We don't just measure aggregate fleet availability, because an individual customer doesn't care if the fleet had great availability if their database was down."
"Our goal is for every database to exceed 99.99% availability every month. We measure how close we are to that goal with attainment: How many % of the fleet's databases that met the goal."
Fleet-aggregate availability under-represents tail customer impact. If 1% of databases are 0% available for the whole month and 99% are 100% available, fleet-aggregate availability is 99%, yet 1% of customers had a complete outage. The same 99% fleet figure would also result if every database were down 1% of the time, so the aggregate alone cannot distinguish the two situations. Attainment makes the tail visible: "99% of databases met the 100% bar; 1% missed it" describes the first case without averaging away the bad tail.
The metric is the multi-tenant-database-fleet analogue of operation-based SLO (which fixes the per-service vs per-user-journey blindness in microservice SLOs): both replace an aggregate with a per-customer metric to make tail impact visible.
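To make the contrast concrete, the sketch below compares two hypothetical 100-database fleets that share the same fleet-mean availability but differ sharply in attainment. The fleets and the 99.99% bar used here are illustrative assumptions, not data from the source.

```python
from statistics import mean


def attainment(availabilities, bar):
    """Fraction of databases at or above the availability bar."""
    return sum(a >= bar for a in availabilities) / len(availabilities)


# Fleet A: 1 database in 100 is down all month, the other 99 are never down.
fleet_a = [0.0] * 1 + [1.0] * 99
# Fleet B: every database is down 1% of the month.
fleet_b = [0.99] * 100

BAR = 0.9999

mean_a, mean_b = mean(fleet_a), mean(fleet_b)
attain_a, attain_b = attainment(fleet_a, BAR), attainment(fleet_b, BAR)

print(f"fleet A: mean={mean_a:.2%} attainment={attain_a:.0%}")
print(f"fleet B: mean={mean_b:.2%} attainment={attain_b:.0%}")
```

Both fleets average 99% availability, but fleet A has 99% attainment (one customer fully down) while fleet B has 0% attainment (every customer missed the bar). The fleet mean cannot tell these apart.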
Two-tier attainment bar¶
Lakebase reports attainment against two bars (verbatim, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
- 99.95% — the "acceptable" bar.
- 99.99% — the "goal" bar.
The two-bar structure provides a calibrated view of fleet health: the 99.95% number tracks the long-tail (how many databases missed even the lower bar?); the 99.99% number tracks the goal (how many hit the design target?).
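For a sense of scale, each bar can be translated into a per-database downtime budget. The sketch below assumes a 30-day measurement window; the source only says the goal is monthly, so the exact window length is an assumption.

```python
def downtime_budget_minutes(bar: float, window_days: int = 30) -> float:
    """Minutes a single database may be unavailable and still meet the bar."""
    window_minutes = window_days * 24 * 60
    return (1.0 - bar) * window_minutes


budget_9995 = downtime_budget_minutes(0.9995)  # "acceptable" bar
budget_9999 = downtime_budget_minutes(0.9999)  # "goal" bar

print(f"99.95% bar: {budget_9995:.2f} min/month")
print(f"99.99% bar: {budget_9999:.2f} min/month")
```

Under the 30-day assumption, the 99.95% bar allows about 21.6 minutes of downtime per database per month and the 99.99% bar about 4.32 minutes.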
Disclosed Lakebase / Neon 2026 H1 attainment¶
| Month | Met 99.95% | Met 99.99% |
|---|---|---|
| 2026-01 | 99.96% | 99.85% |
| 2026-02 | 99.95% | 99.84% |
| 2026-03 | 99.96% | 99.81% |
| 2026-04 | 99.93% | 99.75% |
(Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures)
Three observations:
- Attainment-99.95% is itself about 99.95%, so roughly 1 in 2,000 databases missed the bar. The numerical match is a coincidence: the bar is a per-database availability, while attainment is a share of databases.
- Attainment-99.99% is closer to 99.8%, indicating ~1 in 500 databases missed the goal in any given month.
- April dipped on both bars (99.96 → 99.93; 99.81 → 99.75). The source does not explain the dip; ingestion has not surfaced an incident-correlation.
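The "1 in N" readings follow directly from the table: the miss rate is 100% minus the attainment, and N is its reciprocal. A short sketch of that arithmetic, using only the figures from the table above (the helper name `one_in_n` is introduced here for illustration):

```python
# Attainment figures from the table: month -> (met 99.95%, met 99.99%), in percent.
attainment_pct = {
    "2026-01": (99.96, 99.85),
    "2026-02": (99.95, 99.84),
    "2026-03": (99.96, 99.81),
    "2026-04": (99.93, 99.75),
}


def one_in_n(pct_met: float) -> float:
    """Convert an attainment percentage into 'about 1 in N databases missed'."""
    missed_pct = 100.0 - pct_met
    return 100.0 / missed_pct


for month, (met_9995, met_9999) in attainment_pct.items():
    print(
        f"{month}: missed 99.95% bar ~1 in {round(one_in_n(met_9995))}, "
        f"missed 99.99% bar ~1 in {round(one_in_n(met_9999))}"
    )
```

For April, this works out to about 1 in 1,400 databases missing the 99.95% bar and 1 in 400 missing the 99.99% bar.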
How attainment composes with SLI menus¶
Attainment-of-availability is one of multiple per-database SLI attainments Lakebase tracks (verbatim list, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
- Database availability (this concept).
- Database startup time — how quickly a suspended database becomes available; the serverless-specific SLI.
- Database switchover/failover frequency + latency.
- Storage availability + latency on page reads + durable writes — see concepts/storage-io-latency-sli.
- Control Plane APIs success rates + latencies — see concepts/control-plane-as-the-new-data-plane for the workload-shape forcing function.
Attainment as an aggregation primitive applies uniformly to each: "X% of databases hit the startup-time goal this month", etc.
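One way to read "aggregation primitive" is as a single function parameterized by a per-database goal check. The sketch below is an assumption about how that could look; the sample measurements and the 1-second startup goal are hypothetical, since the source does not state numeric goals for the other SLIs.

```python
from typing import Callable, Iterable


def attainment(values: Iterable[float], meets_goal: Callable[[float], bool]) -> float:
    """Fraction of per-database measurements that satisfy the goal predicate."""
    values = list(values)
    if not values:
        raise ValueError("no databases measured")
    return sum(1 for v in values if meets_goal(v)) / len(values)


# Hypothetical per-database measurements for one month.
availability = [1.0, 0.99995, 0.9997, 0.9990]
startup_p99_seconds = [0.4, 0.6, 1.5, 0.5]

STARTUP_GOAL_SECONDS = 1.0  # hypothetical goal, not from the source

availability_attainment = attainment(availability, lambda a: a >= 0.9999)
startup_attainment = attainment(
    startup_p99_seconds, lambda s: s <= STARTUP_GOAL_SECONDS
)

print(f"availability attainment: {availability_attainment:.0%}")
print(f"startup-time attainment: {startup_attainment:.0%}")
```

Only the predicate changes between SLIs: "higher is better" for availability, "lower is better" for latency-style measurements.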
Operational implications¶
- Detect localized cell-level outages. A monolithic fleet-aggregate metric averages cell-level outages into noise; attainment surfaces them as a dip such as "3% of databases missed the bar this month", readable as "one cell in a 30-cell fleet had a bad day" (1/30 ≈ 3.3%, assuming similarly sized cells). Composes naturally with cell-based architecture.
- Track tail-customer SLA risk. Customers with strict SLAs care about attainment percentile, not fleet-mean.
- Drive engineering priority. A design change that helps the fleet-mean but not the tail is invisible on attainment; one that reduces the bottom-1%-of-databases outage rate moves the attainment number directly.
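The cell-outage case can be illustrated with a small simulation. Everything here is hypothetical: a fleet of 30 equally sized cells, 100 databases each, where one cell's databases drop to 99.8% availability (about 86 minutes of downtime in a 30-day month).

```python
from collections import Counter


def misses_by_cell(db_cell, db_availability, bar):
    """Count, per cell, the databases whose availability fell below the bar."""
    return Counter(
        db_cell[db] for db, avail in db_availability.items() if avail < bar
    )


# Hypothetical fleet layout: 30 cells with 100 databases each.
cells = [f"cell-{i:02d}" for i in range(30)]
db_cell = {f"{cell}/db-{j}": cell for cell in cells for j in range(100)}

# Hypothetical month: only cell-07 had an outage.
db_availability = {
    db: (0.998 if cell == "cell-07" else 1.0) for db, cell in db_cell.items()
}

BAR = 0.9999
missed = misses_by_cell(db_cell, db_availability, BAR)
attained = 1 - sum(missed.values()) / len(db_availability)
fleet_mean = sum(db_availability.values()) / len(db_availability)

print(f"fleet mean: {fleet_mean:.4%}")
print(f"attainment: {attained:.2%}")
print(f"misses by cell: {dict(missed)}")
```

In this scenario the fleet mean stays above 99.99% while attainment drops to about 96.7%, and grouping the misses by cell points straight at the affected cell.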
Measurement gotchas¶
- Window choice. Monthly attainment can hide a single bad week. Some operators report rolling-7-day + monthly + quarterly simultaneously.
- Per-database aggregation. Rolling up attainment across multiple databases per customer is a separate question; a customer with 100 databases at 99.99% has different SLA implications than one with 1 database at 99.99%.
- Measurement window vs database age. Databases created partway through the window are measured over only a partial month and can skew attainment numbers; a "qualifying lifetime" threshold may apply. The Lakebase post does not detail this.
- What counts as "available"? A database that accepts connections but returns a high error rate is only partially available; attainment requires a precise availability definition (typically a binary success-rate threshold per minute, then % of minutes available).
- Fleet composition matters. If most databases are agent-created / scale-to-zero / used briefly, their availability numbers may cluster differently from long-running customer databases.
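Two of the gotchas above (the availability definition and the qualifying lifetime) can be sketched as follows. This is one possible definition under stated assumptions, not the one Lakebase uses: the 99% per-minute success threshold, the treatment of idle minutes as available, and the `qualifies` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    requests: int
    errors: int


def minute_is_available(sample: MinuteSample, min_success_rate: float = 0.99) -> bool:
    """Binary per-minute verdict based on a success-rate threshold."""
    if sample.requests == 0:
        return True  # assumption: a minute with no traffic counts as available
    return (sample.requests - sample.errors) / sample.requests >= min_success_rate


def availability(samples: list, min_success_rate: float = 0.99) -> float:
    """Fraction of observed minutes that were available."""
    good = sum(minute_is_available(s, min_success_rate) for s in samples)
    return good / len(samples)


def qualifies(samples: list, min_minutes: int) -> bool:
    """Hypothetical 'qualifying lifetime' check: enough minutes observed to count."""
    return len(samples) >= min_minutes


# Hypothetical database: 1,000 observed minutes, 2 of them with a 50% error rate.
samples = [MinuteSample(100, 0)] * 998 + [MinuteSample(100, 50)] * 2

db_availability = availability(samples)
counts_toward_attainment = qualifies(samples, min_minutes=24 * 60)

print(f"availability: {db_availability:.2%}, qualifies: {counts_toward_attainment}")
```

Changing the per-minute threshold or the idle-minute rule changes every database's availability, and therefore the attainment figure, so the definition needs to be fixed before numbers are compared across months.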
Relation to public status pages¶
The Neon public status surface is referenced as https://neonstatus.com/ (verbatim from the source: "The user-visible status you see at https://neonstatus.com/ is a high-level view"). The internal attainment numbers are explicitly higher-resolution than the public status page — a useful framing of public-status as customer-facing and attainment as engineering-internal.
Seen in¶
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical wiki framing. The four-month 2026 H1 attainment table. The two-bar (99.95% / 99.99%) reporting shape. The motivating "individual customer doesn't care if the fleet had great availability" framing.
Related¶
- concepts/operation-based-slo — sibling per-customer-metric shape at the user-journey altitude (vs per-database here)
- concepts/storage-io-latency-sli — sibling per-component SLI; attainment shape composes with it
- concepts/database-startup-time-sli — sibling serverless-specific SLI
- concepts/control-plane-as-the-new-data-plane — the workload-shape forcing function that makes startup-time-attainment load-bearing
- systems/lakebase / systems/neon — canonical instances
- patterns/per-database-availability-attainment — operational pattern for the measurement