
CONCEPT Cited by 1 source

Database availability attainment

Definition

Database availability attainment is a reliability metric that tracks availability per individual database rather than as a fleet aggregate, and reports the percentage of databases in the fleet that met a given availability bar (e.g. 99.95%, 99.99%) during a specified window (typically monthly).

The metric is built in two steps:

  1. Per-database availability — for each database in the fleet, compute % of time it was available during the window. This produces a distribution, not a scalar.
  2. Attainment — % of databases in the fleet whose computed per-database availability ≥ a target bar.

The aggregate that operators report is "X% of databases met the 99.99% bar this month", not "the fleet was 99.99% available this month".
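The two steps above can be sketched in a few lines. This is a minimal illustration, not Lakebase's implementation: the database names, the up-minute counts, and the helper names `availability_pct` and `attainment_pct` are all hypothetical, and availability is assumed to be measured in whole minutes over a 30-day window.

```python
from typing import Mapping

WINDOW_MINUTES = 30 * 24 * 60  # assumed 30-day window, in minutes


def availability_pct(up_minutes: int, window_minutes: int = WINDOW_MINUTES) -> float:
    """Step 1: percent of the window during which one database was available."""
    return 100.0 * up_minutes / window_minutes


def attainment_pct(per_db_availability: Mapping[str, float], bar_pct: float) -> float:
    """Step 2: percent of databases whose availability is at or above the bar."""
    met = sum(1 for a in per_db_availability.values() if a >= bar_pct)
    return 100.0 * met / len(per_db_availability)


# Hypothetical fleet: minutes each database was up during the window.
up_minutes = {"db-a": 43200, "db-b": 43190, "db-c": 43150, "db-d": 43200}
per_db = {name: availability_pct(m) for name, m in up_minutes.items()}

attainment_9999 = attainment_pct(per_db, 99.99)  # db-a and db-d meet the bar
attainment_9995 = attainment_pct(per_db, 99.95)  # db-b also meets this bar
print(attainment_9999, attainment_9995)
```

Step 1 yields one number per database (the distribution); step 2 collapses that distribution into a single percentage for a chosen bar.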

Why attainment ≠ fleet availability

Verbatim framing from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):

"Database Availability: How many percent of the time every individual database is available. We don't just measure aggregate fleet availability, because an individual customer doesn't care if the fleet had great availability if their database was down."

"Our goal is for every database to exceed 99.99% availability every month. We measure how close we are to that goal with attainment: How many % of the fleet's databases that met the goal."

Fleet-aggregate availability under-represents tail customer impact: if 1% of databases are 0% available for the whole month and 99% are 100% available, fleet-aggregate availability is 99% — but 1% of customers had a complete outage. Attainment makes the tail visible: "99% of databases met the 100% bar; 1% missed it" says the same thing without averaging away the bad tail.
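A small numeric sketch of why the mean hides the tail, using two made-up 100-database fleets: one where a single database is down all month (the scenario above), and one where every database loses 1% of the month. The fleet shapes and variable names are illustrative assumptions, not data from the source.

```python
def fleet_mean(availabilities):
    """Fleet-aggregate availability: the plain average across databases."""
    return sum(availabilities) / len(availabilities)


def attainment_pct(availabilities, bar_pct):
    """Percent of databases at or above the availability bar."""
    return 100.0 * sum(a >= bar_pct for a in availabilities) / len(availabilities)


# Fleet A: 99 databases fully available, 1 database down for the whole month.
concentrated = [100.0] * 99 + [0.0]
# Fleet B: every database is 99% available.
spread = [99.0] * 100

mean_a, mean_b = fleet_mean(concentrated), fleet_mean(spread)
att_a = attainment_pct(concentrated, 99.95)
att_b = attainment_pct(spread, 99.95)
print(mean_a, mean_b)  # identical fleet means
print(att_a, att_b)    # very different attainment
```

Both fleets report the same fleet-aggregate availability, yet in fleet A one customer had a total outage and in fleet B no database met the bar. Only the per-database view distinguishes them.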

The metric is the multi-tenant-database-fleet analogue of operation-based SLO (which fixes the per-service vs per-user-journey blindness in microservice SLOs): both replace an aggregate with a per-customer metric to make tail impact visible.

Two-tier attainment bar

Lakebase reports attainment against two bars (verbatim, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):

  • 99.95% — the "acceptable" bar.
  • 99.99% — the "goal" bar.

The two-bar structure provides a calibrated view of fleet health: the 99.95% number tracks the long-tail (how many databases missed even the lower bar?); the 99.99% number tracks the goal (how many hit the design target?).

Disclosed Lakebase / Neon 2026 H1 attainment

Month    | Databases meeting 99.95% | Databases meeting 99.99%
2026-01  | 99.96%                   | 99.85%
2026-02  | 99.95%                   | 99.84%
2026-03  | 99.96%                   | 99.81%
2026-04  | 99.93%                   | 99.75%

(Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures)

Three observations:

  • Attainment-99.95% is on the order of 99.95% — i.e. the bar itself is very close to the achieved number, indicating ~1 in 2,000 databases missed the bar.
  • Attainment-99.99% is closer to 99.8%, indicating ~1 in 500 databases missed the goal in any given month.
  • April dipped on both bars (99.96 → 99.93; 99.81 → 99.75). The source does not explain the dip; ingestion has not surfaced an incident-correlation.
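The "1 in N" readings follow directly from the disclosed percentages: the miss rate is 100% minus attainment, and N is its reciprocal. A short sketch of that arithmetic over the disclosed figures, using `Decimal` to avoid floating-point noise (the helper name `one_in_n` is made up for this example):

```python
from decimal import Decimal

# Disclosed attainment figures: month -> (met 99.95%, met 99.99%).
DISCLOSED = {
    "2026-01": ("99.96", "99.85"),
    "2026-02": ("99.95", "99.84"),
    "2026-03": ("99.96", "99.81"),
    "2026-04": ("99.93", "99.75"),
}


def one_in_n(attainment_pct: str) -> int:
    """Convert an attainment percentage into 'about 1 in N databases missed'."""
    miss_pct = Decimal("100") - Decimal(attainment_pct)
    return round(Decimal("100") / miss_pct)


for month, (met_9995, met_9999) in DISCLOSED.items():
    print(month, "99.95% bar: 1 in", one_in_n(met_9995),
          "| 99.99% bar: 1 in", one_in_n(met_9999))
```

The "~1 in 2,000" and "~1 in 500" figures above are round-number summaries; month by month the ratio moves, with April the weakest on both bars.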

How attainment composes with SLI menus

Attainment-of-availability is one of multiple per-database SLI attainments Lakebase tracks (verbatim list, Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):

Attainment as an aggregation primitive applies uniformly to each: "X% of databases hit the startup-time goal this month", etc.

Operational implications

  • Detect localized cell-level outages. A monolithic fleet-aggregate metric averages cell-level outages into noise; attainment surfaces them as a "3% of databases missed the bar this month" dip — readable as "one cell in a 30-cell fleet had a bad day". Composes naturally with cell-based architecture.
  • Track tail-customer SLA risk. Customers with strict SLAs care about attainment percentile, not fleet-mean.
  • Drive engineering priority. A design change that helps the fleet-mean but not the tail is invisible on attainment; one that reduces the bottom-1%-of-databases outage rate moves the attainment number directly.
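The cell-level point can be made concrete with a toy fleet. The sketch below assumes 30 cells of 100 databases each and a single 60-minute outage confined to one cell during a 30-day month; all of these numbers are illustrative assumptions, not figures from the source.

```python
WINDOW_MINUTES = 30 * 24 * 60  # assumed 30-day month
CELLS = 30                     # hypothetical fleet layout
DBS_PER_CELL = 100
OUTAGE_MINUTES = 60            # hypothetical outage hitting one cell only


def availability_pct(down_minutes: int) -> float:
    """Percent of the window a database was up, given its downtime in minutes."""
    return 100.0 * (WINDOW_MINUTES - down_minutes) / WINDOW_MINUTES


fleet = []
for cell in range(CELLS):
    down = OUTAGE_MINUTES if cell == 0 else 0  # only cell 0 has the outage
    fleet.extend([availability_pct(down)] * DBS_PER_CELL)

fleet_mean = sum(fleet) / len(fleet)
attainment_9999 = 100.0 * sum(a >= 99.99 for a in fleet) / len(fleet)

print(f"fleet mean availability: {fleet_mean:.4f}%")
print(f"attainment at 99.99%:    {attainment_9999:.2f}%")
```

Under these assumptions the fleet mean stays above 99.99%, so a fleet-aggregate report would show the goal as met, while attainment drops by one cell's worth of databases (roughly 3%).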

Measurement gotchas

  • Window choice. Monthly attainment can hide a single bad week. Some operators report rolling-7-day + monthly + quarterly simultaneously.
  • Per-database aggregation. Rolling-up an attainment across multiple databases per customer is a separate question; a customer with 100 databases at 99.99% has different SLA implications than one with 1 database at 99.99%.
  • Measurement window vs database age. Newly created databases measured over only part of a month can skew attainment numbers, since a short observation period makes both 100% and very low availability more likely; a "qualifying lifetime" threshold may apply. The Lakebase post does not detail this.
  • What counts as "available"? A database that's responding to connections but with high error rate is partially-available; attainment requires a precise availability definition (typically a binary success-rate threshold per minute, then % of minutes available).
  • Fleet composition matters. If most databases are agent-created / scale-to-zero / used briefly, their availability numbers may cluster differently from long-running customer databases.
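Two of the gotchas above (the definition of "available" and database age) can be sketched together. This is one possible definition, not the one Lakebase uses: a minute counts as available when its request success rate meets a threshold, idle minutes are assumed to count as available, and databases observed for fewer than `min_minutes` are excluded. The function name, parameters, and the 0.99 default are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Each sample is (successful_requests, total_requests) for one minute.
MinuteSample = Tuple[int, int]


def availability_pct(
    per_minute: Sequence[MinuteSample],
    min_success_rate: float = 0.99,  # assumed per-minute threshold
    min_minutes: int = 1,            # assumed "qualifying lifetime"
) -> Optional[float]:
    """Percent of observed minutes that were available, or None if too young."""
    if len(per_minute) < min_minutes:
        return None  # not enough history to count toward attainment
    available = 0
    for ok, total in per_minute:
        # Assumption: a minute with no traffic counts as available.
        if total == 0 or ok / total >= min_success_rate:
            available += 1
    return 100.0 * available / len(per_minute)


# Nine healthy minutes and one minute with a 50% error rate.
samples = [(100, 100)] * 9 + [(50, 100)]
print(availability_pct(samples))                    # 9 of 10 minutes available
print(availability_pct(samples[:3], min_minutes=5)) # excluded as too young
```

Returning `None` for young databases keeps them out of both the numerator and the denominator of attainment; whether that is the right policy depends on how the operator wants short-lived databases to count.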

The Neon public status surface is referenced as https://neonstatus.com/ (verbatim from the source: "The user-visible status you see at https://neonstatus.com/ is a high-level view"). The internal attainment numbers are explicitly higher-resolution than the public status page — a useful framing of public-status as customer-facing and attainment as engineering-internal.

Seen in
