CONCEPT Cited by 1 source
Zone-redundant storage
Definition
Zone-redundant storage is durable storage whose data is automatically replicated across multiple availability zones (AZs) at the storage-service layer, so a single-AZ failure causes no loss of durability and (typically) no loss of availability. The canonical realisation is object storage such as Amazon S3 (whose default Standard class stores data across at least three AZs) or Azure Blob Storage configured for zone-redundant storage (ZRS); databases built on zone-redundant object storage inherit the property.
The architectural significance for databases is that zone redundancy at the storage layer eliminates the hot-standby tax: a stateful primary does not need a full copy of its data on a hot-standby instance in another AZ to survive a single-AZ failure, because the durable bytes already live across multiple AZs in the storage tier.
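A toy model makes the definition concrete. It is a hypothetical sketch: `ReplicatedStore`, the zone names, and the full-copy-per-zone placement are illustrative assumptions (real object stores such as S3 may use erasure coding rather than whole copies). It shows the property in question: when the storage layer writes to several AZs, losing one AZ leaves the bytes readable, whereas a single-AZ volume loses access to them along with its zone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicatedStore:
    """Hypothetical storage layer that writes a full copy of every object to each zone."""

    zones: tuple[str, ...]
    _data: dict[str, dict[str, bytes]] = field(init=False)
    _down: set[str] = field(init=False, default_factory=set)

    def __post_init__(self) -> None:
        # One independent key/value map per availability zone.
        self._data = {zone: {} for zone in self.zones}

    def put(self, key: str, value: bytes) -> None:
        # Replication is done by the storage layer, not by the database above it.
        for zone in self.zones:
            self._data[zone][key] = value

    def fail_zone(self, zone: str) -> None:
        self._down.add(zone)

    def get(self, key: str) -> bytes | None:
        # Serve the object from any zone that is still up and holds a copy.
        for zone in self.zones:
            if zone not in self._down and key in self._data[zone]:
                return self._data[zone][key]
        return None  # no surviving copy


zone_redundant = ReplicatedStore(zones=("az-a", "az-b", "az-c"))
single_az = ReplicatedStore(zones=("az-a",))  # stands in for a local block device
for store in (zone_redundant, single_az):
    store.put("page-42", b"tuple data")
    store.fail_zone("az-a")

print(zone_redundant.get("page-42"))  # b'tuple data'
print(single_az.get("page-42"))       # None
```

The caller of `put`/`get` does nothing AZ-aware; that is the sense in which a database built on such a store inherits zone redundancy.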
The hot-standby tax it eliminates
Verbatim from the systems/lakebase reliability roadmap (Source: sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures):
"Monolithic Postgres setups are usually backed by local block devices that are rarely zone-redundant. This necessitates physical replication and costly hot standby replicas across multiple availability zones. In Lakebase and Neon, all databases, regardless of tier and configuration, are backed by distributed, zone-redundant, highly available storage. Data is stored in highly durable, zone-redundant object storage, and performance is accelerated by NVMe SSD caches across multiple availability zones at no additional cost to you."
The cost-shape comparison:
| Substrate | AZ-loss survival mechanism | Cost shape |
|---|---|---|
| Local block device (EBS-style) | Replicate data to a hot-standby compute in another AZ | 2× full-data-copy + 2× compute |
| Zone-redundant object storage | Built into the storage layer | 1× compute, storage cost amortised across customers |
This economic delta is what lets every database, not just an HA-tier opt-in, survive AZ loss at no extra per-database cost at the storage layer. It is the default-on multi-AZ durability primitive that systems/lakebase / systems/neon use, and it is what makes it affordable for "all databases, regardless of tier and configuration" to sit on distributed, zone-redundant storage.
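The cost shapes in the table can be written down as two formulas. This is a minimal sketch with made-up unit prices (`UnitPrices` and every number in it are hypothetical, not cloud or Lakebase pricing), and it ignores the NVMe cache tier; only the 2× versus 1× structure comes from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical monthly prices, for illustration only (not real cloud or Lakebase pricing)."""

    compute_instance: float   # one Postgres compute
    block_storage_gb: float   # single-AZ block volume, per GB
    object_storage_gb: float  # zone-redundant object storage, per GB (AZ redundancy priced in)


def hot_standby_monthly(p: UnitPrices, data_gb: float) -> float:
    # Primary + hot standby in a second AZ: 2x compute and 2x full data copy.
    return 2 * p.compute_instance + 2 * data_gb * p.block_storage_gb


def zone_redundant_monthly(p: UnitPrices, data_gb: float) -> float:
    # One compute; the storage tier already spans AZs, so no second data copy is paid for here.
    return p.compute_instance + data_gb * p.object_storage_gb


prices = UnitPrices(compute_instance=100.0, block_storage_gb=0.10, object_storage_gb=0.02)
print(f"{hot_standby_monthly(prices, 500.0):.2f}")     # 300.00
print(f"{zone_redundant_monthly(prices, 500.0):.2f}")  # 110.00
```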
Two-tier zone-redundancy structure
In the Pageserver+Safekeeper tier:
- Cold-tier durability: object storage (S3 / Azure Blob Storage) — provides the bytes-in-multiple-AZs durability guarantee.
- Hot-tier latency: NVMe SSD caches "across multiple availability zones" accelerate hot reads. Because the cache layer itself spans AZs, an AZ failure degrades cache hit rates gracefully rather than producing a cliff where the entire hot cache disappears at once.
The combination is the "distributed, zone-redundant, highly available storage" substrate disclosed in the reliability roadmap. It composes with stateless Postgres compute to deliver single-compute Postgres with significantly improved availability, without paying for a hot-standby compute replica.
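The two tiers can be sketched as a read path. This is a hypothetical model, not the Pageserver's actual implementation: the class name, the lookup order (own AZ first, then other AZs, then object storage), and the warm-on-miss policy are all assumptions. The source only states that data lives in zone-redundant object storage and that NVMe SSD caches span multiple AZs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TwoTierStorage:
    """Hypothetical read path: per-AZ NVMe page caches in front of zone-redundant object storage."""

    object_store: dict[int, bytes]       # cold tier: durable, assumed to survive any single-AZ loss
    caches: dict[str, dict[int, bytes]]  # hot tier: AZ name -> cached pages
    object_store_reads: int = 0          # counts reads that paid the object-storage latency floor

    def fail_az(self, az: str) -> None:
        # The failed AZ's cache disappears; the object store is unaffected.
        self.caches.pop(az, None)

    def read_page(self, page_id: int, reader_az: str) -> bytes:
        # 1. Try the reader's own AZ first, then caches in any other surviving AZ.
        for az in sorted(self.caches, key=lambda name: name != reader_az):
            if page_id in self.caches[az]:
                return self.caches[az][page_id]
        # 2. Missed every cache: fall back to object storage.
        self.object_store_reads += 1
        page = self.object_store[page_id]
        if reader_az in self.caches:
            self.caches[reader_az][page_id] = page  # warm the local cache for next time
        return page


storage = TwoTierStorage(
    object_store={1: b"page-1", 2: b"page-2"},
    caches={"az-a": {1: b"page-1"}, "az-b": {2: b"page-2"}, "az-c": {}},
)
storage.fail_az("az-a")
storage.read_page(2, reader_az="az-c")  # hit in az-b's cache
storage.read_page(1, reader_az="az-c")  # az-a's cached copy is gone: object-storage read
storage.read_page(1, reader_az="az-c")  # now served from az-c's warmed cache
print(storage.object_store_reads)       # 1
```

With the cache spread across AZs, an AZ failure turns some cache hits into object-storage reads instead of turning every read into one.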
Composability with HA tier
For databases that need maximum availability, Lakebase additionally provisions "dedicated computes across multiple availability zones for your database, ensuring that your database remains available even if the cloud provider runs out of capacity during (or as a result of) the failure event. These computes can additionally be utilized to scale reads." Zone-redundant storage is the substrate under both single-compute and multi-compute tiers; the multi-compute tier adds compute redundancy on top of the always-zone-redundant storage.
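The role of the dedicated multi-AZ computes can be sketched as a failover decision. In this hypothetical model (the function name and the `spare_capacity` map are illustrative assumptions), data is never the blocker, because any surviving AZ can reach the zone-redundant storage; what differs between tiers is whether a compute must be newly provisioned during the outage.

```python
from __future__ import annotations


def recover_from_az_loss(
    compute_azs: list[str], lost_az: str, spare_capacity: dict[str, int]
) -> str | None:
    """Return the AZ that serves the database after `lost_az` fails, or None if it cannot be served.

    Data is assumed never to be the blocker: it lives in zone-redundant storage,
    so a compute in any surviving AZ can attach to it.
    """
    # Multi-compute (HA) tier: a dedicated compute is already running in another AZ.
    for az in compute_azs:
        if az != lost_az:
            return az
    # Single-compute tier: a fresh stateless compute must be started in a surviving AZ,
    # which only works if the cloud provider still has capacity there.
    for az, free in spare_capacity.items():
        if az != lost_az and free > 0:
            spare_capacity[az] = free - 1
            return az
    return None


# During the outage, the surviving AZs have no spare capacity left.
crunch = {"az-a": 0, "az-b": 0, "az-c": 0}
print(recover_from_az_loss(["az-a"], "az-a", dict(crunch)))          # None
print(recover_from_az_loss(["az-a", "az-b"], "az-a", dict(crunch)))  # az-b
```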
Why it is the right default
- Storage-layer redundancy is amortisable across customers. The cost of cross-AZ replication at the object-storage layer is spread across all tenants, so the per-customer cost is close to zero. A compute-layer hot standby, by contrast, is paid for per customer.
- Storage-layer redundancy outlives compute lifetimes. When the Postgres compute auto-suspends (per Neon's data, 90% of sessions last under 10 minutes), there is no running compute to pair with a hot standby anyway; storage-layer redundancy is the only AZ-survival mechanism that works under scale-to-zero.
- Storage-layer redundancy is the same primitive whether the DB is single-compute or HA-tier. Tier-up adds compute redundancy; storage redundancy is universal.
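The scale-to-zero argument can be walked through as a lifecycle. The classes below are hypothetical stand-ins (`ZoneRedundantVolume`, `Compute`, and `resume` are not Lakebase or Neon APIs): a compute writes, then suspends, an AZ fails while nothing is running, and a new compute in another AZ still finds the data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ZoneRedundantVolume:
    """Hypothetical storage tier: pages stay durable while at least one AZ is up."""

    pages: dict[int, bytes] = field(default_factory=dict)
    live_azs: set[str] = field(default_factory=lambda: {"az-a", "az-b", "az-c"})


@dataclass
class Compute:
    """Hypothetical stateless Postgres compute: it keeps no durable state of its own."""

    az: str
    volume: ZoneRedundantVolume

    def write(self, page_id: int, data: bytes) -> None:
        self.volume.pages[page_id] = data  # durability comes from the storage tier

    def read(self, page_id: int) -> bytes:
        return self.volume.pages[page_id]


def resume(volume: ZoneRedundantVolume) -> Compute:
    """Scale from zero: start a new compute in any surviving AZ."""
    if not volume.live_azs:
        raise RuntimeError("no AZ left: outside what zone redundancy covers")
    return Compute(az=min(volume.live_azs), volume=volume)


volume = ZoneRedundantVolume()
compute = Compute(az="az-a", volume=volume)
compute.write(7, b"row")
del compute                      # session ends; the compute suspends, so no standby is running
volume.live_azs.discard("az-a")  # AZ outage while the database is suspended
compute = resume(volume)
print(compute.az, compute.read(7))  # az-b b'row'
```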
Caveats
- AZ-redundant ≠ region-redundant. Single-region cross-AZ replication does not survive a regional outage. Multi-region durability requires a separate substrate (cross-region object replication, multi-region database tier).
- Storage-layer redundancy ≠ zero RTO. The compute layer still has to fail over to another AZ on AZ loss; zone-redundant storage removes the data-side RTO (no replication catch-up), not the compute-side restart and reroute. Lakebase's "30-second-or-better" goal for a whole-AZ failure is a compute-side recovery target; the storage side already contributes zero RTO.
- Object-storage latency floor. Page reads from object storage are slower than reads from local SSD, so the NVMe SSD cache layer is load-bearing for OLTP latency. After an AZ failover, reads against a cold cache pay the object-storage latency floor until the cache warms up.
- Specific replication policy not detailed in the source. "AZ redundancy" could mean synchronous-3-AZ-erasure-coded (S3-style) or asynchronous-replicated; the Databricks post does not name the durability class beyond "highly durable".
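The availability caveats can be summarised as a checklist keyed on outage scope. This is a hypothetical sketch (`Outage` and `remaining_recovery_work` are illustrative names); it encodes only the qualitative claims above and no recovery-time figures.

```python
from enum import Enum


class Outage(Enum):
    SINGLE_AZ = "single-az"
    WHOLE_REGION = "whole-region"


def remaining_recovery_work(outage: Outage) -> list[str]:
    """Hypothetical checklist of what is left to do after an outage, given zone-redundant storage."""
    if outage is Outage.WHOLE_REGION:
        # AZ-redundant is not region-redundant: a separate cross-region substrate is needed.
        return ["serve from a cross-region copy (not provided by zone redundancy)"]
    # Storage side: no replication catch-up, so it adds nothing to this list.
    # Compute side: still has to move to a surviving AZ.
    work = ["start or promote a compute in a surviving AZ", "reroute client connections"]
    # Latency floor: until the NVMe cache warms, misses are served from object storage.
    work.append("warm the NVMe cache (degraded latency, not downtime)")
    return work


for outage in Outage:
    print(outage.value, "->", remaining_recovery_work(outage))
```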
Seen in
- sources/2026-05-27-databricks-how-the-lakebase-architecture-stays-resilient-to-cloud-failures — canonical wiki framing. "All databases, regardless of tier and configuration" zone-redundancy as the default. The hot-standby-tax-elimination economic argument. NVMe SSD caches across multiple AZs as the latency layer.
Related
- concepts/compute-storage-separation — the architectural shape this concept fits into; the storage tier is what's zone-redundant; compute can be ephemeral / stateless / scale-to-zero
- concepts/stateless-compute — stateless Postgres compute is the partner property; together they replace the hot-standby tax
- concepts/availability-zone-balance — neighbouring concept; zone-redundancy is one mechanism for AZ balance
- concepts/static-stability — zone-redundant storage is a static-stability instantiation at the storage tier — the "last known good state" of data is always available because data lives in multiple AZs by construction
- systems/pageserver-safekeeper — the canonical zone-redundant-storage substrate for Lakebase / Neon
- systems/aws-s3 / systems/azure-blob-storage — the underlying object-storage primitives
- patterns/multi-az-vitess-cluster — the compute-side zone-redundancy pattern; complement to storage-side