# Tiered storage as primary fallback
## Definition
Tiered storage as primary fallback is the architectural property where a streaming broker / database stores primary (write-path + recent-read-path) data on local NVMe and offloads older data asynchronously to cheaper object storage — such that primary-path availability does not depend on the tiered-storage layer. An outage or error spike in the object-storage layer degrades historical-read UX (slower / failed reads of cold data) but does not block writes or recent reads.
The canonical Redpanda statement, verbatim (Source: sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage):
"Redpanda stores the primary data on local NVMe disks and sends older data to tiered storage, asynchronously."
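The decoupling can be sketched in a few lines. This is a minimal illustration of the property, not Redpanda's implementation; `object_store_put` is a hypothetical stand-in for a GCS/S3 client call, and replication is omitted:

```python
import queue
import threading
import time

class Broker:
    """Sketch: acks depend only on the local log; tiering is async."""

    def __init__(self, object_store_put):
        self.local_log = []                 # stands in for the NVMe log
        self.flush_queue = queue.Queue()    # segments awaiting async upload
        self._put = object_store_put
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def produce(self, record):
        # Write path: append locally, enqueue for tiering, ack immediately.
        # No object-store call anywhere in this path.
        self.local_log.append(record)
        self.flush_queue.put(record)
        return "ack"

    def _flush_loop(self):
        # Cold path: object-store errors back up this queue; they never
        # block produce().
        while True:
            record = self.flush_queue.get()
            try:
                self._put(record)
            except IOError:
                self.flush_queue.put(record)  # retry later
                time.sleep(0.01)
```

Even if `object_store_put` fails on every call, `produce()` keeps acknowledging writes; the failure surfaces only as flush-queue growth.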
## Why this matters during cloud-provider outages
Object-storage outages (S3, GCS, Azure Blob) are a canonical cross-service-correlation failure shape: many services in a cloud region depend on the object store, and its outage cascades across them. A streaming broker that requires object storage on the write path has availability coupled to the object store; a broker that offloads asynchronously can remain writable.
The 2025-06-12 GCP outage validated this property for Redpanda: "we noticed an increase in tiered storage errors, which is not Redpanda's primary storage. We didn't get high disk utilization alerts, which we typically receive when the tiered storage subsystem has been experiencing issues for an extended period (days)." The error-rate spike was visible but customer-invisible because the write path did not depend on the failing tier.
## The inversion: tiered storage as primary write path
The architectural opposite is "centralized metadata and a diskless architecture" — a streaming broker where the primary data path writes to object storage. This design has different trade-offs:
| Property | Local NVMe primary | Object-store primary (diskless) |
|---|---|---|
| Write latency | Low (local disk) | Higher (network + object-store write) |
| Per-byte storage cost | Higher (NVMe) | Lower (object storage) |
| Write-path availability | Independent of object store | Coupled to object store |
| Scale-to-zero | Bounded by local disk | Unbounded (stateless brokers) |
| Recovery time | Fast (local data) | Depends on re-hydration from object store |
| Correlated-failure risk | Low (local) | High (shared object store substrate) |
Redpanda's 2025-06-20 retrospective contrasts the two explicitly: "In contrast, other products boasting centralized metadata and a diskless architecture likely experienced the full weight of this global outage."
## Redpanda's three-tier storage hierarchy
- Local NVMe — write path + recent reads (leader + follower replicas).
- Used-but-reclaimable disk space — NVMe space used for caching that can be reclaimed on demand (see concepts/unused-reclaimable-disk-buffer).
- Tiered storage (object storage) — older data flushed asynchronously; cold-path reads only.
The async flush is the load-bearing property: writes ack on local NVMe replication, not on object-store confirmation. Object-store errors back up the flush queue rather than blocking writes.
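The read side of the hierarchy can be sketched the same way (again an illustration, with a hypothetical `tiered_get` standing in for an object-store read): recent offsets are served from local NVMe, and only historical reads touch the error-prone tier.

```python
def read(offset, local_log_start, local_log, tiered_get):
    """Read-path sketch: recent offsets from local NVMe, cold from tiered.

    A tiered_get failure only degrades historical reads; it cannot
    affect the write path or recent reads.
    """
    if offset >= local_log_start:
        return local_log[offset - local_log_start]   # hot path: local NVMe
    try:
        return tiered_get(offset)                    # cold path: object store
    except IOError:
        raise RuntimeError("historical read degraded; writes unaffected")
```

This is the failure shape described above: an object-store outage turns into `RuntimeError` on cold reads while the hot path keeps serving.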
## Downstream effect of the 2025-06-12 GCP outage
Per the post, the tiered-storage error spike manifested as:
- Elevated PUT error rates on GCS (object store) during the outage window.
- Some latency impact on certain API calls (unspecified percentile).
- No high-disk-utilization alerts — the flush backlog didn't grow long enough to saturate the NVMe buffer.
- No write-path impact — brokers kept acknowledging producer writes on local NVMe replication.
The absence of disk-utilization alerts ("which we typically receive when the tiered storage subsystem has been experiencing issues for an extended period (days)") is the quantitative boundary: the outage was short enough that the buffer absorbed the backlog.
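That boundary is easy to bound with back-of-the-envelope arithmetic. The figures below are illustrative only, not from the Redpanda post:

```python
def hours_to_saturation(reserve_gib, ingest_mib_per_s):
    """How long a total object-store outage can last before the unused +
    reclaimable NVMe reserve fills and the write path is impacted.

    reserve_gib: unused + used-but-reclaimable NVMe space, in GiB
    ingest_mib_per_s: sustained write rate that must be buffered locally
    """
    reserve_mib = reserve_gib * 1024
    seconds = reserve_mib / ingest_mib_per_s
    return seconds / 3600

# Illustrative: a 500 GiB reserve at 20 MiB/s of ingest absorbs roughly
# 7 hours of flush backlog before disk-utilization alerts would fire.
```

An outage window well inside that bound produces exactly the observed signature: elevated tiered-storage error rates, no disk-utilization alerts, no write-path impact.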
## Composition with other reliability primitives
- concepts/unused-reclaimable-disk-buffer — the deliberate reserve of unused + used-but-reclaimable NVMe space is what lets a long-ish object-store outage not saturate the flush buffer.
- patterns/direct-attached-nvme-with-replication — the PlanetScale Metal primitive is the same shape at a different layer: local disk + replication as the primary path, not network storage.
- concepts/tiered-storage-fast-decommission — a related tiered-storage benefit: cold data already in object storage makes broker decommission fast.
- concepts/data-plane-atomicity — tiered-storage-as-fallback is load-bearing for Redpanda's Data Plane Atomicity claim: "no external databases... as you are trying to write your data durably to disk."
## Caveats
- Applies to streaming / append-heavy workloads. A random-access OLTP database can't use this shape as easily — the local disk must fit the working set.
- Async flush means bounded latency to cold reads. A consumer that falls behind by longer than the NVMe retention window has to read from the (cheaper but slower + error-prone) tiered storage. Outages amplify this.
- Tier promotion / demotion policy matters. Aggressive promotion-to-tiered = more dependency on object store; conservative = larger NVMe footprint.
- Object-store errors can eventually matter. If the outage runs long enough to fill the unused + reclaimable buffer, the write path does get impacted. The Redpanda post is explicit: "Additionally, as a reliability measure, we leave disk space unused and used-but-reclaimable (for caching), which we can reclaim if the situation warrants it. This outage was not that situation." — i.e., the reserve was sufficient this time.
- Not a substitute for retries / backpressure. Elevated object-store error rates still require retry discipline in the flush loop; the architecture absorbs the delay, not the failure.
- BYOC deployment layer matters. In BYOC deployments the object store is the customer's own bucket, which partially shields Redpanda-managed cluster availability from cross-customer object-store outages.
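The retry-discipline caveat can be made concrete with a sketch of what a flush loop's retry policy might look like (exponential backoff with full jitter; `put` is a hypothetical object-store PUT call, and the parameters are illustrative):

```python
import random
import time

def flush_with_backoff(segment, put, max_attempts=8, base=0.1, cap=30.0):
    """Retry discipline for the tiering flush loop.

    The architecture absorbs the resulting *delay* (queue growth on local
    NVMe), not the failure itself -- each attempt still has to succeed
    eventually for the segment to reach the cold tier.
    """
    for attempt in range(max_attempts):
        try:
            put(segment)
            return True
        except IOError:
            # Full jitter: sleep uniformly in [0, base * 2^attempt), capped.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return False  # leave segment queued; local retention keeps it durable
```

Returning `False` rather than raising is the point: a failed flush leaves the segment durable on local NVMe and re-queued, never visible to producers.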
## Seen in
- sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage — canonical statement of the tiered-storage-as-fallback property, validated during the 2025-06-12 GCP outage where elevated GCS PUT error rates did not impact write availability.