CONCEPT
# Tiered storage for fast decommission / recommission

## Definition
Tiered storage for fast decommission / recommission names an operational benefit of tiered storage that is distinct from its usual cost / capacity framing: because historical partition segments already live in object storage, the cold bulk of a broker's data does not move when the broker leaves or joins the cluster. The result is orders-of-magnitude faster decommission (shrink the cluster / replace a broker) and recommission (add a broker / scale out) than with traditional on-broker-only storage.
## Standard pitch for tiered storage — capacity
The conventional argument for tiered storage is storage cost: local disk is expensive per byte, object storage is cheap. Offloading older segments to S3 / GCS / Azure Blob lets operators retain orders-of-magnitude more history for the same $/month budget. See patterns/tiered-storage-to-object-store for the canonical capacity framing.
## The operational reframe
Redpanda's post (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda) surfaces a secondary benefit:
> "Decommissioning and recommissioning a broker can take time, as the data needs to be replicated away from the broker before it goes offline or re-replicated towards a new broker before it can start up and fully participate in the cluster. When tiered storage is in use, decommissioning and recommissioning can both be sped up by orders of magnitude, since a copy of the data already exists out in the object store. This means only the most recent data (that is yet to be written to tiered storage) needs to be moved to or from a broker."
- Decommission (broker leaves): with tiered storage, only the hot-segment data currently on the broker's local disk (not yet offloaded to object storage) must re-replicate to the remaining brokers. Everything else is already in object storage, where every broker can read it.
- Recommission (broker joins, or scale-out adds a broker): the same logic in reverse. The new broker only needs to locally cache the hot segments of its newly assigned partitions; the cold bulk stays in object storage and is served from there.
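The hot-versus-cold split can be sketched as a toy model (all names here are hypothetical, not a broker API; real systems track offload progress per partition):

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """Toy model of one partition replica on a broker (illustrative only)."""
    local_bytes: int       # total bytes of this partition held on local disk
    offloaded_bytes: int   # bytes of that data already safe in object storage

    @property
    def hot_bytes(self) -> int:
        # Only segments not yet uploaded must re-replicate on decommission,
        # and only these must be fetched from peers on recommission.
        return self.local_bytes - self.offloaded_bytes

def decommission_transfer(partitions: list[Partition]) -> int:
    """Bytes that must move off the broker before it can leave the cluster."""
    return sum(p.hot_bytes for p in partitions)

GB = 10**9
# 20 partitions x 500 GB = 10 TB on disk, 95% already offloaded:
parts = [Partition(local_bytes=500 * GB, offloaded_bytes=475 * GB) for _ in range(20)]
print(decommission_transfer(parts) / GB)  # 500.0 -> only 0.5 TB of 10 TB moves
```

Without tiered storage, `offloaded_bytes` is zero everywhere and the full 10 TB would have to re-replicate.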
## Quantitative gap
For a broker holding, say, 10 TB of partition data, of which 9.5 TB is already offloaded to tiered storage and 0.5 TB is hot (current segments still only on local disk):
| Strategy | Data that must move | Time at ~1 GB/s replication |
|---|---|---|
| No tiered storage | 10 TB | ~3 h |
| With tiered storage | 0.5 TB | ~10 min |
Roughly a 20× speedup in this example (10 TB vs 0.5 TB). Redpanda claims "orders of magnitude" faster without publishing specific numbers.
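The table's timings follow directly from volume ÷ replication bandwidth. A minimal check, assuming a sustained 1 GB/s replication rate (the function name is illustrative):

```python
GB = 10**9
TB = 10**12
RATE = 1 * GB  # assumed sustained replication bandwidth, bytes/sec

def transfer_time_min(data_bytes: int) -> float:
    """Minutes to move data_bytes at the assumed replication rate."""
    return data_bytes / RATE / 60

print(round(transfer_time_min(10 * TB)))        # 167 -> just under 3 h
print(round(transfer_time_min(int(0.5 * TB))))  # 8   -> roughly 10 min
print((10 * TB) / (0.5 * TB))                   # 20.0 -> the speedup ratio
```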
## Why this matters — availability and elasticity
Four operational shapes benefit:
- Host replacement after node failure. Faster recommission means shorter recovery-to-full-redundancy windows.
- Rolling upgrades / restart storms. Each broker's restart window collapses when it only has to re-cache the hot segments.
- Scale-out under growing load. Adding a broker during a traffic surge means the new broker reaches full participation quickly; without tiered storage, the surge could outlast the scale-out.
- Scale-down / cost-elasticity. Operators with a predictable overnight drop in traffic can remove brokers at night and put them back in the morning, a cycle that is economically impossible if each one costs hours of replication.
## Composes with failure-recovery
This also reframes log recovery time (Kozlovski, Kafka-101: "recovery after ungraceful shutdown takes hours if not days" for 10 TB brokers): tiered storage shrinks the recovery window to the hot-segment portion only. A broker with most of its data in object storage recovers from a crash as quickly as one with a much smaller footprint.
Together with remote read replica topics (object-storage-backed read-only mirrors on a separate cluster), tiered storage turns object storage into a general-purpose durability + elasticity substrate for the whole cluster.
## Caveats
- The orders-of-magnitude claim is qualitative. The post does not publish a real customer datum (e.g. "decommissioned a 10 TB broker in 10 minutes vs 3 h without tiered storage"). A production case study would strengthen the claim.
- Hot-segment size depends on tier-down cadence. If tiered storage only offloads after hours (cold-threshold = 4 hours of data), the hot segment at any instant can be hundreds of GB — still order-of-magnitude faster than full replication, but not negligible.
- Object-store read latency becomes the cluster's availability floor for historical reads. Decommission-fast is a win, but steady-state reads of cold data pay object-store latency (~10-100 ms per fetch vs ~1 ms local disk).
- Bandwidth to the object store is the scale-out bottleneck. Many brokers booting in parallel, all reading from S3, can hit S3 request-rate limits; large clusters need coordinated or staggered recommission.
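The cadence caveat above is easy to quantify: hot data at any instant is roughly ingest rate × upload lag. A sketch with assumed numbers (both parameters are hypothetical, not measured values):

```python
MB = 10**6
GB = 10**9

def hot_segment_bytes(ingest_bytes_per_sec: float, upload_lag_sec: float) -> float:
    """Approximate local-only data still awaiting offload to object storage."""
    return ingest_bytes_per_sec * upload_lag_sec

# A broker ingesting 20 MB/s with a 4-hour tier-down cadence holds ~288 GB hot:
print(hot_segment_bytes(20 * MB, 4 * 3600) / GB)  # 288.0
```

Still far below a 10 TB full re-replication, but enough that an aggressive upload cadence directly shortens decommission and recovery windows.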
## Seen in
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — canonical wiki source. Operational-benefit reframe; hot-only data must move on broker join/leave; orders-of-magnitude speedup claim.
## Related
- systems/kafka, systems/redpanda — Kafka-API brokers where tiered storage is deployed.
- systems/aws-s3 — most common object-store backend.
- concepts/kafka-partition — the unit tiered storage offloads in segment files.
- concepts/log-recovery-time — adjacent primitive; tiered storage shrinks the recovery window.
- concepts/remote-read-replica-topic — sibling primitive; also leverages object storage.
- patterns/tiered-storage-to-object-store — the parent pattern; canonical capacity framing.