CONCEPT Cited by 1 source
Storage bottleneck migration¶
Definition¶
Storage bottleneck migration names the recurring observation across the history of high-performance systems: the dominant performance bottleneck in the I/O path is not stable — it migrates across hardware generations, and each migration invalidates the design assumptions of systems built for the previous era. A system architected for one bottleneck location must be re-examined when the bottleneck moves; the substrate it was tuned against is no longer the limiting factor, and a new one has appeared in a location the architecture may not have provisioned for.
Canonical wiki statement¶
"Redpanda was built in an era of shifting bottlenecks. In the early days of designing Redpanda, we loved to talk about how the I/O bottleneck was migrating from the storage device to the CPU. Modern storage devices were becoming so fast that it wasn't a clunky spinning disk slowing things down; it was thread context switching and cache invalidations."
"Today, we take the existence of these fast storage devices for granted and work around the CPU bottleneck using a thread-per-core architecture. […] Fast forward to today, and we see that bottlenecks are shifting again. Demand for low-cost storage solutions, such as cloud object storage, is reintroducing high-latency I/O into systems designed around the assumption of low-latency storage."
"Just as older systems had to confront the introduction of high-performance storage, Redpanda isn't immune either."
This is the load-bearing framing for why Cloud Topics needed a production-tuning fix even though the architectural primitives (L0 files, placeholder batches, Reconciler) were correct: the substrate's latency profile changed, and the write pipeline's implicit concurrency provisioning didn't scale to absorb it.
Three eras of the streaming-broker bottleneck¶
| Era | Dominant substrate | Bottleneck location | Architectural response |
|---|---|---|---|
| HDD era | Spinning disk | Disk seek / rotational latency | Sequential-write log structures (Kafka append-only segments); large page caches |
| NVMe era | Local NVMe | Thread context switch + cache invalidation | Seastar / thread-per-core / shared-nothing per-core; coroutines instead of OS threads (systems/redpanda) |
| Object-storage era | S3 / GCS / ADLS | Network-mediated I/O latency (tens to hundreds of ms at p50, multi-second at p99) | Concurrency inflation (queue-buffer stages); cross-partition batching; durable-state-as-URL (concepts/object-storage-as-disk-root) |
The pattern is clear: each era's solution becomes the next era's implicit assumption. Kafka's append-only segments assumed sequential HDD writes were cheap; Redpanda's thread-per-core assumed NVMe was fast enough that CPU would dominate; the next architectural layer must assume that a substantial fraction of writes will see high-latency, network-mediated I/O.
Why bottlenecks migrate predictably¶
Three structural forces drive the migration:
- Hardware-cost inversion. As the previously expensive component gets cheaper, system designers stop optimising it explicitly, and another component becomes the binding constraint. NVMe SSDs eliminated seek latency; the bottleneck moved to whatever was the next slowest piece (kernel context switches in network-bound server stacks; cache invalidations in shared-state runtimes).
- Cost-economics demand-pull. Customers pull substrates from high-performance / high-cost into low-cost / lower-performance, even when this re-introduces previously-eliminated bottlenecks. Redpanda's framing is explicit: "Demand for low-cost storage solutions […] is reintroducing high-latency I/O." The market forces the system architect to re-confront the slow substrate.
- Substrate-substitution-as-feature. The new substrate is rolled out as an opt-in (Cloud Topics is a per-topic class, not a global replacement). This means the system has to support both the old substrate's assumptions and the new substrate's profile simultaneously: sometimes within a single cluster, even within the same write path on a per-record basis. The architectural cost is a rebalancing of where concurrency / batching / caching provisioning lives.
Migration triggers a design audit¶
When the bottleneck migrates, every implicit assumption in the existing design becomes a candidate audit point:
- Latency budgets that were over-provisioned for the old substrate (replication ~10 ms) are now under-provisioned for the new one (object-storage upload ~100 ms to 1 s). With a fixed number of in-flight requests, Little's Law puts the per-connection throughput ceiling at the in-flight count divided by latency, so a 10× to 100× latency increase cuts the ceiling by the same factor.
- Concurrency provisioning in the write pipeline must rise proportionally to the latency increase. A stage that was pipelined (single-deep, position-guaranteeing handoff) may need to be re-architected with a buffered queue (patterns/concurrency-buffer-stage-for-high-latency-io) for the slow substrate.
- Batching windows sized for fast substrates may be under-amortising fixed-cost penalties on the new one. Cloud Topics' 0.25 s / 4 MB cross-partition window is sized to amortise S3 PUT cost; an NVMe-era choice would have been smaller.
- Failure-mode assumptions carry over silently and may be wrong. A system architected against torn pages on local disk (concepts/torn-page) may be over-engineered when the new substrate has its own integrity guarantees (object-storage multi-AZ durability). See Lakebase's FPW elimination for the parallel pattern in the Postgres world.
- Client-API contracts that assumed the old latency profile (e.g. `max.in.flight.requests.per.connection = 1` for ordering-sensitive workloads) become hostile on the new substrate. Either the contract loosens or the broker has to do more work to maintain it under the new latency profile. Cloud Topics chose the latter (patterns/pipelined-produce-with-position-guarantee).
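The first audit point can be made concrete with Little's Law (in-flight requests = throughput × latency). The sketch below is illustrative only: the function name `throughput_ceiling` is hypothetical, the 1 s object-storage latency is an assumed worst case taken from the upper end of the range quoted above, and the in-flight cap of 1 mirrors the ordering-sensitive client setting mentioned in the last audit point.

```python
def throughput_ceiling(max_in_flight: int, latency_s: float) -> float:
    """Little's Law rearranged: throughput = in-flight requests / latency."""
    if max_in_flight < 1 or latency_s <= 0:
        raise ValueError("need at least one in-flight request and a positive latency")
    return max_in_flight / latency_s


# Latencies from the audit list above; 1.0 s is the assumed worst case.
REPLICATION_LATENCY_S = 0.010
OBJECT_UPLOAD_LATENCY_S = 1.0

# One in-flight request per connection (assumed ordering-sensitive producer).
old_ceiling = throughput_ceiling(1, REPLICATION_LATENCY_S)    # requests/s
new_ceiling = throughput_ceiling(1, OBJECT_UPLOAD_LATENCY_S)  # requests/s
slowdown = old_ceiling / new_ceiling

print(f"old substrate: {old_ceiling:.0f} req/s per connection")
print(f"new substrate: {new_ceiling:.0f} req/s per connection")
print(f"ceiling drops by a factor of {slowdown:.0f}")
```

Raising the in-flight count by the same factor restores the ceiling, which is why the remaining audit points concern where that extra concurrency can live.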
Relationship to other wiki concepts¶
- concepts/littles-law is the algebraic lens through which bottleneck migration shows up as a concurrency-provisioning problem.
- concepts/object-storage-as-disk-root is the architectural posture that embraces the migration into object-storage-bound I/O — Fly Sprites' design treats S3 as the durability root, NVMe as a cache.
- concepts/network-attached-storage-latency-penalty is the per-stage symptom: any time a system reads/writes through the network instead of local disk, latency budget needs re-checking.
- concepts/io-latency-sensitive-workload is the workload-side axis — some workloads tolerate the migration into high-latency storage (observability, compliance, model training); others cannot. Cloud Topics is per-topic precisely so each topic can pick its substrate without the whole cluster paying the cost.
- concepts/batching-latency-tradeoff is the trade-off knob that often has to be retuned post-migration — bigger batches amortise the slow substrate's per-operation cost.
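As a rough illustration of the batching knob, the sketch below models a flush policy that closes a batch when either a size limit or an age limit is reached. The defaults echo the 0.25 s / 4 MB window quoted earlier, but the class `BatchWindow`, its methods, and the reading of 4 MB as 4 × 1024 × 1024 bytes are assumptions made for the example, not Redpanda's implementation. The caller passes the clock explicitly so the behaviour is easy to reason about.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchWindow:
    """Hypothetical flush policy: flush on size OR age, whichever comes first."""
    max_bytes: int = 4 * 1024 * 1024   # assumed interpretation of "4 MB"
    max_age_s: float = 0.25
    buffered_bytes: int = 0
    opened_at: Optional[float] = None  # time the first record entered the batch

    def add(self, size: int, now: float) -> bool:
        """Account for one record; return True if the batch should flush now."""
        if self.opened_at is None:
            self.opened_at = now
        self.buffered_bytes += size
        return self.should_flush(now)

    def should_flush(self, now: float) -> bool:
        if self.opened_at is None:
            return False  # an empty batch never flushes
        too_big = self.buffered_bytes >= self.max_bytes
        too_old = (now - self.opened_at) >= self.max_age_s
        return too_big or too_old

    def reset(self) -> None:
        """Start a fresh batch after the caller has uploaded the old one."""
        self.buffered_bytes = 0
        self.opened_at = None
```

A larger `max_bytes` or `max_age_s` spreads each upload's fixed cost over more records at the price of added produce latency; shrinking either moves the trade-off the other way.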
Operational implications¶
For a team confronting a substrate migration:
- Treat latency multipliers as concurrency demands. If you're moving a stage from 10 ms to 1000 ms, you need to accommodate 100× concurrency at that stage to hold throughput constant. Plan for it before you ship; don't discover it in an OpenMessaging Benchmark (OMB) run.
- Audit pipelined stages before the substrate change. A stage that was pipelined-but-not-queued is a stage whose effective concurrency is bounded by the producer's `max.in.flight.requests.per.connection`. That cap is fine on NVMe and fatal on object storage.
- Run real-world workload tests before public preview. The Redpanda team's candid retrospective makes this point directly: "we had been so focused on building functionality that we hadn't been focused on pushing real-world workloads through the system." OMB / synthetic-shape benchmarks are the cheapest way to surface a missed concurrency-provisioning gap.
- Don't assume client-side knobs will save you. The Redpanda fix was deliberately broker-side, validated on OMB "without needing to change any producer configurations." The client API is the contract; the broker absorbs the substrate change.
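The first two points above can be sketched together: size the stage with Little's Law, then give it its own pool of concurrent uploaders instead of inheriting the producer's in-flight cap. This is a minimal asyncio model under stated assumptions, not the Cloud Topics implementation: `required_concurrency`, `run_buffered_stage`, and `fake_upload` are hypothetical names, and `fake_upload` merely sleeps to stand in for an object-storage write.

```python
import asyncio
import math


def required_concurrency(target_rate_per_s: float, latency_s: float) -> int:
    """Little's Law: in-flight operations = arrival rate x time spent in the stage."""
    return math.ceil(target_rate_per_s * latency_s)


async def run_buffered_stage(items, workers, upload):
    """Drain items through `workers` concurrent uploaders.

    Returns (results in input order, peak number of uploads in flight).
    """
    queue = asyncio.Queue()
    for index, item in enumerate(items):
        queue.put_nowait((index, item))
    results = [None] * len(items)
    in_flight = 0
    peak = 0

    async def worker():
        nonlocal in_flight, peak
        while True:
            try:
                index, item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to upload
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                # Store by original index so completion order cannot reorder results.
                results[index] = await upload(item)
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results, peak


async def fake_upload(item):
    """Stand-in for a slow object-storage write (assumed 10 ms here)."""
    await asyncio.sleep(0.01)
    return f"stored:{item}"


if __name__ == "__main__":
    # Holding 100 ops/s while latency grows from 10 ms to 1 s needs 100x the slots.
    print(required_concurrency(100, 0.010), required_concurrency(100, 1.0))
    print(asyncio.run(run_buffered_stage(range(20), 5, fake_upload))[1])
```

The worker count is the provisioning knob: it lives inside the stage, so it can be raised for a slow substrate without touching any producer configuration.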
Seen in¶
- Redpanda — Little's Law in practice with Cloud Topics (2026-05-05) — canonical instance: HDD → NVMe → object-storage migration framed as "shifting bottlenecks", with Cloud Topics' upload-queue stage as the architectural response.
- Redpanda — Cloud Topics architecture (2026-03-30) — the substrate-side design (L0 / L1 files, placeholder batches, Reconciler) that embodies the architectural response to the storage migration.
- Fly.io Sprites (2026-01-14) — different architectural posture: object storage as the durability root rather than a per-topic class.
- Databricks / Lakebase (2026-05-07) — Postgres-world parallel: compute-storage separation removes the failure mode (torn pages) that local-disk-era FPW was designed to mitigate.
Related¶
- concepts/littles-law
- concepts/object-storage-as-disk-root
- concepts/network-attached-storage-latency-penalty
- concepts/io-latency-sensitive-workload
- concepts/disk-throughput-bottleneck
- concepts/latency-critical-vs-latency-tolerant-workload
- concepts/batching-latency-tradeoff
- systems/redpanda
- systems/redpanda-cloud-topics
- systems/aws-s3