
CONCEPT Cited by 1 source

Storage bottleneck migration

Definition

Storage bottleneck migration names a recurring observation in the history of high-performance systems: the dominant performance bottleneck in the I/O path is not stable. It migrates across hardware generations, and each migration invalidates the design assumptions of systems built for the previous era. A system architected for one bottleneck location must be re-examined when the bottleneck moves: the substrate it was tuned against is no longer the limiting factor, and a new limit has appeared somewhere the architecture may not have provisioned for.

Canonical wiki statement

Redpanda 2026-05-05:

"Redpanda was built in an era of shifting bottlenecks. In the early days of designing Redpanda, we loved to talk about how the I/O bottleneck was migrating from the storage device to the CPU. Modern storage devices were becoming so fast that it wasn't a clunky spinning disk slowing things down; it was thread context switching and cache invalidations."

"Today, we take the existence of these fast storage devices for granted and work around the CPU bottleneck using a thread-per-core architecture. […] Fast forward to today, and we see that bottlenecks are shifting again. Demand for low-cost storage solutions, such as cloud object storage, is reintroducing high-latency I/O into systems designed around the assumption of low-latency storage."

"Just as older systems had to confront the introduction of high-performance storage, Redpanda isn't immune either."

This is the load-bearing framing for why Cloud Topics needed a production-tuning fix even though the architectural primitives (L0 files, placeholder batches, Reconciler) were correct: the substrate's latency profile changed, and the write pipeline's implicit concurrency provisioning didn't scale to absorb it.

Three eras of the streaming-broker bottleneck

| Era | Dominant substrate | Bottleneck location | Architectural response |
|---|---|---|---|
| HDD era | Spinning disk | Disk seek / rotational latency | Sequential-write log structures (Kafka append-only segments); large page caches |
| NVMe era | Local NVMe | Thread context switch + cache invalidation | Seastar / thread-per-core / shared-nothing per-core; coroutines instead of OS threads (systems/redpanda) |
| Object-storage era | S3 / GCS / ADLS | Network-mediated I/O latency (tens to hundreds of ms at p50, multi-second at p99) | Concurrency inflation (queue-buffer stages); cross-partition batching; durable-state-as-URL (concepts/object-storage-as-disk-root) |

The pattern is clear: each era's solution becomes the next era's implicit assumption. Kafka's append-only segments assumed sequential HDD writes were cheap; Redpanda's thread-per-core assumed NVMe was fast enough that CPU would dominate; the next architectural layer must assume that a substantial fraction of writes will see high-latency, network-mediated I/O.

Why bottlenecks migrate predictably

Three structural forces drive the migration:

  1. Hardware-cost inversion. As the previously expensive component gets cheaper, system designers stop optimising it explicitly, and another component becomes the binding constraint. NVMe SSDs eliminated seek latency; the bottleneck moved to whatever was the next slowest piece (kernel context switches in network-bound server stacks; cache invalidations in shared-state runtimes).

  2. Cost-economics demand-pull. Customers pull systems from high-performance, high-cost substrates toward low-cost, lower-performance ones, even when this reintroduces previously eliminated bottlenecks. Redpanda's framing is explicit: "Demand for low-cost storage solutions […] is reintroducing high-latency I/O." The market forces the system architect to re-confront the slow substrate.

  3. Substrate-substitution-as-feature. The new substrate is rolled out as an opt-in (Cloud Topics is a per-topic class, not a global replacement). This means the system has to support both the old substrate's assumptions and the new substrate's profile simultaneously — sometimes within a single cluster, even within the same write path on a per-record basis. The architectural cost is a rebalancing of where concurrency / batching / caching provisioning lives.

Migration triggers a design audit

When the bottleneck migrates, every implicit assumption in the existing design becomes a candidate audit point:

  • Latency budgets sized for the old substrate (replication, ~10 ms) are too small for the new one (object-storage upload, ~100 ms to 1 s). With the number of in-flight requests held fixed, Little's Law says the per-connection throughput ceiling falls by the same factor that latency rises.

  • Concurrency provisioning in the write pipeline must rise proportionally to the latency increase. A stage that was pipelined (single-deep, position-guaranteeing handoff) may need to be re-architected with a buffered queue (patterns/concurrency-buffer-stage-for-high-latency-io) for the slow substrate.

  • Batching windows sized for fast substrates may be under-amortising fixed-cost penalties on the new one. Cloud Topics' 0.25 s / 4 MB cross-partition window is sized to amortise S3 PUT cost; an NVMe-era choice would have been smaller.

  • Failure-mode assumptions carry over silently and may be wrong. A system architected against torn pages on local disk (concepts/torn-page) may be over-engineered when the new substrate has its own integrity guarantees (object-storage multi-AZ durability) — see Lakebase's FPW elimination for the parallel pattern in the Postgres world.

  • Client-API contracts that assumed the old latency profile (e.g. max.in.flight.requests.per.connection = 1 for ordering-sensitive workloads) become hostile on the new substrate. Either the contract loosens or the broker has to do more work to maintain it under the new latency profile — Cloud Topics chose the latter (patterns/pipelined-produce-with-position-guarantee).
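
The Little's Law point in the list above can be made concrete with a few lines of arithmetic. This is a minimal sketch, not Redpanda code: the function name throughput_ceiling is hypothetical, the latencies are the round figures quoted above, and the in-flight caps are illustrative (1 for strict ordering, 5 being the Apache Kafka Java producer default for max.in.flight.requests.per.connection).

```python
def throughput_ceiling(max_in_flight: int, latency_s: float) -> float:
    """Little's Law rearranged: throughput = concurrency / latency.

    Returns the maximum requests per second one connection can sustain
    when at most `max_in_flight` requests are outstanding and each takes
    `latency_s` seconds to be acknowledged.
    """
    if max_in_flight < 1 or latency_s <= 0:
        raise ValueError("need at least one in-flight request and a positive latency")
    return max_in_flight / latency_s


if __name__ == "__main__":
    # Illustrative substrates; the latencies are round numbers, not measurements.
    substrates = [("replication, ~10 ms", 0.010), ("object-storage upload, ~1 s", 1.0)]
    for label, latency_s in substrates:
        for in_flight in (1, 5):
            ceiling = throughput_ceiling(in_flight, latency_s)
            print(f"{label:30s} in-flight={in_flight}  ceiling={ceiling:g} req/s")
```

With one request in flight, the ceiling falls from 100 requests per second at 10 ms to 1 request per second at 1 s. Nothing about the client changed; only the substrate latency did.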

Relationship to other wiki concepts

  • concepts/littles-law is the algebraic lens through which bottleneck migration shows up as a concurrency-provisioning problem.
  • concepts/object-storage-as-disk-root is the architectural posture that embraces the migration into object-storage-bound I/O — Fly Sprites' design treats S3 as the durability root, NVMe as a cache.
  • concepts/network-attached-storage-latency-penalty is the per-stage symptom: any time a system reads/writes through the network instead of local disk, latency budget needs re-checking.
  • concepts/io-latency-sensitive-workload is the workload-side axis — some workloads tolerate the migration into high-latency storage (observability, compliance, model training); others cannot. Cloud Topics is per-topic precisely so each topic can pick its substrate without the whole cluster paying the cost.
  • concepts/batching-latency-tradeoff is the trade-off knob that often has to be retuned post-migration — bigger batches amortise the slow substrate's per-operation cost.
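
The batching knob in the last bullet can be illustrated with a size-or-age window. The sketch below is an assumption-laden toy, not Cloud Topics' implementation: the class WindowedBatcher and its methods are hypothetical, and only the 4 MB / 0.25 s defaults come from the figures quoted on this page. The caller passes the clock in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WindowedBatcher:
    """Hypothetical batcher: flush when the buffer is big enough or old enough."""
    max_bytes: int = 4 * 1024 * 1024   # size trigger (4 MB, as quoted in the text)
    max_age_s: float = 0.25            # age trigger (0.25 s, as quoted in the text)
    _records: List[bytes] = field(default_factory=list)
    _bytes: int = 0
    _opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record; return the batch to upload if a trigger fired."""
        if self._opened_at is None:
            self._opened_at = now          # the window opens on its first record
        self._records.append(record)
        self._bytes += len(record)
        too_big = self._bytes >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_s
        return self.flush() if (too_big or too_old) else None

    def flush(self) -> List[bytes]:
        """Hand back the buffered records and reset the window."""
        batch, self._records = self._records, []
        self._bytes, self._opened_at = 0, None
        return batch


if __name__ == "__main__":
    batcher = WindowedBatcher(max_bytes=10, max_age_s=0.25)  # tiny limits for the demo
    print(batcher.add(b"abc", now=0.00))   # None: 3 bytes buffered
    print(batcher.add(b"defg", now=0.10))  # None: 7 bytes buffered
    print(batcher.add(b"hij", now=0.20))   # size trigger: all three records returned
```

A larger window means fewer, bigger uploads per second, at the price of every record waiting up to max_age_s before it is sent. A production batcher would also need a timer to flush an idle window; this sketch only checks the age when a new record arrives.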

Operational implications

For a team confronting a substrate migration:

  1. Treat latency multipliers as concurrency demands. If you're moving a stage from 10 ms to 1000 ms, that stage needs 100× the concurrency to hold throughput constant. Plan for it before you ship; don't discover it in an OpenMessaging Benchmark (OMB) run.

  2. Audit pipelined stages before the substrate change. A stage that was pipelined-but-not-queued is a stage whose effective concurrency is bounded by the producer's max.in.flight.requests.per.connection. That cap is fine on NVMe and fatal on object storage.

  3. Run real-world workload tests before public preview. The Redpanda team's candid retrospective makes this point directly: "we had been so focused on building functionality that we hadn't been focused on pushing real-world workloads through the system." Synthetic benchmarks such as OMB, shaped to resemble production traffic, are the cheapest way to surface a missed concurrency-provisioning gap.

  4. Don't assume client-side knobs will save you. The Redpanda fix was deliberately broker-side, validated on OMB "without needing to change any producer configurations." The client API is the contract; the broker absorbs the substrate change.
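
The first two points can be checked with a small deterministic simulation before touching a real cluster. This is a sketch under stated assumptions: simulate_stage is a hypothetical helper, every operation takes exactly the same latency, and the producer always keeps the stage as full as the in-flight cap allows. It models a capped stage in general, not Redpanda's upload queue.

```python
import heapq


def simulate_stage(num_requests: int, latency_s: float, max_in_flight: int) -> float:
    """Return the simulated seconds needed to push `num_requests` through a
    stage that allows at most `max_in_flight` concurrent operations, each
    taking exactly `latency_s` seconds. No real time passes."""
    # Each heap entry is the time at which one concurrency slot becomes free.
    slots = [0.0] * max_in_flight
    heapq.heapify(slots)
    finish = 0.0
    for _ in range(num_requests):
        start = heapq.heappop(slots)     # earliest free slot takes the request
        done = start + latency_s
        heapq.heappush(slots, done)
        finish = max(finish, done)
    return finish


if __name__ == "__main__":
    scenarios = [
        ("fast substrate, cap 5", 0.010, 5),
        ("slow substrate, cap 5", 1.0, 5),
        ("slow substrate, cap 500", 1.0, 500),
    ]
    for label, latency_s, cap in scenarios:
        elapsed = simulate_stage(1000, latency_s, cap)
        print(f"{label:26s} 1000 requests in {elapsed:.2f} s")
```

Under these assumptions, 1000 requests take about 2 s on the fast substrate with a cap of 5, 200 s on the slow one with the same cap, and 2 s again once the cap is raised 100× to 500. That is the "latency multiplier equals concurrency demand" rule in miniature.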

Seen in

  • Redpanda — Little's Law in practice with Cloud Topics (2026-05-05) — canonical instance: HDD → NVMe → object-storage migration framed as "shifting bottlenecks", with Cloud Topics' upload-queue stage as the architectural response.
  • Redpanda — Cloud Topics architecture (2026-03-30) — the substrate-side design (L0 / L1 files, placeholder batches, Reconciler) that embodies the architectural response to the storage migration.
  • Fly.io Sprites (2026-01-14) — different architectural posture: object storage as the durability root rather than a per-topic class.
  • Databricks / Lakebase (2026-05-07) — Postgres-world parallel: compute-storage separation removes the failure mode (torn pages) that local-disk-era FPW was designed to mitigate.