
Queueing theory (as applied to storage/IO stacks)

Definition

Queueing theory is the math of how waiting lines form and drain when arrivals are asynchronous. Applied to systems: between the CPU and durable storage there is always a chain of queues — OS kernel ↔ host storage adapter ↔ fabric ↔ target adapter ↔ media — and each queue is a candidate bottleneck, a variance amplifier, or a place for one tenant's workload to interfere with another's.

The bank metaphor (Olson, 2024)

The EBS retrospective explains it as a bank:

You walk into the bank with a deposit, but first you have to traverse a queue before you can speak with a bank teller… In a perfect world, the number of patrons entering the bank arrive at the exact rate at which their request can be handled, and you never have to stand in a queue. But the real world isn't perfect. The real world is asynchronous.

Key implications:

  • Averages hide the experience of the last person in line. Even if average wait is "acceptable," the first customer had zero wait, the last had a long one.
  • Unless you have infinite resources, queues are necessary to absorb peak load. Queueless = over-provisioned.
  • Options to improve any queue boil down to: add workers (parallelism), speed up each worker (lower service time), split queues by class (priority/SLO lanes). Each costs money.
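The implications above can be checked with a toy simulation (not from the source; arrival and service rates are illustrative assumptions). A single FIFO queue at 90% utilization has a p99 wait far above its mean wait, and adding a second worker collapses both:

```python
import random

def simulate(arrival_rate, service_rate, workers, n=50_000, seed=0):
    """Toy queue: Poisson arrivals, exponential service times,
    FIFO dispatch to whichever of `workers` servers frees up first.
    Returns each request's wait (time in queue before service starts)."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * workers          # when each worker next goes idle
    waits = []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)            # next arrival
        w = min(range(workers), key=lambda i: free_at[i])
        start = max(t, free_at[w])                    # wait if worker busy
        waits.append(start - t)
        free_at[w] = start + rng.expovariate(service_rate)
    return waits

def pctl(xs, q):
    """Nearest-rank percentile."""
    xs = sorted(xs)
    return xs[int(q * (len(xs) - 1))]

# 9 arrivals/sec against 10 services/sec per worker: 90% utilization.
one = simulate(arrival_rate=9.0, service_rate=10.0, workers=1)
two = simulate(arrival_rate=9.0, service_rate=10.0, workers=2)
print(f"1 worker : mean={sum(one)/len(one):.3f}s  p99={pctl(one, 0.99):.3f}s")
print(f"2 workers: mean={sum(two)/len(two):.3f}s  p99={pctl(two, 0.99):.3f}s")
```

The run illustrates both bullets at once: the mean hides a tail several times larger, and the "add workers" lever buys a disproportionate wait reduction near saturation.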

In network storage stacks

In a network-attached block-storage system (e.g. systems/aws-ebs) there are several stacked queues:

  • OS kernel ↔ storage adapter
  • Host storage adapter ↔ storage fabric
  • Target storage adapter
  • Storage media itself

Legacy stacks often mix vendors per layer (Fibre Channel, iSCSI, NFS over TCP), so tuning "the" storage network requires specialized knowledge distinct from tuning the application or the media.

The hidden failure mode is cross-layer interference: a queue-depth choice at one layer couples tenants at an adjacent layer. A classic example from the EBS post: Xen's default block-device ring-queue parameters capped each host at 64 outstanding IOs across all devices, a per-host noisy-neighbor source that only surfaced via patterns/loopback-isolation.
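A toy model of that coupling (this is an illustrative sketch, not Xen's actual ring implementation; tenant names and counts are invented). All devices share one fixed-size ring, so a tenant that arrives first with deep queues starves a light tenant of slots:

```python
# Shared host ring: RING_SLOTS slots for ALL devices on the host.
RING_SLOTS = 64

def admitted(requests):
    """Admit (tenant, outstanding_ios) pairs FIFO into the shared ring
    until slots run out; return how many IOs each tenant got in flight."""
    slots = RING_SLOTS
    granted = {}
    for tenant, want in requests:
        got = min(want, slots)
        slots -= got
        granted[tenant] = granted.get(tenant, 0) + got
    return granted

# A noisy tenant arrives first with 60 outstanding IOs; a quiet tenant
# asking for 16 is squeezed into the 4 remaining slots.
print(admitted([("noisy", 60), ("quiet", 16)]))  # {'noisy': 60, 'quiet': 4}
```

The quiet tenant did nothing wrong at its own layer; the interference only exists because the ring below is shared, which is why it took host-level loopback isolation to surface.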

Why this matters for EBS-class systems

  • Hard drives amplified variance. A hard ceiling of 120-150 IOPS per drive meant command reordering and deep queues pushed tail latency into the hundreds of milliseconds. Spreading one tenant across many drives reduced their worst case but widened the blast radius: classic noisy-neighbor compounding.
  • SSDs didn't fix the stack. They collapsed media latency, which shifted the bottleneck up to queues in OS, hypervisor, and network.
  • Queue reduction is a north star. systems/nitro offload removes hypervisor queue layers. systems/srd replaces TCP with a protocol whose congestion and retransmit behavior doesn't impose the in-order delivery queue that storage IOs don't need.
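The HDD numbers in the first bullet can be sanity-checked with Little's law, L = λW (queue length equals arrival rate times wait). At an HDD's ~150 IOPS, a modest queue depth alone implies queueing delay in the hundreds of milliseconds:

```python
# Little's law: L = lambda * W, so the wait implied by a steady
# queue depth L at throughput lambda is W = L / lambda.
def wait_seconds(queue_depth, iops):
    return queue_depth / iops

# At 150 IOPS (the top of the 120-150 IOPS/drive range):
for depth in (4, 16, 32):
    print(f"depth={depth:>2}: ~{wait_seconds(depth, 150) * 1000:.0f} ms")
```

A queue depth of 32 already implies ~213 ms of queueing before any seek-time variance or command reordering, consistent with "tail latency into the hundreds of ms" on spinning media.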

Seen in

  • GPU inference serving (Voyage AI, 2025)
