CONCEPT Cited by 2 sources

IOPS throttle on network-attached storage¶

Definition¶

Cloud providers that expose network-attached block storage (Amazon EBS, Google Persistent Disk, etc.) cap the number of I/O operations per second a volume can accept, regardless of the underlying hardware's capacity. Exceeding the cap either queues the I/O (adding latency) or rejects it (throttling the client).

Direct-attached storage — local NVMe — has no such cap. The drive's hardware limit is the limit.

Dicken's framing¶

"Many cloud providers that use this model, including AWS and Google Cloud, limit the amount of IO operations you can send over the wire. By default, a GP3 EBS instance on Amazon allows you to send 3000 IOPS per-second. This can be configured higher, but comes at extra cost."

"The older GP2 EBS volumes operate with a pool of IOPS that can be built up to allow for occasional bursts."

"If instead you have your storage attached directly to your compute instance, there are no artificial limits placed on IO operations. You can read and write as fast as the hardware will allow for."

(Source: sources/2025-03-13-planetscale-io-devices-and-latency)

Concrete EBS limits (at time of writing)¶

Volume type	Default IOPS	Max IOPS	Pricing
gp3	3,000	16,000 (configurable)	Extra IOPS priced per-IOPS/month
gp2	3 IOPS per GiB baseline (burst bucket)	16,000	IOPS tied to volume size
io2 Block Express	Provisioned per volume (highest tier)	Up to 256,000	Premium per-IOPS price
Direct NVMe (e.g. i3 / i4 / Metal instance stores)	Hardware-limited	Hundreds of thousands	Bundled with instance

The pool-and-burst model (GP2) accumulates credit while the volume runs below its baseline rate and drains credit during bursts. A volume that stays at peak load exhausts the bucket and falls to the baseline rate.
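As a rough illustration of the pool-and-burst behaviour described above, the following Python sketch models a credit bucket. The class name `BurstBucket`, the helper `simulate`, and every number used with them are hypothetical and chosen for readability; they are not the parameters or the implementation of any real EBS volume type.

```python
from dataclasses import dataclass


@dataclass
class BurstBucket:
    """Toy credit bucket: refills at the baseline rate, spends one credit per I/O."""
    baseline_iops: float  # refill rate, and the sustained floor once credits run out
    burst_iops: float     # most I/Os served per second while credits remain
    capacity: float       # maximum number of credits the bucket can hold
    credits: float        # current credit balance

    def serve(self, demand_iops: float, seconds: float = 1.0) -> float:
        """Return the IOPS actually served over one interval of `seconds`."""
        # Refill first, never past capacity.
        self.credits = min(self.capacity, self.credits + self.baseline_iops * seconds)
        # The served rate is bounded by demand, the burst ceiling, and the credits on hand.
        served = min(demand_iops, self.burst_iops, self.credits / seconds)
        self.credits -= served * seconds
        return served


def simulate(bucket: BurstBucket, demand_iops: float, steps: int) -> list:
    """Apply a constant demand for `steps` one-second intervals."""
    return [bucket.serve(demand_iops) for _ in range(steps)]
```

With an assumed bucket of `baseline_iops=100`, `burst_iops=3000`, `capacity=6000` that starts full, a constant demand of 3,000 IOPS is served at the burst rate for two seconds, then at 200, and then settles at the 100 IOPS baseline. That is the "exhausts the bucket and falls to the baseline rate" behaviour in miniature.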

Why the cap exists¶

Network-attached block storage runs on a shared fleet of storage servers. If every volume could issue IOPS at the hardware limit of its backing SSD, a single noisy neighbor would starve every other volume on the same server. IOPS caps are the isolation mechanism the shared fleet uses to honour per-volume SLAs.

This is the same problem EBS's concepts/noisy-neighbor / concepts/performance-isolation engineering addresses on the latency axis (systems/aws-ebs). Caps solve it administratively; engineering work on isolation lets the cap loosen over time.
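The isolation argument above can be made concrete with a toy model. The sketch below assumes a single storage server with a fixed IOPS capacity that, when oversubscribed, shares capacity in proportion to admitted demand. That sharing rule, the function name `served_iops`, and all numbers are illustrative assumptions, not a description of how EBS schedules I/O.

```python
def served_iops(demands, server_capacity, per_volume_cap=None):
    """Return the IOPS each volume receives on one shared server.

    demands: requested IOPS per volume.
    per_volume_cap: if given, each volume's demand is clipped to this value
    before it reaches the server (the administrative throttle).
    """
    admitted = [d if per_volume_cap is None else min(d, per_volume_cap) for d in demands]
    total = sum(admitted)
    if total <= server_capacity:
        # The server is not saturated, so every admitted request is served.
        return admitted
    # Saturated: assume capacity is split in proportion to admitted demand.
    scale = server_capacity / total
    return [d * scale for d in admitted]
```

Under these assumptions, one volume asking for 500,000 IOPS next to two volumes asking for 3,000 each on a 100,000 IOPS server leaves the small volumes with well under 1,000 IOPS apiece when there is no cap. With a 16,000 IOPS per-volume cap the admitted total is 22,000, the server is not saturated, and the small volumes get their full 3,000.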

Why it's a database problem¶

OLTP databases issue many small I/Os: a commit fsyncs the WAL (one or more pages), and each index update may modify one or more pages. At 3,000 IOPS per volume, a database doing 1,500 write transactions per second with ~2 page writes per transaction (1,500 × 2 = 3,000) is at the cap before it serves any read I/O. Production workloads routinely need 10,000–100,000+ IOPS, which means paying for provisioned IOPS at real cost.
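The arithmetic in the paragraph above can be written as a small budget calculation. This is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes every page write costs exactly one I/O operation, which real databases only approximate.

```python
def write_iops(tx_per_second: float, page_writes_per_tx: float) -> float:
    """IOPS consumed by writes alone, assuming one I/O per page write."""
    return tx_per_second * page_writes_per_tx


def read_headroom(cap_iops: float, tx_per_second: float, page_writes_per_tx: float) -> float:
    """IOPS left over for reads once writes are accounted for (never negative)."""
    return max(0.0, cap_iops - write_iops(tx_per_second, page_writes_per_tx))


def max_write_tps(cap_iops: float, page_writes_per_tx: float) -> float:
    """Highest write transaction rate that fits under the cap with no reads at all."""
    return cap_iops / page_writes_per_tx
```

For the example in the text, `write_iops(1500, 2)` equals the 3,000 IOPS default cap, so `read_headroom(3000, 1500, 2)` is zero.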

Direct-attached NVMe runs at the hardware limit: a mid-range consumer NVMe does 100k+ IOPS, and datacenter drives do 500k+. Against a default 3,000 IOPS EBS volume, that is roughly 33× to 170× (or more) IOPS headroom.

Architectural workarounds¶

Pay for more IOPS (gp3 configurable, io2): cost scales per provisioned IOPS within gp3, then steps up sharply once the workload crosses the gp3 ceiling onto io2.
Stripe across volumes: N volumes = N × per-volume cap, up to the instance's own EBS limit. Adds operational complexity (RAID 0 on network disks).
Push work to local NVMe: instance-store / local SSD skips the IOPS tax but gives up the durability guarantee EBS provides on instance loss.
Batch fsyncs with group commit: one fsync covers several transactions' commits, giving fewer, bigger I/Os and trading latency for throughput under the cap.
Switch to direct-attached + replication: the systems/planetscale-metal approach, patterns/direct-attached-nvme-with-replication.
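Two of the workarounds above, striping and group commit, reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, striping is assumed to spread I/O perfectly evenly across volumes, and group commit is assumed to cost one fsync per batch of commits.

```python
import math


def volumes_needed(target_iops: float, per_volume_cap: float) -> int:
    """Smallest number of striped volumes whose combined caps reach the target."""
    return math.ceil(target_iops / per_volume_cap)


def striped_iops(n_volumes: int, per_volume_cap: float, instance_cap: float) -> float:
    """Aggregate IOPS of a stripe set, bounded by the instance-level limit."""
    return min(n_volumes * per_volume_cap, instance_cap)


def fsyncs_per_second(commits_per_second: float, group_size: int) -> float:
    """fsync rate when `group_size` commits share a single fsync."""
    return commits_per_second / group_size
```

Using figures that appear elsewhere on this page, three 16,000 IOPS volumes are needed to reach 40,000 IOPS, and an instance limited to 40,000 EBS IOPS gains nothing from a fourth volume. On the group-commit side, batching 10 commits per fsync turns 1,500 commits per second into 150 fsyncs per second, at the price of each commit waiting for its batch.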

Seen in¶

— Canonical network-physics framing of the throttle. Richard Crowley (PlanetScale, 2025-03-11) extends Dicken's "cloud caps IOPS" administrative framing into a network-fabric-physics framing: "Even at the very expensive upper end — EBS io2, for example — the network holds the storage hardware back. A local NVMe drive can often perform an order of magnitude more IOPS than a network-attached storage volume." Canonical wiki datum that the throttle is not just administrative (per-volume SLA enforcement) but structural: the network fabric itself saturates before the SSD hardware does. Instance-type evidence: "Take the EBS performance of an r6i.4xlarge EC2 instance, for example. It can perform 40,000 IOPS if the volume or volumes can keep up. … By contrast, an i4i.4xlarge EC2 instance can perform 220,000 random write or 400,000 random read IOPS using local NVMe SSDs". That is 5.5× more random-write and 10× more random-read IOPS on the same vCPU class when storage is direct-attached. Canonical network-vs-hardware distinction: even if io2 were unthrottled administratively, the NIC + switch + remote-machine path would still cap throughput below the NVMe hardware limit.
sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding — Canonical pricing-consequence view of the IOPS throttle. Ben Dicken extends the "cloud caps IOPS" framing (which the 2025-03-13 post canonicalised as a latency phenomenon) into a cost phenomenon: crossing the gp3 IOPS ceiling (16k) forces the architect onto the io1 / io2 provisioned-IOPS tier, producing an 11-13× cost multiplier for an 8× workload. The IOPS cap is a super-linear cost cliff, not a linear gradient. Canonicalises the independent throughput dimension (concepts/throughput-vs-iops) alongside IOPS — gp3 also caps throughput at 125 MiB/s default, and the two caps must be provisioned separately. Canonicalises the burst bucket mechanic ("bank unused IOPS, up to a fixed limit") as the token-bucket-at-the-I/O-layer pattern. Canonicalises sharding as IOPS scaling: each shard sees 1/N the demand, stays below the cheap-tier ceiling, avoids the regime-shift cliff.
sources/2025-03-13-planetscale-io-devices-and-latency — canonical naming of the "cloud caps IOPS" framing + GP2 burst bucket + GP3 default 3,000 IOPS.
— workload-side measurement of the throttle. Dicken's Postgres 17 vs 18 benchmarks on r7i.2xlarge at 50 connections: "IOPS and throughput are clear bottlenecks for each of the EBS-backed instances. The different versions / I/O settings don't make a huge difference in such cases." QPS scales in lockstep with the EBS capability tier (gp3-3k → gp3-10k → io2-16k → NVMe-300k). First canonical wiki instance of the throttle measured at the database-QPS altitude rather than the volume-cap altitude. Postgres 18's new async-I/O modes don't move the bottleneck: the IOPS ceiling applies regardless of how the application schedules its I/Os.
— Berquist positions IOPS saturation as the lagging indicator of the write-throughput ceiling (the leading indicator being replication lag) and canonicalises the substrate-dependence of the ceiling: "With PlanetScale Metal … Metal keeps [storage and compute] together on the same hardware … substantially higher IOPS. If you're on Metal, you can often delay sharding and continue scaling vertically much further (into the several TB range) than traditional cloud database architectures allow." Network-attached storage (RDS, CloudSQL) hits the IOPS ceiling earlier than direct-attached NVMe, so the "when to shard" decision is substrate-conditioned.