CONCEPT Cited by 2 sources
IOPS throttle on network-attached storage¶
Definition¶
Cloud providers that expose network-attached block storage (Amazon EBS, Google Persistent Disk, etc.) cap the number of I/O operations per second a volume can accept, regardless of the underlying hardware's capacity. Exceeding the cap either queues the I/O (adding latency) or rejects it (throttling the client).
Direct-attached storage — local NVMe — has no such cap. The drive's hardware limit is the limit.
Dicken's framing¶
"Many cloud providers that use this model, including AWS and Google Cloud, limit the amount of IO operations you can send over the wire. By default, a GP3 EBS instance on Amazon allows you to send 3000 IOPS per-second. This can be configured higher, but comes at extra cost."
"The older GP2 EBS volumes operate with a pool of IOPS that can be built up to allow for occasional bursts."
"If instead you have your storage attached directly to your compute instance, there are no artificial limits placed on IO operations. You can read and write as fast as the hardware will allow for."
(Source: sources/2025-03-13-planetscale-io-devices-and-latency)
Concrete EBS limits (at time of writing)¶
| Volume type | Default IOPS | Max IOPS | Pricing |
|---|---|---|---|
| gp3 | 3,000 | 16,000 (configurable) | Extra IOPS priced per-IOPS/month |
| gp2 | 3 per GiB (burst bucket to 3,000) | 16,000 | IOPS tied to volume size |
| io2 Block Express | Up to 256,000/volume | Highest tier | Premium per-IOPS price |
| Direct NVMe (e.g. i3 / i4 / Metal instance stores) | Hardware-limited | Hundreds of thousands | Bundled with instance |
The pool-and-burst model (GP2) accumulates credit while the volume is idle and drains credit during bursts. A volume that stays at peak load exhausts the bucket and falls to the baseline rate.
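The pool-and-burst model is a token bucket, and can be sketched in a few lines. A minimal simulation, assuming the commonly documented gp2 parameters (credits accrue at 3 per GiB per second, bucket capped at 5.4 million credits, burst ceiling 3,000 IOPS; treat these as illustrative, not authoritative):

```python
# Token-bucket sketch of the gp2 burst model (assumed parameters:
# 3 credits/GiB/s accrual, 5.4M-credit bucket, 3,000 IOPS burst ceiling).
BUCKET_MAX = 5_400_000   # I/O credits
BURST_IOPS = 3_000

def simulate(volume_gib: int, demand_iops: int, seconds: int) -> list[int]:
    """Return the IOPS actually served each second under constant demand."""
    baseline = max(100, 3 * volume_gib)   # gp2 baseline: 3 IOPS per GiB, min 100
    credits = float(BUCKET_MAX)           # bucket starts full
    served = []
    for _ in range(seconds):
        credits = min(BUCKET_MAX, credits + baseline)   # accrue at baseline rate
        want = min(demand_iops, BURST_IOPS)             # burst ceiling
        got = min(want, int(credits))                   # spend available credits
        credits -= got
        served.append(got)
    return served

# A 100 GiB volume (baseline 300 IOPS) hammered at 3,000 IOPS drains
# ~2,700 credits/s net, so the burst lasts roughly 2,000 s, then the
# volume falls to its 300 IOPS baseline.
trace = simulate(volume_gib=100, demand_iops=3_000, seconds=3_000)
print(trace[0], trace[-1])
```

The same loop also shows the flip side: a volume that idles most of the day refills the bucket and can absorb short spikes at the burst ceiling.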
Why the cap exists¶
Network-attached block storage runs on a shared fleet of storage servers. If every volume could issue IOPS at the hardware limit of its backing SSD, a single noisy-neighbor would starve every other volume on the same server. IOPS caps are the isolation mechanism the shared fleet uses to honour per-volume SLAs.
This is the same problem EBS's concepts/noisy-neighbor / concepts/performance-isolation engineering addresses on the latency axis (systems/aws-ebs). Caps solve it administratively; isolation engineering shrinks the gap over time.
Why it's a database problem¶
OLTP databases issue many small I/Os: a commit fsyncs the WAL (one or more pages), each index update may modify one or more pages, etc. At 3,000 IOPS per volume, a database doing 1,500 write transactions per second with ~2 page writes per transaction is at the cap — before any read I/O. Production workloads routinely need 10,000–100,000+ IOPS, which means paying for provisioned IOPS at real cost.
Direct-attached NVMe runs at the hardware limit — a mid-range consumer NVMe does 100k+ IOPS; datacenter drives 500k+. The gap to a default EBS volume is ~30×–200× on IOPS headroom.
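The arithmetic above can be made explicit. A back-of-envelope budget using the section's own assumed numbers (2 page writes per transaction, 3,000-IOPS default volume, 100k/500k IOPS NVMe figures):

```python
# Back-of-envelope IOPS budget for a write-heavy OLTP workload.
# All figures are the section's assumptions, not measurements.
def write_iops(tx_per_sec: int, pages_per_tx: int = 2) -> int:
    """I/Os per second generated by commits alone (WAL + page writes)."""
    return tx_per_sec * pages_per_tx

EBS_DEFAULT_CAP = 3_000        # gp3 default IOPS
CONSUMER_NVME   = 100_000      # mid-range consumer drive (assumed)
DATACENTER_NVME = 500_000      # datacenter drive (assumed)

load = write_iops(tx_per_sec=1_500)   # hits the 3,000 IOPS cap with zero reads
print(f"write load: {load} IOPS")
print(f"NVMe headroom vs gp3 default: {CONSUMER_NVME // EBS_DEFAULT_CAP}x to "
      f"{DATACENTER_NVME // EBS_DEFAULT_CAP}x")
```

Note the budget counts only commit-path writes; checkpointing, vacuum/compaction, and read misses all draw from the same per-volume allowance.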
Architectural workarounds¶
- Pay for more IOPS (gp3 configurable, io2) — linear cost scaling.
- Stripe across volumes — N volumes = N × per-volume cap. Adds operational complexity (RAID 0 on network disks).
- Push work to local NVMe — instance-store / local SSD skips the IOPS tax but gives up the durability guarantee EBS provides on instance loss.
- Replace fsync with group commit — fewer, bigger I/Os, trading latency for throughput under the cap.
- Switch to direct-attached + replication — the systems/planetscale-metal approach: patterns/direct-attached-nvme-with-replication.
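Of the workarounds above, group commit is the one that changes the I/O pattern rather than the hardware bill. A toy sketch (hypothetical class and names — real databases do this inside the WAL writer, usually with a leader thread and a timeout): commit records are buffered and one write+fsync covers the whole batch, so N transactions cost one I/O instead of N.

```python
import os
import tempfile

class GroupCommitLog:
    """Toy WAL writer: buffer commit records, one write+fsync per batch."""
    def __init__(self, path: str, batch_size: int = 16):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.batch_size = batch_size
        self.pending: list[bytes] = []
        self.fsyncs = 0          # physical flushes actually issued

    def commit(self, record: bytes) -> None:
        """Queue a commit; durability is deferred until the batch flushes."""
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        os.write(self.fd, b"".join(self.pending))   # one large sequential write
        os.fsync(self.fd)                           # one I/O barrier for N commits
        self.fsyncs += 1
        self.pending.clear()

# 160 commits -> 10 fsyncs instead of 160: commit latency traded for
# IOPS headroom under the per-volume cap.
path = os.path.join(tempfile.gettempdir(), "toy-group-commit.wal")
log = GroupCommitLog(path, batch_size=16)
for i in range(160):
    log.commit(f"tx{i}\n".encode())
log.flush()
print(log.fsyncs)
```

The trade is explicit: a transaction is not durable until its batch flushes, which is exactly the latency-for-throughput exchange the bullet describes.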
Seen in¶
- sources/2025-03-13-planetscale-io-devices-and-latency — canonical naming of the "cloud caps IOPS" framing + GP2 burst bucket + GP3 default 3,000 IOPS.
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18 — workload-side measurement of the throttle. Dicken's Postgres 17 vs 18 benchmarks on r7i.2xlarge at 50 connections: "IOPS and throughput are clear bottlenecks for each of the EBS-backed instances. The different versions / I/O settings don't make a huge difference in such cases." QPS scales in lockstep with the EBS capability tier (gp3-3k → gp3-10k → io2-16k → NVMe-300k). First canonical wiki instance of the throttle measured at the database-QPS altitude rather than the volume-cap altitude. Postgres 18's new async-I/O modes don't move the bottleneck: the IOPS ceiling applies regardless of how the application schedules its I/Os.