
CONCEPT Cited by 3 sources

I/O-latency-sensitive workload

Definition

A workload whose end-to-end performance is bounded by the per-I/O round-trip latency of the storage substrate rather than by aggregate IOPS throughput or sequential bandwidth. Provisioning more IOPS on the existing substrate does not fix the problem because the binding constraint is per-operation wall-clock latency, not the number of operations per second.

Diagnostic signal

The canonical signal (Hazen, PlanetScale, 2025-03-11):

"We'd had to provision more IOPS to the EBS volumes backing MySQL in our sharded keyspace to keep up with the telemetry volume. Since this workload had demonstrated a sensitivity to I/O latency, we figured it would be a good candidate for upgrading to Metal."

(Source: sources/2026-04-21-planetscale-upgrading-query-insights-to-metal.)

The operator had already paid for extra IOPS; the workload was still slow. The remaining headroom is latency-per-I/O, not operations-per-second.

Workload-shape fingerprint

I/O-latency sensitivity usually coincides with one or more of:

  • High concurrent-writer thread count — each thread waits synchronously on its own fsync / page-flush / index-update. Aggregate throughput = threads ÷ per-I/O latency. More threads don't help if each is blocked on a 250 μs EBS round-trip. The Insights pipeline: 32 consumers × 25 threads = 800 concurrent writer threads all stalled on per-write I/O.
  • Random-access I/O pattern, not sequential bandwidth — prefetching and sequential readahead can't amortise the latency. Typical culprits: B+tree descent, secondary-index updates, MVCC version fetches.
  • Small per-I/O payloads — latency-dominated: 4 KB I/Os at 250 μs spend 99% of the time in round-trip, 1% in bandwidth.
  • Cache miss on hot path — every miss pays full substrate latency; cache-heavy workloads don't stress this axis.
  • Tail latency (p99, p99.9) degrades faster than median throughput — EBS's variance floor hits tail first (see concepts/performance-variance-degradation).
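The throughput arithmetic in the first bullet can be checked directly. A minimal sketch, assuming fully synchronous writer threads and the round-trip floors quoted elsewhere in this note (~250 μs EBS, ~50 μs local NVMe):

```python
def aggregate_throughput(threads: int, per_io_latency_s: float) -> float:
    """Writes/sec when every writer thread blocks synchronously on each I/O."""
    return threads / per_io_latency_s

# 32 consumers x 25 threads = 800 writers, as in the Insights pipeline.
ebs = aggregate_throughput(800, 250e-6)   # network-attached round-trip floor
nvme = aggregate_throughput(800, 50e-6)   # local-NVMe round-trip floor
print(f"EBS: {ebs:,.0f} writes/s, NVMe: {nvme:,.0f} writes/s, ratio {nvme / ebs:.0f}x")
```

Same 800 threads, same code — the 5× latency drop passes straight through to a 5× aggregate-throughput gain, because latency is the only variable in the formula.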

Distinction from IOPS-saturated workloads

patterns/sharding-as-iops-scaling addresses IOPS-cap workloads — the cheap-tier volume ceiling (gp3 default 3,000 IOPS) is the binding constraint, and spreading load across N shards keeps each shard below the ceiling. This is a throughput problem.

I/O-latency-sensitive workloads are a distinct category: even unlimited IOPS on the same substrate doesn't help, because each I/O still takes the substrate's round-trip floor (~250 μs for EBS, ~50 μs for local NVMe — see concepts/network-attached-storage-latency-penalty). This is a latency problem.
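One way to see the distinction is to model per-thread throughput as bounded by two independent terms: the thread's share of the provisioned IOPS cap, and the substrate's per-I/O round-trip floor. This is a hypothetical two-term model for illustration, not something from the source:

```python
def per_thread_ops(iops_cap: float, threads: int, per_io_latency_s: float) -> float:
    """Ops/sec one synchronous thread can do: limited by its share of the
    volume's IOPS cap AND by the substrate's per-I/O round-trip floor."""
    return min(iops_cap / threads, 1.0 / per_io_latency_s)

# IOPS-bound: 800 threads sharing a gp3-default 3,000-IOPS volume.
low_cap  = per_thread_ops(3_000, 800, 250e-6)       # 3.75 ops/s: cap binds
# Latency-bound: past 1/latency, provisioning more IOPS stops helping.
high_cap = per_thread_ops(10_000_000, 800, 250e-6)  # 4,000 ops/s: floor binds
more_cap = per_thread_ops(20_000_000, 800, 250e-6)  # still 4,000 ops/s
faster   = per_thread_ops(20_000_000, 800, 50e-6)   # 20,000 ops/s
print(low_cap, high_cap, more_cap, faster)
```

Doubling the cap changes nothing once the latency term binds; only lowering the per-I/O floor moves the ceiling — the definition above, in code form.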

A single workload can be both. The Insights pipeline was: PlanetScale had already applied sharding-as-IOPS-scaling (8 shards) and provisioned extra IOPS on EBS, and the latency axis was still binding. The [[patterns/direct-attached-nvme-with-replication|Metal substrate]] fixed the latency axis that sharding + IOPS could not.

Fix: substrate swap

The canonical fix for an I/O-latency-sensitive workload is a substrate swap to direct-attached NVMe (PlanetScale Metal) — 5× lower per-I/O round-trip on the same vCPU class. The Insights migration reported "substantial decrease in latency across all the measured percentiles" at p50 / p90 / p95 / p99 with no application, architecture, or sharding changes.

Downstream effects

Fixing the latency axis on a high-concurrency write pipeline has multiplicative downstream effects:

  • Each writer thread finishes faster → same N threads deliver more writes/sec → aggregate throughput rises.
  • Upstream queue backlog drains faster — e.g. Kafka consumer backlog shrinks as MySQL write latency drops.
  • Capacity headroom grows for free — same hardware, same concurrency config, more in-flight capacity.

All three are reported in the Insights-to-Metal migration.

Seen in
