Async I/O concurrency threshold¶
Definition¶
Async I/O concurrency threshold is the observation that
asynchronous-I/O interfaces (like Linux's io_uring)
only outperform synchronous I/O above a certain concurrency /
I/O-rate level. Below that threshold, the overhead of async
submission + completion tracking exceeds the latency-hiding benefit
— synchronous I/O wins because it has a shorter code path on the
hot path for a single I/O.
At low concurrency (few in-flight I/Os), there's nothing for async I/O to hide — the caller is waiting for one request at a time anyway, and sync I/O's simpler code path is faster.
At high concurrency (many in-flight I/Os), async I/O hides each request's latency behind the others, and raw throughput is bound by the storage device's parallelism (multiple NAND targets, deep command queues).
Why the threshold exists¶
The per-I/O cost decomposes into:
- Submission overhead (set up the request, enqueue it).
- Latency floor (the physics — NAND read, network RTT, seek).
- Completion overhead (reap the result, dispatch downstream).
Sync I/O has low submission + completion overhead but serialises the latency floor — one I/O at a time from the caller's perspective.
Async I/O adds submission + completion overhead but amortises the latency floor across many concurrent requests.
Below the threshold, sync_overhead + latency < async_overhead + (latency / in_flight), so sync wins. Above it the inequality flips — with enough in-flight requests the amortised latency term shrinks below the overhead gap — and async wins.
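The inequality can be sketched as a toy cost model in Python. The microsecond figures below are illustrative assumptions (roughly local-NVMe-shaped), not measurements from any benchmark:

```python
def per_io_cost(overhead_us: float, latency_us: float, in_flight: int) -> float:
    """Effective per-I/O cost when `in_flight` requests overlap the latency floor."""
    return overhead_us + latency_us / in_flight

# Illustrative numbers — assumptions, not measurements:
SYNC_OVERHEAD_US = 2.0    # short submit + reap path
ASYNC_OVERHEAD_US = 6.0   # ring setup, submission/completion bookkeeping
LATENCY_FLOOR_US = 50.0   # local-NVMe-ish read latency

for in_flight in (1, 2, 4, 8, 16, 32):
    # Sync serialises the latency floor: one I/O at a time.
    sync_cost = SYNC_OVERHEAD_US + LATENCY_FLOOR_US
    # Async amortises the floor across concurrent requests.
    async_cost = per_io_cost(ASYNC_OVERHEAD_US, LATENCY_FLOOR_US, in_flight)
    winner = "sync" if sync_cost < async_cost else "async"
    print(f"in_flight={in_flight:3d}  sync={sync_cost:5.1f}us  "
          f"async={async_cost:5.1f}us  -> {winner}")
```

In this toy model the crossover comes very early; real thresholds sit higher because completion dispatch, post-I/O CPU work, and device queue limits all add async-side cost the model ignores.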
Canonical wiki instance (PlanetScale 2025-10-14)¶
Ben Dicken's Postgres 17 vs Postgres 18 benchmark (sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18)
provides the canonical empirical observation. Testing Postgres 18
with `io_method` set to `sync`, `worker`, and `io_uring` across
1 / 10 / 50 connections on EBS and local NVMe:
- At 1 connection on EBS, `io_uring` loses to `sync` and `worker`. Surprising result: "I'll admit, this surprised me! My expectation was that io_uring would perform as well as if not better than all these options."
- At 10 connections on gp3-3k, `io_uring` is significantly worse than the other options.
- At 50 connections on gp3-3k, `io_uring` is only slightly worse than the other options — the gap narrows.
- At 50 connections on local NVMe, `io_uring` slightly beats the other options — the threshold is finally crossed.
Dicken's explicit formulation: "io_uring performs well when there's lots of I/O concurrency, but in low-concurrency scenarios it isn't as beneficial."
Additional factors on the threshold¶
- Storage latency floor. On network-attached storage (~250 μs round-trip), the latency floor dominates so thoroughly that async-I/O concurrency-hiding doesn't help much. On local NVMe (~50 μs), the floor is low enough that async parallelism matters.
- Post-I/O CPU work. If the caller is CPU-bound after the
I/O completes (checksums, memcpy, decompression), then async
I/O's latency-hiding is upper-bounded by per-process CPU
saturation. This is why Postgres 18 ships with
`io_method = worker` as default, not `io_uring` — `worker` spreads the post-I/O CPU across processes too.
- Workload shape. Point selects issue one I/O at a time and therefore sit below the threshold regardless of connection count. Range scans issue many I/Os per query and can cross the threshold at modest connection counts.
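In Postgres 18 this choice surfaces as server configuration. A hypothetical postgresql.conf fragment — the values are illustrative, and the settings should be checked against the documentation for your exact version:

```
# postgresql.conf — Postgres 18 async-I/O knobs (values illustrative)
io_method = worker     # default; alternatives: sync, io_uring
io_workers = 3         # size of the I/O worker pool when io_method = worker
```

Switching `io_method` requires a server restart, so regime choice is effectively a deployment-time decision rather than a per-query one.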
Implications for system design¶
- Don't assume `io_uring` is always faster. Benchmark the specific workload + concurrency + storage combination.
- The `io_method = worker` hybrid is a deliberate middle ground. Farming I/O out to worker processes distributes both the I/O submission and the post-I/O CPU work, which pays off at concurrency levels where `io_uring` still costs more than it saves.
- Storage latency dominates at low concurrency. On network-attached storage, shave the latency floor (direct-attached NVMe) before reaching for async-I/O knobs.
- Applications need to know their concurrency regime. OLTP backends servicing bursty single-connection traffic are below the threshold. Batch / analytics / streaming workloads tend to be above it.
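The "benchmark your own regime" advice can be sketched with portable Python: time the same batch of reads issued one at a time versus fanned out over a thread pool. The file size, chunk size, and pool width are arbitrary assumptions for illustration:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK, N = 4096, 256

# Build a scratch file to read back.
fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(CHUNK * N))

def read_at(i: int) -> bytes:
    # pread is positional, so concurrent calls don't race on a shared offset.
    return os.pread(fd, CHUNK, i * CHUNK)

t0 = time.perf_counter()
sync_data = [read_at(i) for i in range(N)]        # one I/O in flight at a time
t_sync = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:  # many I/Os in flight
    conc_data = list(pool.map(read_at, range(N)))
t_conc = time.perf_counter() - t0

print(f"sequential: {t_sync * 1e3:.2f} ms   concurrent: {t_conc * 1e3:.2f} ms")
os.close(fd)
os.unlink(path)
```

On a freshly written file the reads hit the page cache, so the sequential loop will often win — which is exactly the below-threshold regime this concept describes: when per-I/O latency is tiny, concurrency machinery is pure overhead.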
Related concepts¶
- concepts/postgres-async-io — concrete Postgres 18 instance where the threshold is measurable per-workload.
- concepts/network-attached-storage-latency-penalty — the 5× EBS latency hop that sets the dominant floor in cloud deployments; async I/O only helps above it.
- concepts/iops-throttle-network-storage — cloud-side throttle that independently caps the rate at which the threshold matters.
- concepts/benchmark-methodology-bias — benchmarks that under-test high-concurrency regimes will miss async I/O's payoff zone.
Seen in¶
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18
— canonical wiki introduction. Postgres 18's
`io_uring` mode underperforms at low concurrency and on network-attached storage, only winning on local NVMe at 50 connections with large range scans.