Async I/O concurrency threshold¶
Definition¶
Async I/O concurrency threshold is the observation that
asynchronous-I/O interfaces (like Linux's io_uring)
only outperform synchronous I/O above a certain concurrency /
I/O-rate level. Below that threshold, the overhead of async
submission + completion tracking exceeds the latency-hiding benefit
— synchronous I/O wins because it has a shorter code path on the
hot path for a single I/O.
At low concurrency (few in-flight I/Os), there's nothing for async I/O to hide — the caller is waiting for one request at a time anyway, and sync I/O's simpler code path is faster.
At high concurrency (many in-flight I/Os), async I/O hides each request's latency behind the others, and raw throughput is bound by the storage device's parallelism (multiple NAND targets, deep command queues).
Why the threshold exists¶
The per-I/O cost decomposes into:
- Submission overhead (set up the request, enqueue it).
- Latency floor (the physics — NAND read, network RTT, seek).
- Completion overhead (reap the result, dispatch downstream).
Sync I/O has low submission + completion overhead but serialises the latency floor — one I/O at a time from the caller's perspective.
Async I/O adds submission + completion overhead but amortises the latency floor across many concurrent requests.
Below the threshold, sync_overhead + latency < async_overhead + (latency / in_flight), so sync wins. Above it the inequality flips — with enough in-flight requests the amortised latency term shrinks below the overhead gap — and async wins.
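The inequality can be sketched as a toy cost model in Python. The microsecond figures below are illustrative assumptions (roughly local-NVMe-shaped), not measurements from any benchmark:

```python
def per_io_cost(overhead_us: float, latency_us: float, in_flight: int) -> float:
    """Effective per-I/O cost when `in_flight` requests overlap the latency floor."""
    return overhead_us + latency_us / in_flight

# Illustrative numbers — assumptions, not measurements:
SYNC_OVERHEAD_US = 2.0    # short submit + reap path
ASYNC_OVERHEAD_US = 6.0   # ring setup, submission/completion bookkeeping
LATENCY_FLOOR_US = 50.0   # local-NVMe-ish read latency

for in_flight in (1, 2, 4, 8, 16, 32):
    # Sync serialises the latency floor: one I/O at a time.
    sync_cost = SYNC_OVERHEAD_US + LATENCY_FLOOR_US
    # Async amortises the floor across concurrent requests.
    async_cost = per_io_cost(ASYNC_OVERHEAD_US, LATENCY_FLOOR_US, in_flight)
    winner = "sync" if sync_cost < async_cost else "async"
    print(f"in_flight={in_flight:3d}  sync={sync_cost:5.1f}us  "
          f"async={async_cost:5.1f}us  -> {winner}")
```

In this toy model the crossover comes very early; real thresholds sit higher because completion dispatch, post-I/O CPU work, and device queue limits all add async-side cost the model ignores.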
Canonical wiki instance (PlanetScale 2025-10-14)¶
Ben Dicken's Postgres 17 vs Postgres 18 benchmark (sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18)
provides the canonical empirical observation. Testing Postgres 18
with `io_method` set to `sync`, `worker`, and `io_uring` across
1 / 10 / 50 connections on EBS and local NVMe:
- At 1 connection on EBS, `io_uring` loses to `sync` and `worker`. Surprising result: "I'll admit, this surprised me! My expectation was that io_uring would perform as well as if not better than all these options."
- At 10 connections on gp3-3k, `io_uring` is significantly worse than the other options.
- At 50 connections on gp3-3k, `io_uring` is only slightly worse than the other options — the gap narrows.
- At 50 connections on local NVMe, `io_uring` slightly beats the other options — the threshold is finally crossed.
Dicken's explicit formulation: "io_uring performs well when there's lots of I/O concurrency, but in low-concurrency scenarios it isn't as beneficial."
Additional factors on the threshold¶
- Storage latency floor. On network-attached storage (~250 μs round-trip), the latency floor dominates so thoroughly that async-I/O concurrency-hiding doesn't help much. On local NVMe (~50 μs), the floor is low enough that async parallelism matters.
- Post-I/O CPU work. If the caller is CPU-bound after the
I/O completes (checksums, memcpy, decompression), then async
I/O's latency-hiding is upper-bounded by per-process CPU
saturation. This is why Postgres 18 ships with
`io_method = worker` as default, not `io_uring` — `worker` spreads the post-I/O CPU across processes too.
- Workload shape. Point selects issue one I/O at a time and therefore sit below the threshold regardless of connection count. Range scans issue many I/Os per query and can cross the threshold at modest connection counts.
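In Postgres 18 this choice surfaces as server configuration. A hypothetical postgresql.conf fragment — the values are illustrative, and the settings should be checked against the documentation for your exact version:

```
# postgresql.conf — Postgres 18 async-I/O knobs (values illustrative)
io_method = worker     # default; alternatives: sync, io_uring
io_workers = 3         # size of the I/O worker pool when io_method = worker
```

Switching `io_method` requires a server restart, so regime choice is effectively a deployment-time decision rather than a per-query one.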
Implications for system design¶
- Don't assume `io_uring` is always faster. Benchmark the specific workload + concurrency + storage combination.
- The `io_method = worker` hybrid is a deliberate middle ground. Farming I/O out to worker processes distributes both the I/O submission and the post-I/O CPU work, which pays off at concurrency levels where `io_uring` still costs more than it saves.
- Storage latency dominates at low concurrency. On network-attached storage, shave the latency floor (direct-attached NVMe) before reaching for async-I/O knobs.
- Applications need to know their concurrency regime. OLTP backends servicing bursty single-connection traffic are below the threshold. Batch / analytics / streaming workloads tend to be above it.
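The "benchmark your own regime" advice can be sketched with portable Python: time the same batch of reads issued one at a time versus fanned out over a thread pool. The file size, chunk size, and pool width are arbitrary assumptions for illustration:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK, N = 4096, 256

# Build a scratch file to read back.
fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(CHUNK * N))

def read_at(i: int) -> bytes:
    # pread is positional, so concurrent calls don't race on a shared offset.
    return os.pread(fd, CHUNK, i * CHUNK)

t0 = time.perf_counter()
sync_data = [read_at(i) for i in range(N)]        # one I/O in flight at a time
t_sync = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:  # many I/Os in flight
    conc_data = list(pool.map(read_at, range(N)))
t_conc = time.perf_counter() - t0

print(f"sequential: {t_sync * 1e3:.2f} ms   concurrent: {t_conc * 1e3:.2f} ms")
os.close(fd)
os.unlink(path)
```

On a freshly written file the reads hit the page cache, so the sequential loop will often win — which is exactly the below-threshold regime this concept describes: when per-I/O latency is tiny, concurrency machinery is pure overhead.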
Related concepts¶
- concepts/postgres-async-io — concrete Postgres 18 instance where the threshold is measurable per-workload.
- concepts/network-attached-storage-latency-penalty — the 5× EBS latency hop that sets the dominant floor in cloud deployments; async I/O only helps above it.
- concepts/iops-throttle-network-storage — cloud-side throttle that independently caps the rate at which the threshold matters.
- concepts/benchmark-methodology-bias — benchmarks that under-test high-concurrency regimes will miss async I/O's payoff zone.
Seen in¶
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18
— canonical wiki introduction. Postgres 18's
`io_uring` mode underperforms at low concurrency and on network-attached storage, only winning on local NVMe at 50 connections with large range scans.