Linux io_uring¶
What it is¶
io_uring is the Linux kernel's asynchronous I/O interface
(merged in Linux 5.1, May 2019), designed to replace the earlier
aio / libaio interface. It exposes two shared-memory ring buffers
between user space and the kernel — a submission queue (SQ) and
a completion queue (CQ) — so that I/O requests can be submitted
and completed without a syscall per operation. A single process
can have thousands of in-flight I/Os while avoiding both the
per-operation syscall overhead and the synchronous blocking of
read/write semantics.
Why it matters¶
Traditional POSIX read() / write() calls are synchronous —
the calling thread blocks until the I/O completes. Applications that
want concurrency use either:
- Many threads, each blocking on its own I/O (thread-per-I/O, scales poorly past thousands).
- Non-blocking I/O + epoll, which works for sockets but not for disk reads (Linux disk I/O has no fully non-blocking mode pre-io_uring).
- POSIX aio, which Linux implements poorly: glibc provides it as a user-space thread pool wrapping synchronous calls, not true kernel async.
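To make the last point concrete, here is a minimal Python sketch of what "a user-space thread pool wrapping synchronous calls" amounts to. It is not glibc's implementation; the function names, the pool size, and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_blocks_with_thread_pool(path, offsets, block_size, max_workers=4):
    """Read block_size bytes at each offset using blocking preads on pool threads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Each task is an ordinary synchronous pread(). The pool hides the
            # blocking from the caller; the kernel still sees one blocking
            # syscall per read, issued from a separate thread.
            futures = [pool.submit(os.pread, fd, block_size, off) for off in offsets]
            return [f.result() for f in futures]
    finally:
        os.close(fd)


def demo_thread_pool():
    # Hypothetical demo data: a temp file made of four distinct 4 KiB blocks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(4):
            tmp.write(bytes([i]) * 4096)
        path = tmp.name
    try:
        return read_blocks_with_thread_pool(path, [0, 4096, 8192, 12288], 4096)
    finally:
        os.unlink(path)
```

The caller gets concurrency, but every read still costs one syscall plus a thread hand-off, which is the overhead a kernel-native interface aims to avoid.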
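io_uring is the first general-purpose kernel-native async disk I/O
interface on Linux (the older kernel aio is asynchronous only for
O_DIRECT files). Applications submit batches of I/Os to the SQ, call
io_uring_enter once, and later reap completions from the CQ,
potentially with zero additional syscalls if kernel-side
submission-queue polling (IORING_SETUP_SQPOLL) is enabled, because
completions are read directly from the shared CQ ring.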
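The submit-a-batch, enter-once, reap-later flow can be illustrated with a toy in-memory model. This sketch performs no real I/O and does not call the kernel; `ToyRing` and its methods are hypothetical names, and the real interface is normally reached from C via liburing.

```python
from collections import deque


class ToyRing:
    """Toy model of an SQ/CQ pair. It performs no real I/O."""

    def __init__(self, entries):
        self.entries = entries
        self.sq = deque()      # submission queue: requests waiting for the "kernel"
        self.cq = deque()      # completion queue: results waiting for the application
        self.enter_calls = 0   # stands in for the number of io_uring_enter syscalls

    def queue_read(self, user_data, data_source, offset, length):
        """Place a read request in the SQ. No 'syscall' happens here."""
        if len(self.sq) >= self.entries:
            raise BufferError("submission queue full")
        self.sq.append((user_data, data_source, offset, length))

    def enter(self):
        """Model one io_uring_enter: the 'kernel' consumes every queued request."""
        self.enter_calls += 1
        submitted = len(self.sq)
        while self.sq:
            user_data, src, offset, length = self.sq.popleft()
            # A real kernel may complete requests in any order; this toy
            # completes them in submission order for simplicity.
            self.cq.append((user_data, src[offset:offset + length]))
        return submitted

    def reap(self):
        """Drain the CQ. Reading completions needs no 'syscall' in this model."""
        done = list(self.cq)
        self.cq.clear()
        return done


def demo_ring():
    ring = ToyRing(entries=8)
    data = bytes(range(64))
    for i in range(4):
        ring.queue_read(user_data=i, data_source=data, offset=i * 16, length=16)
    submitted = ring.enter()       # one "syscall" covers all four requests
    completions = ring.reap()
    return submitted, ring.enter_calls, completions
```

In the demo, four reads are queued and a single `enter()` call submits all of them. The `user_data` tag is how the application matches each completion back to its request.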
How Postgres 18 uses it¶
Postgres 18's new io_method knob
exposes io_uring as one of three options (alongside sync and
worker, the new default). Setting io_method=io_uring causes
Postgres to issue read requests via the io_uring interface,
allowing the kernel to service multiple outstanding I/Os per process
without thread-switching overhead.
The catch (from PlanetScale's benchmarks + Tomas Vondra's tuning blog):
- Index scans don't yet use AIO. The B-tree-navigation paths, which dominate most OLTP, remain synchronous. io_uring's async-read benefit only applies to sequential / range scans.
- Post-I/O work is still synchronous. Even when reads happen in the background, Postgres must checksum pages and memcpy them into the shared-buffer pool; these steps are CPU-bound and serial per-process.
- Only reads are async in Postgres 18. Writes (including WAL fsync) still use synchronous paths. io_uring supports async writes, but Postgres 18 has not yet adopted them.
The practical result, per PlanetScale's data: io_uring only
outperforms sync / worker at high concurrency + large
range scans on local NVMe — the scenario where async I/O
parallelism is load-bearing and the per-I/O latency floor doesn't
dominate.
io_uring vs the worker pool¶
Postgres 18's io_method=worker — the new default — takes a
different design path to the same goal: dedicated background
worker processes handle I/O instead of using io_uring. See
patterns/background-worker-pool-for-async-io. The trade-off:
- io_uring has lower per-I/O overhead (no cross-process context switch) but requires a specific kernel interface and keeps post-I/O CPU work in the same process.
- worker has higher per-I/O overhead (IPC + scheduling) but distributes CPU work across processes, which for many workloads matters more than shaving per-I/O latency.
Dicken's measured outcome: worker matches or beats io_uring on
EBS-backed storage at all concurrency levels and only loses on
local NVMe at 50 connections with large scans.
Applications using io_uring in production¶
- Postgres 18 (2025, one of three io_method options).
- RocksDB: optional io_uring-backed FileSystem for read-heavy workloads.
- ScyllaDB: full replacement for POSIX I/O in the Seastar reactor.
- QEMU: block-device I/O backend (aio=io_uring).
- fio: the canonical disk benchmark has an io_uring engine.
- liburing: the user-space helper library maintained by Jens Axboe (the kernel interface author).
Security posture¶
io_uring has had a checkered security history. Multiple kernel CVEs
have been reported since 2020; in 2023 Google disabled io_uring on
ChromeOS and blocked it for apps on Android; and several container
runtimes block it via seccomp by default. Postgres 18's default
(io_method=worker) sidesteps this by not requiring io_uring to be
enabled.
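One quick way to see whether a host restricts io_uring is the `kernel.io_uring_disabled` sysctl, available on Linux 6.6 and later. The sketch below reads it; the function name and wording of the messages are assumptions for illustration, and the check does not detect seccomp filters such as those applied by container runtimes.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/kernel/io_uring_disabled"

# Meanings of the kernel.io_uring_disabled values.
MEANINGS = {
    0: "enabled for all processes",
    1: "disabled for unprivileged processes outside kernel.io_uring_group",
    2: "disabled for all processes",
}


def io_uring_sysctl_status(path=SYSCTL_PATH):
    """Describe the io_uring sysctl setting found at path."""
    p = Path(path)
    if not p.exists():
        # The file is absent on kernels older than 6.6, or without io_uring.
        return "sysctl not present"
    try:
        value = int(p.read_text().strip())
    except ValueError:
        return "unreadable value"
    return MEANINGS.get(value, f"unrecognised value {value}")


if __name__ == "__main__":
    print(io_uring_sysctl_status())
```

A result of "enabled for all processes" only means the sysctl is permissive; a seccomp profile can still reject the io_uring syscalls for a particular process.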
Seen in¶
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18: canonical wiki introduction. Postgres 18's io_method=io_uring benchmarked against sync and worker; loses on EBS at low concurrency; only wins at 50 connections + --range_size=10000 on the local-NVMe i7i instance. Vondra's three architectural reasons for why worker beats io_uring on most workloads are cited and canonicalised.