
PATTERN

Background worker pool for async I/O

Problem

You want an application process to stop blocking on synchronous I/O, but the obvious answers (io_uring, libaio) have real downsides:

  • Kernel-interface-specific. io_uring is Linux-only and has a difficult security history — disabled by default in hardened sandbox / container contexts.
  • Keeps post-I/O CPU work in-process. If the caller must checksum / decompress / memcpy each buffer after read, that CPU cost is serialised per calling process, capping the benefit of async I/O.
  • Per-I/O overhead doesn't help at low concurrency. See concepts/async-io-concurrency-threshold.
  • Doesn't distribute across CPU cores automatically. A single calling process issuing async I/Os still runs post-I/O work on one core.

Pattern

Run a pool of dedicated background worker processes (or threads, depending on runtime) that perform I/O on behalf of the main application processes. When a backend needs data, it submits a request to the worker pool via shared memory / IPC and waits on a response. Workers pick up requests, perform the underlying synchronous I/O, and signal completion.

The key properties:

  • Worker-side I/O can be simple read() / write(), no special kernel interface required.
  • Worker processes run on different CPU cores. Post-I/O CPU work (checksums, memcpy, decompression) distributes across the pool naturally.
  • Worker count is tuneable at runtime via configuration.
  • Single-process API — the backend sees a future / promise shape even though the underlying mechanism is cross-process IPC, not in-process async.

Canonical wiki instance: Postgres 18 io_method=worker

Postgres 18 (September 2025) introduced the io_method option with three settings (sync, worker, io_uring). worker was chosen as the new default, not io_uring, the more headline-grabbing option.

Dicken's framing in sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18:

Using io_method=worker was a good choice as the new default. It comes with a lot of the "asynchronous" benefits of io_uring without relying on that specific kernel interface, and can be tuned by setting io_workers=X.

Postgres's io_workers defaults to 3. Each worker process receives I/O requests from backend processes via shared memory, issues the underlying OS read, and signals completion. From the backend's perspective the interface is async; from the OS's perspective each worker issues plain synchronous reads.
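In postgresql.conf terms, the configuration described above looks like this (values shown are the defaults per the text; comments are mine):

```ini
io_method = worker    # alternatives: sync, io_uring
io_workers = 3        # size of the background I/O worker pool
```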

Why worker beats io_uring on Postgres's workload shape

Per Tomas Vondra's tuning blog cited by Dicken:

  • Index scans don't yet use AIO in Postgres 18.x, so B-tree-dominated OLTP workloads only benefit from whatever the I/O method does for sequential and bitmap scans, which is most of the benchmark load.
  • Checksums + memcpy are CPU-bound and serial per Postgres backend. io_uring's async dispatch doesn't help because the backend is CPU-bound, not I/O-bound, after the read completes. worker distributes the CPU cost across worker processes.
  • Process-level parallelism is what Postgres needs. Postgres's process-per-backend architecture already uses separate processes for isolation; adding I/O workers is a natural extension.

PlanetScale's measured data: worker matches or beats io_uring on EBS-backed instances at all tested concurrency levels, and only loses on local NVMe at 50 connections with large range scans.

Variants

  • Thread pool instead of process pool. Runtimes without cheap forking (Java, Go, Rust) prefer a thread pool; the dispatch mechanism becomes an in-process queue rather than shared-memory IPC.
  • Per-device worker pool. One worker per underlying storage device to avoid head-of-line blocking across devices.
  • Mixed modes. Some reads go through the worker pool, others stay synchronous; io_uring backfills at a third tier for high-concurrency regimes where the worker overhead starts to cap throughput.
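The thread-pool variant is what most managed runtimes' standard libraries already give you. A minimal sketch (function and file names are illustrative): the executor's internal queue replaces the shared-memory IPC, and the caller gets the future/promise shape directly.

```python
# Thread-pool variant of the pattern: dispatch is an in-process queue
# managed by the executor; each pool thread issues a plain synchronous
# read. read_range is a made-up helper, not a real library function.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_range(path: str, offset: int, length: int) -> bytes:
    """Ordinary synchronous read, executed on a pool thread."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

pool = ThreadPoolExecutor(max_workers=3)  # tuneable, like io_workers

with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(b"abcdefgh")
    path = tf.name

# The caller sees a future over what is, underneath, a blocking read.
future = pool.submit(read_range, path, 2, 3)
result = future.result()  # -> b"cde"

pool.shutdown()
os.unlink(path)
```

The trade-off versus the process pool is the one the variants list names: cheaper dispatch and no cross-process copy, but post-I/O CPU work shares the calling process's address space and, in runtimes with a global lock, may not parallelise as freely.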

Trade-offs

  • Pro: kernel-interface-agnostic. Works wherever synchronous read() works.
  • Pro: CPU distribution. Post-I/O work parallelises across worker CPUs.
  • Pro: tuneable pool size. Application-layer knob, not kernel parameter.
  • Pro: well-understood failure modes. Workers die like any other process and can be restarted.
  • Con: higher per-I/O overhead. IPC + scheduling cost per request. At very high I/O rates this exceeds io_uring's submission overhead.
  • Con: fixed worker count is a bottleneck. Too few workers = queueing; too many = context-switch overhead.
  • Con: cross-process memory copy. Data read by a worker must be delivered to the backend — typically via shared memory, but still not zero-cost.

When to use this pattern

  • The calling process is CPU-bound post-I/O. Checksumming, decompression, format parsing — CPU work parallelises across workers.
  • You need portability across kernel versions / operating systems.
  • You're already using a process-per-client architecture (Postgres, Apache prefork).
  • Your concurrency sits below io_uring's payoff threshold (see concepts/async-io-concurrency-threshold).

When not to use it

  • Ultra-high I/O rates where IPC overhead dominates (millions of IOPS per calling process). Go kernel-native async.
  • CPU-light workloads where the post-I/O path is nearly free and per-I/O overhead is the whole story.
  • Single-threaded event-loop architectures (Node.js, Redis) that would lose their simplicity gains from adding worker processes.

Seen in

  • sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18 — canonical wiki introduction. Postgres 18's io_method=worker is both the default and, per PlanetScale's measured benchmarks, the best-performing io_method on most tested configurations. Dicken's take: "worker comes with a lot of the 'asynchronous' benefits of io_uring without relying on that specific kernel interface."