
PLANETSCALE 2025-10-14


PlanetScale — Benchmarking Postgres 17 vs 18

Summary

PlanetScale's Ben Dicken benchmarks Postgres 17 against Postgres 18 (the September 2025 release that introduced the io_method configuration option) across 96 read-only sysbench runs on four EC2 configurations — three EBS-backed r7i.2xlarge variants (gp3 3k-IOPS/125 MB/s default, gp3 10k-IOPS/500 MB/s provisioned, io2 16k-IOPS) plus one local-NVMe i7i.2xlarge (300k IOPS). The benchmark uses a ~300 GB database (100 tables × 13M rows, far exceeding the 64 GB RAM) under oltp_read_only with --range_size values of 100 (point-select heavy) and 10,000 (range-scan heavy), at 1 / 10 / 50 concurrent connections. Three io_method settings are exercised on Postgres 18: sync (legacy synchronous I/O, matching Postgres 17 behavior), worker (the new default — dedicated background worker processes handle all I/O), and io_uring (Linux's async-I/O interface). The expected headline — that io_uring would dominate — does not hold. On network-attached storage at low concurrency, Postgres 18's sync and worker modes outperform both Postgres 17 and Postgres 18 with io_uring; only at 50 connections with --range_size=10000 on the local-NVMe i7i does io_uring eke out a small win over the other options. Across all EBS-backed scenarios, IOPS / throughput caps are the bottleneck, and Postgres 18 worker leads Postgres 17 only by a slim margin. The price-performance winner is the i7i local-NVMe instance ($551.15/mo) — more storage (1.8 TB vs 700 GB), no IOPS cap, and sub-ms I/O — significantly cheaper than io2 ($1,513.82/mo) and only modestly more than gp3 with 10k provisioned IOPS ($492.32/mo).
Dicken cites Tomas Vondra's tuning blog for the architectural reasons worker can beat io_uring on many workloads: index scans don't yet use AIO (so there's nothing to make asynchronous on B-tree-dominant paths); checksums / memcpy remain synchronous, so the CPU is the bottleneck even when the I/O is async; and worker provides parallelism from a single process's perspective in ways io_uring's same-process queueing doesn't. The post's key reframing is that io_method=worker as the new default was a sound choice: it captures most of the asynchrony benefits of io_uring without requiring a specific kernel interface, and it can be tuned via io_workers=X. Published as the sequel / methodology disclosure to PlanetScale's 2025-03-13 IO devices and latency post and the 2025-07-01 PlanetScale for Postgres launch announcement.
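In postgresql.conf terms, the configurations under test come down to two Postgres 18 settings (a sketch; io_method and io_workers are the settings named in the post, and each run uses exactly one io_method value):

io_method  = sync      # legacy synchronous I/O — matches Postgres 17 and earlier
io_method  = worker    # new default in 18 — dedicated background I/O workers
io_method  = io_uring  # Linux async-I/O interface (Linux only)
io_workers = 3         # pool size when io_method = worker; default, tunable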

Key takeaways

  1. Postgres 18 introduces the io_method configuration knob. "Postgres 18 introduces the io_method configuration option, allowing users more control over how disk I/O is handled. Setting this to sync results in the same behavior as 17 and earlier versions. With this, all I/O happens via synchronous requests. 18 introduces two alternatives: worker and io_uring." worker is the new default. Canonical wiki instance of concepts/postgres-async-io as a new Postgres 18 feature with three operational modes. (Source: article §"Postgres 18".)

  2. io_uring did not dominate as expected on EBS at low concurrency. On gp3-3k / gp3-10k / io2 with a single connection and --range_size=100, Postgres 18 in sync and worker modes beat Postgres 17 and Postgres 18 with io_uring. Dicken: "I'll admit, this surprised me! My expectation was that io_uring would perform as well as if not better than all these options." The latency of network-attached storage (gp3 ~250 μs round-trip, even io2) is the dominant cost — async-I/O reordering doesn't help when the per-I/O floor is set by the network hop. Canonical wiki instance of concepts/async-io-concurrency-threshold — async I/O only pays off above a certain concurrency / I/O-rate threshold. (Source: article §"How do the results compare?".)

  3. Local NVMe dominates every scenario tested. The i7i.2xlarge with 1.8 TB local NVMe consistently outperforms every EBS-backed instance, sometimes by a large margin. Dicken: "Local disks are the clear winner. When you have low-latency I/O and immense IOPS, the rest matters less." This is the empirical backing for the [[concepts/network-attached-storage-latency-penalty|5× latency penalty]] argument made in the 2025-03-13 IO-devices post — local NVMe's ~50 μs vs EBS's ~250 μs is worth more QPS per dollar than any amount of io_method tuning on EBS. Positions PlanetScale Metal as the architectural winner. (Source: article §"How do the results compare?" + "Cost".)

  4. IOPS and throughput caps dominate at high concurrency on EBS. At 50 connections with --range_size=100, "IOPS and throughput are clear bottlenecks for each of the EBS-backed instances. The different versions / I/O settings don't make a huge difference in such cases." As EBS capability scales (gp3-3k → gp3-10k → io2-16k → NVMe-300k), QPS scales in lockstep. Canonical wiki datum for concepts/iops-throttle-network-storage at the measured-QPS altitude (prior wiki instance covered the config-side cap; this is the workload-side manifestation). (Source: article §"Higher level of concurrency, small range scans".)

  5. io_uring only wins at high concurrency + CPU-bound scans on local NVMe. At 50 connections with --range_size=10000 on the i7i instance: "we finally have a scenario where io_uring wins! On the NVMe instance, it slightly outperforms the other options." The 10-connection graph at --range_size=100 shows io_uring significantly worse on gp3-3k but only slightly worse at 50 connections. Dicken's inferred rule: "io_uring performs well when there's lots of I/O concurrency, but in low-concurrency scenarios it isn't as beneficial." Canonical load-bearing datum for concepts/async-io-concurrency-threshold. (Source: article §"10 connections" and §"How do the results compare?".)

  6. Price-performance winner is the local-NVMe instance. Monthly on-demand pricing: r7i + gp3-3k $442.32; r7i + gp3-10k $492.32; r7i + io2-16k $1,513.82; i7i local NVMe (1.8 TB, no EBS) $551.15. The i7i has ~2.5× the storage of the r7i + gp3 variants and no artificial IOPS cap. "The server with a local NVMe disk is the clear price-performance winner." Reinforces the Metal cost-performance thesis with vendor-agnostic EC2 pricing. (Source: article §"Cost".)

  7. Vondra's tuning analysis: why worker beats io_uring on many workloads. Dicken cites Tomas Vondra's 2025 blog for three architectural reasons:

     • Index scans don't (yet) use AIO — B-tree navigation remains synchronous, so workloads dominated by indexed lookups (which is most OLTP) don't benefit from io_uring's async-read facility.
     • Checksums / memcpy remain CPU-bound — even when the disk I/O happens asynchronously, the post-read work is synchronous and serial per-process, capping the benefit.
     • worker provides better parallelism from a single-process perspective — farming I/Os out to dedicated worker processes achieves parallelism that io_uring's same-process async queue doesn't. Canonical wiki datum for patterns/background-worker-pool-for-async-io as a deliberate design choice over io_uring when most of the benefit is from process-level I/O parallelism, not per-I/O async-dispatch reduction. (Source: article §"Some ideas why?" citing Vondra.)

  8. Benchmark scope is explicitly read-only. oltp_read_only only exercises point selects + range scans + aggregations. "The io_uring improvements only apply to reads, so the focus here will be on the oltp_read_only benchmark." Dicken flags this as a limitation: "this is a very specific type of workload … io_uring surely has other workloads where it would shine. It's also possible that with different postgresql.conf tunings, we'd see improvements from io_uring." Canonical wiki instance of concepts/benchmark-methodology-bias — self-reported by the author, not a lint against the post. (Source: article §"Some ideas why?".)
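The latency-floor and IOPS-cap takeaways above can be illustrated with a toy closed-loop model — my sketch, not from the post: with synchronous I/O each connection keeps at most one request in flight, so the achievable I/O rate is roughly connections / per-I/O latency, clipped by the volume's IOPS cap (the ~250 μs EBS and ~50 μs NVMe figures are from PlanetScale's earlier IO-devices post):

```python
def iops_model(connections, latency_s, iops_cap=None):
    """Toy closed-loop model: each connection has one sync I/O in flight."""
    rate = connections / latency_s          # the latency floor
    return min(rate, iops_cap) if iops_cap is not None else rate

# gp3 (~250 us round-trip, 3,000 IOPS cap): a single connection can drive
# at most ~4,000 IOPS, so the cap -- not io_method -- is already the limit.
print(iops_model(1, 250e-6, 3_000))        # 3000.0  (cap)
print(iops_model(50, 250e-6, 3_000))       # 3000.0  (cap dominates)
# local NVMe (~50 us, 300,000 IOPS cap): headroom at every tested level,
# which is where io_method choice can actually show up in the results.
print(iops_model(50, 50e-6, 300_000))      # 300000.0 (uncapped would be ~1M)
```

The model is deliberately crude (no queueing, no batching), but it captures why async reordering buys little when the per-I/O floor is a network hop and the volume is capped.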

Architectural numbers

Instance      vCPUs  RAM    Disk      Disk type   IOPS     Throughput  $/mo
r7i.2xlarge   8      64 GB  700 GB    gp3         3,000    125 MB/s    $442.32
r7i.2xlarge   8      64 GB  700 GB    gp3         10,000   500 MB/s    $492.32
r7i.2xlarge   8      64 GB  700 GB    io2         16,000   —           $1,513.82
i7i.2xlarge   8      64 GB  1,875 GB  local NVMe  300,000  —           $551.15

Benchmark matrix: 4 instance configs × 2 --range_size values (100, 10,000) × 3 concurrency levels (1, 10, 50 connections) × 4 Postgres configurations (17, 18-sync, 18-worker, 18-io_uring) = 96 × 5 min = 480 min of benchmark runtime. Each server pre-warmed with 10 minutes of query load before the 5-minute measurement window.

Database shape: TABLES=100, SCALE=13000000 → ~300 GB total, 4.6× RAM size, forcing real disk reads throughout the run.
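The run-matrix arithmetic can be checked mechanically. The sysbench flag names below follow stock akopytov/sysbench oltp_read_only options and are my reconstruction, not the post's exact invocation:

```python
from itertools import product

instances = ["gp3-3k", "gp3-10k", "io2-16k", "i7i-nvme"]   # 4 EC2 configs
range_sizes = [100, 10_000]
connection_counts = [1, 10, 50]
pg_configs = ["17", "18-sync", "18-worker", "18-io_uring"]

runs = [
    # One 5-minute sysbench run per cell of the matrix (after a 10 min warmup).
    f"sysbench oltp_read_only --tables=100 --table_size=13000000 "
    f"--range_size={rs} --threads={conns} --time=300 run"
    for _inst, rs, conns, _cfg in product(instances, range_sizes,
                                          connection_counts, pg_configs)
]
print(len(runs))        # 96 runs
print(len(runs) * 5)    # 480 minutes of measurement
```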

Postgres tuning (common across all configurations; io_method itself is varied per run on Postgres 18):

shared_buffers        = 16GB    # 25% RAM
effective_cache_size  = 48GB    # 75% RAM
work_mem              = 64MB
maintenance_work_mem  = 2GB
random_page_cost      = 1.1
effective_io_concurrency = 200
max_parallel_workers  = 8
io_workers            = 3       # default

Systems named

  • PostgreSQL — subject of the benchmark; v17 vs v18.
  • Linux io_uring — kernel async-I/O interface; one of Postgres 18's three io_method options.
  • sysbench (systems/sysbench) — akopytov/sysbench benchmark tool; oltp_read_only mode.
  • Amazon EBS — gp3 + io2 backing substrate; canonical wiki-referenced cloud block-storage.
  • NVMe SSD — direct-attached storage on the i7i instance type.
  • Amazon EC2 — r7i.2xlarge + i7i.2xlarge host instances.
  • PlanetScale Metal — the architectural-successor framing; not benchmarked directly but the i7i winning result is the empirical backing for Metal.

Concepts introduced

  • concepts/postgres-async-io — Postgres 18's io_method feature with three operational modes (sync, worker, io_uring).
  • concepts/async-io-concurrency-threshold — async I/O only pays off above a certain concurrency / I/O-rate threshold.
  • concepts/iops-throttle-network-storage — IOPS / throughput caps as the workload-side bottleneck on network-attached storage.
  • concepts/benchmark-methodology-bias — a benchmark's workload choice shapes its conclusions; here self-reported by the author.

Patterns introduced

  • patterns/background-worker-pool-for-async-io — Postgres 18's io_method=worker design: dedicated worker processes handle I/O; achieves process-level parallelism without requiring a specific kernel async interface (io_uring).
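A minimal sketch of the pattern (a toy model using threads; Postgres actually uses separate processes coordinating via shared memory): the issuing side enqueues reads and moves on, while a small dedicated pool completes them in parallel:

```python
import queue
import threading

def start_io_workers(n, handler):
    """Toy io_method=worker: a shared queue feeding n dedicated I/O workers."""
    q = queue.Queue()

    def worker():
        while True:
            req = q.get()
            if req is None:          # shutdown sentinel
                return
            handler(req)             # perform the (simulated) block read
            q.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n)]
    for w in workers:
        w.start()
    return q, workers

# The "backend" enqueues reads without waiting on each one individually;
# parallelism comes from the pool, not from a kernel async interface.
completed = []
q, workers = start_io_workers(3, completed.append)   # io_workers = 3
for block_no in range(10):
    q.put(block_no)                  # async from the issuer's perspective
q.join()                             # all enqueued reads have completed
for _ in workers:
    q.put(None)
print(sorted(completed))             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The design point the post attributes to Vondra: because the parallelism lives in the pool, this shape needs no particular kernel async interface, which is what makes it a portable default.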

Caveats

  • Read-only workload only. oltp_read_only does not test writes, mixed workloads, WAL fsync pressure, or recovery. io_uring's write and mixed-read/write paths may show different behaviour; the post acknowledges this explicitly.
  • Fixed tuning per config. The same postgresql.conf is used across all four Postgres configurations. Dicken notes: "with different postgresql.conf tunings, we'd see improvements from io_uring." In particular io_workers=X and effective_io_concurrency=N are left at defaults; Vondra's companion post details the tuning surface.
  • Single instance shape. Only r7i.2xlarge + i7i.2xlarge are tested; larger NUMA-bound instances, bigger RAM:data ratios, or ARM-based Graviton instances may move results.
  • Local NVMe vs EBS is a two-axis comparison. The i7i wins on both latency and IOPS cap; the post doesn't isolate which axis dominates the win.
  • Five-minute measurement window. Tail-latency effects that emerge at longer horizons (SSD GC, EBS burst-bucket depletion on gp3, io2 warmup) are not captured.
  • No p99/p99.9 reporting. Only average QPS is graphed. Tail-latency comparison between sync / worker / io_uring is absent from the disclosure.
  • Cached workload not tested. Working-set-in-RAM scenarios are exactly where io_method matters least and where Postgres 18's improvements on CPU code paths might dominate — not tested here.
  • Vendor posture. PlanetScale is the publisher and has a direct commercial interest in the local-NVMe-beats-EBS conclusion (systems/planetscale-metal). The underlying numbers are reproducible via sysbench (tunables linked in article appendix) but the framing is PlanetScale's.
