PlanetScale — Benchmarking Postgres¶
Summary¶
Ben Dicken's companion / methodology disclosure to the PlanetScale for Postgres GA launch (2025-07-01). The post announces Telescope, PlanetScale's internal benchmarking tool for creating, running, and assessing database benchmarks, and commits to three public benchmarks — latency (repeated `SELECT 1;` round-trips from an in-region client), TPCC-like (Percona's `sysbench-tpcc` with `TABLES=20, SCALE=250`, producing a ~500 GB database), and OLTP read-only (`sysbench oltp_read_only` at 300 GB for top performers) — against a long list of other cloud Postgres providers (Amazon Aurora, Google AlloyDB, CrunchyData, Supabase, TigerData, Neon). The reference PlanetScale target is an `i8g M-320` (4 vCPUs, 32 GB RAM, 937 GB NVMe) with a primary + 2 replicas across 3 AZs. Competitors get single-instance databases matching or exceeding the vCPU/RAM of PlanetScale's primary; where RAM:CPU ratios are capped at 4:1 (Supabase, TigerData, Neon), competitors get double the CPU count to match RAM, and price accounting factors in replica headcount to recover the availability posture. Benchmark clients live in-region (us-east-1 for most, us-central1 for Google): AWS `c6a.xlarge` (4 vCPU, 8 GB) and GCP `e2-standard-4` (4 vCPU, 16 GB). Defaults are preserved except connection limits / timeouts. Methodology-first framing: "The way benchmarking is used is often deceptive… every OLTP workload at every organization is unique and no single benchmark can capture the performance characteristics of all such databases." Four questions the post commits to answering: how fast is it to reach, how does it do under typical OLTP, how does it do under high read/write pressure, and how does cost-per-unit-performance compare. Public invitation at benchmarks@planetscale.com for methodology / cost-calculation feedback. The post is disclosure, not results — actual latency / TPCC / OLTP numbers are deferred to linked results pages and the earlier 2025-10-14 Postgres 17 vs 18 post.
Reproduction instructions are published for TPCC (500 GB) and OLTP (300 GB) so independent parties can rerun them.
Key takeaways¶
- Telescope is PlanetScale's internal benchmarking harness, now disclosed. "We built an internal tool, 'Telescope', to be our go-to tool for creating, running, and assessing benchmarks… We have used this as an internal tool to give our engineers quick feedback on the evolution of our product's performance as we built and tuned it. We decided to share our findings with the world, and give others the tools to reproduce them." First canonical wiki datum on Telescope — see the new systems/telescope-planetscale system page. (Source: article §intro.)
- Three benchmarks commit to three questions.
  - Latency — 200 runs of `SELECT 1;` from a same-region client; measures "How quickly can I reach my database?"
  - TPCC-like — Percona's `sysbench-tpcc` with `TABLES=20, SCALE=250` (~500 GB); measures "How does the database perform under a typical OLTP load?" and answers the IOPS-cap / caching questions.
  - OLTP read-only — `sysbench oltp_read_only` at 300 GB; isolates read performance for top performers — "most OLTP workloads are 80+% reads." (Source: article §"Three benchmarks".)
- Reference PlanetScale target is an i8g M-320 (4 vCPU, 32 GB, 937 GB NVMe) with primary + 2 replicas across 3 AZs. "At PlanetScale, we give you a primary and two replicas spread across 3 availability zones (AZs) by default. Multi-AZ configurations are critical to have a highly-available database." Availability posture is load-bearing in the price comparison — single-instance competitor databases "to match the true capacity and availability of PlanetScale, each would also need to add replicas. We account for this when discussing pricing." Canonical wiki instance of "include competitor replicas in the price comparison to equalise availability posture." (Source: article §methodology.)
- RAM:CPU ratio asymmetry handled by giving competitors more CPU. Aurora / AlloyDB / CrunchyData support 8:1 RAM:CPU (match exactly); Supabase / TigerData / Neon cap at 4:1, so PlanetScale "opted to match the RAM, giving them double the CPU count used by PlanetScale. This is an unfair advantage to them, but as you'll see, PlanetScale still significantly outperforms with less resources." Canonical wiki framing: match RAM first, let CPU asymmetry cut in the vendor's favour when ratios differ. (Source: article §methodology.)
- IOPS configuration: match where the vendor exposes it, boost where possible. Aurora / Neon / AlloyDB don't expose IOPS configuration; Supabase / TigerData got "boosts to the default IOPS settings" — all to avoid straw-manning the competition. "All of the products we compared with use network-attached storage for the underlying drives." Another canonical wiki datapoint for concepts/network-attached-storage-latency-penalty — the competitor stack is uniformly network-attached-storage even where PlanetScale itself is on direct-attached NVMe. (Source: article §methodology.)
- Benchmark-machine and same-region constraints. All AWS benchmarks run from a `c6a.xlarge` (4 vCPU, 8 GB) in `us-east-1`; all GCP benchmarks from an `e2-standard-4` (4 vCPU, 16 GB) in `us-central1`. "All Postgres configuration options are left at each platform's defaults. The one exception to this is modifications to connection limits and timeouts, which may be modified to facilitate benchmarking." AZ-level placement is not guaranteed for TPCC / OLTP because not all platforms expose AZ selection — an explicit benchmark-methodology-bias caveat acknowledged in-post. The latency benchmark is AZ-aware where the platform supports it. (Source: article §"How we run the benchmarks".)
- Reproducibility is a first-class commitment. Published reproduction instructions at `/benchmarks/instructions/tpcc500g` and `/benchmarks/instructions/oltp300g`. Methodology commitment: "In the interest of full transparency, we provide full details for how we conducted our benchmarking." Invitation for feedback: "We invite other vendors to provide feedback. If you see anything wrong in our benchmarking methodology, let us know at benchmarks@planetscale.com." Canonical wiki instance of the new patterns/reproducible-benchmark-publication pattern — publish the harness, the configs, the invitation to feedback, ship the contact email. (Source: article §"Our methodology" + closing.)
- Methodology-voice acknowledges benchmarking's epistemic limits upfront. "The way benchmarking is used is often deceptive. This applies to all technologies, not just databases… benchmarking of any kind has its shortcomings. Every OLTP workload at every organization is unique and no single benchmark can capture the performance characteristics of all such databases. Data size, hot:cold ratios, QPS variability, schema structure, indexes, and 100 other factors determine the requirements of your relational database setup. You cannot look at a benchmark and know for sure that your workload will perform the same given all other factors are the same." Canonical wiki source for "a benchmark tells you about the benchmark, not your workload." The post frames what the benchmarks answer (four explicit questions) and what they don't (your specific workload shape). (Source: article §"Benchmarks are imperfect but useful".)
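The latency benchmark's shape — 200 timed round-trips of a trivial query, summarized per configuration — can be sketched with a small timing loop. This is a hedged illustration, not Telescope's code: the driver call is stubbed out (a real run would execute `SELECT 1;` through a Postgres driver such as psycopg), and the percentile summary is an assumption about how such runs are typically reported.

```python
import statistics
import time

def measure_roundtrips(run_query, runs=200):
    """Time `runs` calls of a query callable; return a latency summary in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # real run: cur.execute("SELECT 1;"); cur.fetchone()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# The lambda stands in for a real driver round-trip; swap in a cursor call.
print(measure_roundtrips(lambda: None, runs=200))
```

With an in-region client, the numbers this loop reports are dominated by network round-trip time, which is exactly what the benchmark's "How quickly can I reach my database?" question is after.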
Systems extracted¶
- systems/telescope-planetscale — internal benchmarking tool, canonically named and disclosed here for the first time.
- systems/planetscale-for-postgres — the subject of the comparison; reference shape is `i8g M-320` (4 vCPU, 32 GB, 937 GB NVMe) with primary + 2 replicas across 3 AZs.
- systems/planetscale-metal — the storage substrate under PlanetScale for Postgres; the `i8g M-320` reference is the Metal-on-AWS shape.
- systems/postgresql — the benchmarked engine across all providers.
- systems/sysbench — `sysbench oltp_read_only` + Percona's `sysbench-tpcc` Lua scripts are the two non-latency workloads.
- systems/aws-ec2 — the client-side load generator (`c6a.xlarge`) and competitor instance types.
- systems/aws-ebs — the implicit network-attached-storage competitor fabric for most providers.
Concepts extracted¶
- concepts/price-performance-ratio — new canonical wiki page. The fourth benchmark question Dicken asks: "How much does it cost to achieve some bar of performance relative to other options?" Framed as cost-per-unit-performance (QPS per dollar, TPS per dollar), plus the consumer posture for treating vendor benchmarks responsibly: to equalise availability posture in the price column, include every replica the competitor would need to reach the same availability as the reference product.
- concepts/benchmark-methodology-bias — extended with PlanetScale's acknowledged cap on AZ-placement control (not all vendors expose it) + RAM:CPU-ratio asymmetry (giving competitors more CPU "is an unfair advantage to them").
- concepts/benchmark-representativeness — extended with the explicit "every OLTP workload is unique" caveat — no single benchmark captures any specific workload; benchmarks answer specific questions (latency, TPS, IOPS-under-pressure, $/performance) but don't substitute for workload-specific testing.
- concepts/network-attached-storage-latency-penalty — extended with "All of the products we compared with use network-attached storage for the underlying drives" — the entire competitor set in Dicken's comparison is network-attached-storage, reinforcing Metal's architectural differentiation.
- concepts/iops-throttle-network-storage — extended with IOPS-configuration taxonomy across providers: Aurora / Neon / AlloyDB don't expose IOPS; Supabase / TigerData do and were boosted beyond defaults for the comparison.
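The cost-per-unit-performance framing, with replica-equalised pricing, reduces to simple arithmetic. A minimal sketch with hypothetical figures — none of these numbers come from the post, and the function names are illustrative:

```python
def equalized_monthly_cost(instance_cost: float, replicas_needed: int) -> float:
    """Competitor cost once it adds the replicas it would need to match the
    reference product's availability posture (primary + N replicas)."""
    return instance_cost * (1 + replicas_needed)

def qps_per_dollar(qps: float, monthly_cost: float) -> float:
    """Cost-per-unit-performance, expressed as QPS per dollar per month."""
    return qps / monthly_cost

# Hypothetical figures for illustration only -- not numbers from the post.
reference_ratio = qps_per_dollar(qps=5000, monthly_cost=1400)  # replicas included
competitor_ratio = qps_per_dollar(
    qps=4200,
    monthly_cost=equalized_monthly_cost(instance_cost=600, replicas_needed=2),
)
print(round(reference_ratio, 3), round(competitor_ratio, 3))
```

The point of the equalisation step is that a single-instance price looks artificially cheap: dividing QPS by the replica-inclusive cost is what makes the ratio comparable to a product that ships the replicas by default.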
Patterns extracted¶
- patterns/reproducible-benchmark-publication — new canonical wiki pattern. Publish the benchmark harness, exact configs, full instance specs, reproduction instructions, and an invitation for methodology feedback via a dedicated email address. Turns vendor benchmarks from marketing claims into auditable + falsifiable artifacts.
- patterns/workload-representative-benchmark-from-production — extended by explicit acknowledgment that benchmarks are proxies for real workloads, not substitutes.
- patterns/custom-benchmarking-harness — extended; Telescope is PlanetScale's internal custom harness sibling of Figma's afternoon-of-Go OpenSearch harness and Meta's DCPerf — same pattern at a different altitude (multi-vendor comparative OLTP benchmarking).
Operational numbers¶
- Latency benchmark: 200 runs of `SELECT 1;` per configuration.
- TPCC-like benchmark: Percona `sysbench-tpcc` with `TABLES=20, SCALE=250` → ~500 GB Postgres database.
- OLTP read-only benchmark: `sysbench oltp_read_only` at 300 GB.
- Reference PlanetScale target: `i8g M-320` instance type — 4 vCPUs, 32 GB RAM, 937 GB NVMe SSD.
- Replica posture: primary + 2 replicas across 3 AZs (default).
- Benchmark clients:
  - AWS: `c6a.xlarge` (4 vCPU, 8 GB RAM) in `us-east-1`.
  - GCP: `e2-standard-4` (4 vCPU, 16 GB RAM) in `us-central1`.
- RAM:CPU competitor matching:
  - 8:1 RAM:CPU (match exactly): Aurora, AlloyDB, CrunchyData.
  - 4:1 RAM:CPU (match RAM, double CPU): Supabase, TigerData, Neon.
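The RAM:CPU matching rule above can be expressed as a tiny helper: match the reference RAM first, then grant whatever vCPU count the vendor's RAM:CPU cap forces. A sketch — the function name and shape are illustrative, not from the post:

```python
import math

def competitor_shape(ref_vcpus: int, ref_ram_gb: int, max_ram_per_vcpu: int):
    """Match the reference RAM; if the vendor caps RAM:CPU below the
    reference ratio, grant extra vCPUs (asymmetry cuts in the vendor's favour)."""
    vcpus = max(ref_vcpus, math.ceil(ref_ram_gb / max_ram_per_vcpu))
    return vcpus, ref_ram_gb

# Reference shape: PlanetScale M-320 -- 4 vCPU / 32 GB (an 8:1 ratio).
print(competitor_shape(4, 32, 8))  # 8:1 vendors (Aurora, AlloyDB, CrunchyData) -> (4, 32)
print(competitor_shape(4, 32, 4))  # 4:1 vendors (Supabase, TigerData, Neon)    -> (8, 32)
```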
Caveats¶
- Methodology disclosure post, not results post. Latency / TPCC / OLTP numbers are deferred to linked results pages + the earlier 2025-10-14 Postgres 17 vs 18 post. This page exists to canonicalise how PlanetScale benchmarks, not what the benchmarks show.
- Vendor disclosure. PlanetScale has direct commercial interest in the benchmarks favouring its product; the methodology acknowledges this implicitly via the reproducibility commitment + the feedback invitation. The post's credibility rests on third parties reproducing the results with the published configs.
- Same-region placement. All benchmarks are in-region. Cross-region / cross-cloud latency is out of scope.
- No AZ guarantees for TPCC / OLTP because not all platforms expose AZ placement. Latency benchmark is AZ-aware where available.
- Postgres configuration left at platform defaults (except connection limits + timeouts). Workloads that benefit from platform-specific tuning (`shared_buffers`, `work_mem`, `effective_io_concurrency`, async-I/O mode) aren't reflected — an equal-footing choice, but it means the benchmark measures default vs default, not tuned vs tuned.
- Single instance size (M-320 / 4 vCPU / 32 GB) tested; scaling characteristics across sizes not reported here.
- Comparator set is cloud-managed-Postgres vendors only; no self-managed Postgres baseline (bare-metal Postgres with hand-tuned local NVMe would likely be the raw-performance ceiling).
- Benchmarks answer four questions, not all questions. The post is explicit about what's not being measured — write contention under specific skew, long-running analytical queries, mixed OLTP + OLAP workloads, recovery-time metrics, backup-and-restore costs, migration behaviour, long-tail latency at p99.9+.
- TPS / QPS shape. TPCC and OLTP `--range_size` values, concurrency sweep, and warm-up durations are referenced in-post, but the specific matrix is deferred to the reproduction instructions.
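For the TPCC benchmark, Percona's `sysbench-tpcc` at the post's scale would be invoked roughly as below. This is a hedged sketch: the connection placeholders (`$PGHOST` etc.), thread count, and run duration are assumptions — the authoritative invocation matrix lives in PlanetScale's published reproduction instructions.

```shell
# Fetch Percona's TPCC-like Lua scripts for sysbench.
git clone https://github.com/Percona-Lab/sysbench-tpcc.git && cd sysbench-tpcc

# Load the dataset at the post's scale (TABLES=20, SCALE=250 -> ~500 GB).
./tpcc.lua --db-driver=pgsql \
  --pgsql-host="$PGHOST" --pgsql-user="$PGUSER" \
  --pgsql-password="$PGPASSWORD" --pgsql-db=tpcc \
  --tables=20 --scale=250 --threads=16 prepare

# Run the workload; threads and duration are placeholder values.
./tpcc.lua --db-driver=pgsql \
  --pgsql-host="$PGHOST" --pgsql-user="$PGUSER" \
  --pgsql-password="$PGPASSWORD" --pgsql-db=tpcc \
  --tables=20 --scale=250 --threads=16 \
  --time=300 --report-interval=10 run
```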
Source¶
- Original: https://planetscale.com/blog/benchmarking-postgres
- Raw markdown: `raw/planetscale/2026-04-21-benchmarking-postgres-2637bd79.md`
Related¶
- systems/planetscale-for-postgres — the subject
- systems/planetscale-metal — the storage substrate
- systems/telescope-planetscale — the harness
- systems/sysbench — the workload driver
- systems/postgresql — the benchmarked engine
- concepts/price-performance-ratio — canonical new concept
- concepts/benchmark-methodology-bias
- concepts/benchmark-representativeness
- concepts/network-attached-storage-latency-penalty
- concepts/iops-throttle-network-storage
- patterns/reproducible-benchmark-publication — canonical new pattern
- patterns/custom-benchmarking-harness
- patterns/workload-representative-benchmark-from-production
- sources/2025-07-01-planetscale-planetscale-for-postgres — the launch this post is the methodology for
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18 — the `io_method` benchmarking sibling
- sources/2025-03-13-planetscale-io-devices-and-latency — the storage-latency-floor argument underlying the performance results
- sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs — the reliability argument pairs with this post's performance argument
- companies/planetscale