PlanetScale — One million queries per second with MySQL
Summary
Jonah Berquist (PlanetScale, 2022-09-01) publishes a positioning benchmark that drives a Vitess-on-MySQL cluster to a sustained >1 million queries per second against a Percona `sysbench-tpcc` workload on a single-tenant PlanetScale deployment. The post's two explicit goals are (1) to demonstrate PlanetScale's ability to handle query volumes on the order of one million QPS, and (2) to demonstrate predictable scalability via horizontal scaling — adding shards increases throughput capacity. Its load-bearing contribution to the wiki is not new mechanism (Berquist's sibling posts cover vschema and the scaling ladder at depth) but empirical evidence for the linear shard-count-to-throughput relationship, with three disclosed datapoints: 16 shards → ~420k QPS, 32 shards → ~840k QPS, 40 shards → >1M QPS sustained over 5 minutes. The doubling of QPS from 16 → 32 shards is the canonical wiki demonstration that Vitess-style horizontal sharding delivers ~linear scaling with shard count when the workload is shard-key-aligned. A corollary datum: 40 shards is a valid configuration — Berquist explicitly notes that the power-of-2 convention is not a hard requirement.
The post also canonicalises a saturation-signal diagnostic: within a single shard-count configuration, query-latency rise precedes the QPS ceiling. For the 16-shard run, "the QPS increase was greater between 1024 threads and 2048 threads than it was between 2048 threads and 4096 threads" — the QPS-per-thread derivative went negative while absolute QPS still grew, and VTGate p50 + especially p99 latency "spiking toward the end" signalled resource exhaustion before QPS outright plateaued. This makes p99 rising faster than p50 while QPS still climbs a named-and-dated diagnostic for "this shard count has run out of headroom — add shards". Canonical new wiki concept: concepts/latency-rises-before-throughput-ceiling.
Tier-3 clear via (1) the three new production-order QPS datapoints for Vitess (16-shard 420k, 32-shard 840k, 40-shard 1M+); (2) the linear-shard-count-to-throughput empirical demonstration; (3) the p99-before-QPS-ceiling saturation signal; (4) the non-power-of-2 shard count datum. Architecture density ~35% — the post is deliberately numbers-forward rather than mechanism-forward, with four embedded graphs doing the heavy lifting. Caveats flagged: single-tenant deployment with non-standard configuration tweaks (raised query + transaction timeouts) — the benchmark is a scalability demonstration, not a representative workload for a shared-tenant environment.
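The proportional sizing behind these datapoints reduces to one line of arithmetic; the sketch below redoes it in Python using only the disclosed numbers. The constant per-shard-throughput assumption is the post's linearity claim, not something verified here, and the rounding to 40 is the post's headroom choice.

```python
# Disclosed datapoints from the post: (shard count, approximate peak QPS).
datapoints = [(16, 420_000), (32, 840_000)]

# Under the post's linearity claim, per-shard throughput is roughly constant
# for this shard-key-aligned workload: ~26,250 QPS per shard at both sizes.
per_shard_qps = sum(qps / shards for shards, qps in datapoints) / len(datapoints)
print(f"~{per_shard_qps:,.0f} QPS per shard")

# Proportional sizing for the 1M QPS target, mirroring the post's calculation
# (32 shards * 1,000,000 / 840,000 ≈ 38); the post rounded up to 40 for headroom.
target_qps = 1_000_000
estimate = target_qps / per_shard_qps
print(f"estimated shards for {target_qps:,} QPS: {estimate:.1f} (provisioned: 40)")
```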
Key takeaways

- Linear shard-count-to-throughput: 16 shards → 420k QPS, 32 shards → 840k QPS. Verbatim: "With 16 shards we maxed out around 420k QPS. With 32 shards we got up to 840k QPS." Canonical wiki datum: doubling the shard count approximately doubles the achievable QPS ceiling, across the 16 → 32 step of the scan. This is the empirical counterpart to the theoretical argument in Berquist's decision-framework post and the Dicken IOPS post. New concept: concepts/linear-shard-count-throughput-scaling — the property that horizontal sharding on Vitess is ~linear in shard count for shard-key-aligned workloads. Underlying cause: the `sysbench-tpcc` access pattern is keyed such that each transaction hits a single shard, so adding shards buys proportional aggregate capacity without scatter-gather overhead eroding the linearity.
- 1M QPS sustained for 5 minutes on 40 shards. Verbatim: "Since we had just over 800k QPS with 32 shards, we calculated that 40 shards would satisfy our 1M QPS requirement. When we spun this database up and ran our parallel sysbench clients against it, these were the results: over one million queries per second sustained over our 5 minute run." Canonical wiki datum: 40 shards / >1M QPS / 5-min sustained is a new PlanetScale-disclosed Vitess production-order number. The calculation is explicitly proportional: `32 shards * (1M / 840k) ≈ 38 shards`, rounded up to 40 with headroom. This pairs with the sibling post's JD.com 35M QPS on Singles Day datum — JD.com's number is the ceiling, Berquist's is a clean reproducible-methodology benchmark against a named `sysbench-tpcc` workload.
- Power-of-2 shard counts are a convenience, not a requirement. Verbatim: "It's important to note that, while we like powers of 2, this isn't a limitation, and we can use other shard counts. Since we had just over 800k QPS with 32 shards, we calculated that 40 shards would satisfy our 1M QPS requirement." Canonical wiki datum: Vitess supports arbitrary shard counts; the power-of-2 convention in most of the tutorial content (2 → 4 → 8 → 16 → 32 in this post's scan) is operator-friendly for range-based shard splits but not enforced by the substrate. The worked example is explicit: 40 shards configured and run to 1M QPS. This disambiguates a common misconception that arises from reading only the tutorial content.
- p99 latency spike precedes QPS plateau — the per-configuration saturation signal. Verbatim (for the 16-shard run): "we begin to see diminishing returns as we saturate the resources of each shard. This is noticeable above when the QPS increase was greater between 1024 threads and 2048 threads than it was between 2048 threads and 4096 threads. Similarly, in metrics from vtgate shown below, we see an increase in latency as we max out our throughput. This is particularly evident in our p99 latency." Canonical wiki concept: concepts/latency-rises-before-throughput-ceiling — the earliest-fired diagnostic signal that a shard-count configuration has reached its practical ceiling is p99 climbing faster than p50 while QPS is still rising, not QPS outright flattening. The marginal QPS gained per added thread shrinks (each additional thread adds less than the prior one) before the absolute-QPS plateau. Two independent signals: VTGate-measured p50/p99 (client-visible latency) + QPS-per-thread efficiency (load-gen side). Operational implication: adding shards earlier is cheaper than waiting for outright plateau — by the time QPS flattens, the p99 has already been degrading for some time. (A minimal detection sketch follows this list.)
- `sysbench-tpcc` as the benchmark workload. Verbatim: "We set up a PlanetScale database and started running some benchmarks with a common tpc-c sysbench workload. We weren't aiming for a rigorous academic benchmark here, but we wanted to use a well-known and realistic workload." Canonical wiki datum: the Percona `sysbench-tpcc` port (Percona-Lab/sysbench-tpcc) is the workload named for this PlanetScale benchmark. Pairs with the systems/sysbench page's catalog of Lua-scripted `oltp_*` workloads as the OLTP-shaped but non-academic option. Caveat disclosed by the author: "not a rigorous academic benchmark" — the claim is a scalability demonstration, not a transactional-correctness audit. See also the Postgres benchmarking post, which uses the same `sysbench-tpcc` at `TABLES=20, SCALE=250` (~500 GB).
- Scan methodology: shard-count × thread-count grid. The author started unsharded, then "created a vschema and began sharding. Because we like powers of 2, we started with 2 shards and began doubling our shard count for subsequent runs. For each level of sharding, we ran sysbench several times, with increasing numbers of threads." Canonical wiki datum: the two-axis scan (shards = {2, 4, 8, 16, 32}; threads = {256, 512, 1024, 2048, 4096}) is the canonical shape for demonstrating both scaling-with-shards and saturation-per-configuration in a single benchmark. At each shard count, the thread-count sweep characterises the saturation curve; across shard counts, the per-curve ceilings demonstrate the linear-with-shards property. This is a cleaner methodology than single-point benchmarks that conflate the two axes. (A hedged driver-loop sketch follows this list.)
- Connections grow linearly with `sysbench` threads, tracked via VTGate. The embedded chart narration: "in the graphs below, which were run against a 16 shard database, you can see the increase in the number of sysbench threads reflected in the number of connections. As the number of threads increases, so does the throughput in queries per second." Canonical wiki datum: `sysbench` thread count ≈ VTGate-visible client connection count, because each sysbench thread holds a persistent connection to VTGate for the run. VTGate's routing-only / stateless design — the same property that enables the sources/2026-04-21-planetscale-one-million-connections|1M-connection benchmark — allows the connection-count axis and the throughput axis to scale independently. The post doesn't disclose connection-count numbers beyond this qualitative statement (the 1M-connection benchmark is the quantitative sibling).
- Single-tenant enterprise deployment, with non-default configuration tweaks. Verbatim: "We ran this benchmark against a single-tenant environment, with levels of resources that we reserve for our enterprise customers. We also made a few non-standard configuration tweaks, including raising some query and transaction timeouts to accommodate this sysbench workload." Canonical wiki caveat: the 1M-QPS number is a capability demonstration, not a shared-tenant SLO. The timeout-raising disclosure is load-bearing — `sysbench` at 4096 threads × 40 shards generates transaction shapes that would trip the default timeouts, which in production would be a load-shedding signal rather than something to increase. The number is useful as a "the substrate can do this when you size it and tune it" upper bound, not an "expect this on a shared-tier default configuration" baseline.
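The saturation diagnostic above (p99 rising faster than p50 while marginal QPS gains shrink) can be expressed as a simple per-step comparison over a thread sweep. A minimal sketch follows; the sweep numbers are illustrative placeholders, since the post publishes its 16-shard curves only as graphs.

```python
def saturation_report(sweep):
    """Flag the two signals the post describes for one shard-count configuration:
    (1) the QPS gained by each doubling of threads shrinking step over step, and
    (2) p99 latency growing faster than p50 (the tail degrades first)."""
    prev_gain = None
    for (t0, q0, p50_0, p99_0), (t1, q1, p50_1, p99_1) in zip(sweep, sweep[1:]):
        gain = q1 - q0                                   # marginal QPS from doubling threads
        diminishing = prev_gain is not None and gain < prev_gain
        tail_first = (p99_1 / p99_0) > (p50_1 / p50_0)   # tail latency outpacing the median
        flags = []
        if diminishing:
            flags.append("diminishing QPS returns")
        if tail_first:
            flags.append("p99 rising faster than p50")
        print(f"{t0} -> {t1} threads: +{gain:,} QPS"
              + (f"  [{', '.join(flags)}]" if flags else ""))
        prev_gain = gain

# Illustrative sweep for a single shard count: (threads, QPS, p50 ms, p99 ms).
sweep = [
    (256, 120_000, 2.0, 6.0),
    (512, 230_000, 2.2, 7.5),
    (1024, 330_000, 2.6, 11.0),
    (2048, 400_000, 3.4, 22.0),
    (4096, 420_000, 5.1, 55.0),
]
saturation_report(sweep)
```

On such a sweep the flags start firing several doublings before the absolute QPS ceiling, which is the post's point about adding shards before the outright plateau.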
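For the grid itself, the post names the workload (Percona-Lab/sysbench-tpcc) but not the exact invocation. Below is a hedged driver-loop sketch of the two-axis scan under stated assumptions: the host, credentials, and TABLES/SCALE sizing are placeholders (the sizing is borrowed from the sibling Postgres post, not disclosed here), and resharding to each shard count happens out of band via a vschema plus a resharding workflow, not from this loop.

```python
import subprocess

# Placeholder connection and sizing parameters; none of these are disclosed in the post.
CONN = [
    "--db-driver=mysql",
    "--mysql-host=<vtgate-host>", "--mysql-port=3306",
    "--mysql-user=<user>", "--mysql-password=<password>",
    "--mysql-db=sbtest",
]
SIZING = ["--tables=20", "--scale=250"]    # assumption borrowed from the sibling Postgres post

SHARD_STEPS = [2, 4, 8, 16, 32]            # the post's doubling scan; the final 40-shard run follows
THREAD_SWEEP = [256, 512, 1024, 2048, 4096]

def run_thread_sweep(shard_count: int) -> None:
    # Assumes a Percona-Lab/sysbench-tpcc checkout (tpcc.lua) in the working directory,
    # and that the database has already been resharded to `shard_count`.
    for threads in THREAD_SWEEP:
        cmd = ["./tpcc.lua", *CONN, *SIZING,
               f"--threads={threads}", "--time=300", "--report-interval=10", "run"]
        print(f"[{shard_count} shards] {threads} threads: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)

for shards in SHARD_STEPS:
    # Reshard to `shards` here (vschema + resharding workflow), then sweep threads.
    run_thread_sweep(shards)
```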
Operational numbers

- 1,000,000+ QPS — peak QPS sustained for 5 minutes on the 40-shard Vitess cluster running Percona `sysbench-tpcc`.
- ~840,000 QPS — 32-shard peak.
- ~420,000 QPS — 16-shard peak.
- 40 shards — the configuration that crossed 1M QPS; calculated from `32 * (1M / 840k) ≈ 38`, rounded up.
- 5 minutes — duration of the 1M-QPS run.
- Shard counts scanned: 2, 4, 8, 16, 32, 40 (doubling + one non-power-of-2 final configuration).
- Thread counts in each sweep: up to 4,096 (with 1024 → 2048 showing larger QPS gain than 2048 → 4096 at 16 shards — the canonical saturation signal).
- Single-tenant deployment with non-standard query + transaction timeout increases to accommodate the workload.
Caveats

- Benchmark, not production workload. Author explicitly disclaims academic rigor; `sysbench-tpcc` is a well-known OLTP-shaped workload but is not representative of any real application. The 1M-QPS figure is a substrate-capability demonstration against a shard-key-aligned workload where each transaction hits a single shard — workloads with scatter-gather patterns or cross-shard transactions would see different scaling shapes.
- Single-tenant sizing; enterprise-customer resource reservation. The benchmark cluster is sized specifically for this demonstration, not as a typical multi-tenant deployment. Operators reading this should not infer that a default PlanetScale cluster delivers 1M QPS; they should read it as "the substrate scales linearly with shards when you pay for the underlying capacity."
- Non-default query + transaction timeouts. Disclosed inline — some timeouts were raised. In a production deployment, hitting those timeouts would be a load-shedding signal, not a tuning knob. The benchmark traded timeout-triggered load-shedding for a clean QPS curve to measure.
- No disclosure of underlying instance shape, shard key, or transaction mix. The post doesn't specify the per-shard MySQL instance size, the shard key for the `sysbench-tpcc` tables, or the NewOrder/Payment/etc. transaction mix. Reproducibility is gated on readers using the Percona `sysbench-tpcc` defaults and inferring the PlanetScale side. Contrast the sibling Postgres benchmarking post, which publishes full reproduction instructions.
- First in a series; later posts promised. Author flags: "This is the first in a series of PlanetScale benchmark posts. Stay tuned for more" + "We will have more benchmark posts coming and have partnered with an academic institution who will be releasing their work soon." Canonical wiki disposition: treat this post as the datum-disclosure post (1M QPS at 40 shards, linear-with-shards on the 16/32 step) and watch for the downstream academic-partnership publication for the rigorous version.
Source

- Original: https://planetscale.com/blog/one-million-queries-per-second-with-mysql
- Raw markdown: `raw/planetscale/2026-04-21-one-million-queries-per-second-with-mysql-c6914cd8.md`