

Linear shard-count throughput scaling

Linear shard-count throughput scaling is the empirical property that doubling the shard count approximately doubles the QPS ceiling of a horizontally sharded cluster, provided the workload is shard-key-aligned: every transaction routes to a single shard, and no scatter-gather or cross-shard query erodes per-shard capacity.

Canonical datum

Jonah Berquist, PlanetScale, 2022-09-01 (Source: sources/2026-04-21-planetscale-one-million-queries-per-second-with-mysql) ran Percona's sysbench-tpcc workload against a Vitess-on-MySQL cluster, scanning shard counts in powers of two and finishing with a 40-shard run:

Shard count    Peak QPS    Ratio vs prior
16             ~420k
32             ~840k       2.0×
40             >1,000k     1.19× (vs 32)

"With 16 shards we maxed out around 420k QPS. With 32 shards we got up to 840k QPS."

The 32 → 40 step is also linear in shard count: 840k × (40/32) = 1.05M, matching the observed >1M result. The author's explicit calculation: "Since we had just over 800k QPS with 32 shards, we calculated that 40 shards would satisfy our 1M QPS requirement."
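
The same extrapolation generalises: under shard-key alignment the ceiling is modelled as per-shard capacity times shard count, so sizing for a QPS target is a one-line calculation. A minimal sketch in Python; the 26.25k-per-shard figure is derived here from the published 840k/32 datum, not stated in the source:

```python
import math

def required_shards(target_qps: float, observed_qps: float, observed_shards: int) -> int:
    """Extrapolate shard count linearly from one measured (shards, QPS) point."""
    per_shard_qps = observed_qps / observed_shards   # 840k / 32 = 26,250 QPS per shard
    return math.ceil(target_qps / per_shard_qps)

# Reproduce the benchmark's sizing step: 840k QPS at 32 shards, target 1M QPS.
print(required_shards(1_000_000, 840_000, 32))   # -> 39; the benchmark rounded up to 40
print(840_000 * 40 / 32)                         # -> 1,050,000.0: the 1.05M extrapolation
```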

Why it holds

Each shard is an independent MySQL instance with its own CPU, memory, storage IOPS, and connection pool. A shard-key-aligned transaction touches exactly one shard — its cost is bounded by per-shard capacity, and the cluster's aggregate capacity is the sum across shards. Adding a shard adds one unit of per-shard capacity to the total.
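
A toy illustration of why capacity is additive: shard-key-aligned routing maps each transaction to exactly one shard, so the cluster ceiling is the sum of independent per-shard ceilings. Hash-modulo routing here stands in for Vitess's actual VIndex machinery, and the per-shard figure is derived from the 840k/32 datum:

```python
import hashlib

NUM_SHARDS = 32
PER_SHARD_QPS = 26_250  # implied by 840k QPS across 32 shards

def route(shard_key: str, num_shards: int) -> int:
    """Shard-key-aligned routing: each key maps to exactly one shard."""
    digest = hashlib.md5(shard_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(route("customer:42", NUM_SHARDS))   # one shard per transaction
print(NUM_SHARDS * PER_SHARD_QPS)         # 840,000: shards share nothing, so capacity adds
```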

The load-generator side is also linear: the sysbench-tpcc client issues independent transactions per connection, so adding threads adds load in proportion to the thread count, up to the cluster ceiling. VTGate's routing-only design keeps the proxy layer from becoming the bottleneck at reasonable cluster sizes.
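
A crude model of that closed-loop interaction: offered load grows with thread count until it hits the cluster ceiling, at which point throughput flatlines. The per-thread rate here is an illustrative number, not from the source:

```python
def observed_qps(threads: int, per_thread_qps: float, cluster_ceiling_qps: float) -> float:
    """Closed-loop model: offered load scales with threads, clipped at the cluster ceiling."""
    return min(threads * per_thread_qps, cluster_ceiling_qps)

for t in (64, 128, 256, 512, 1024):
    print(t, observed_qps(t, per_thread_qps=1_500, cluster_ceiling_qps=840_000))
# Throughput climbs ~linearly with threads, then flatlines at 840k: the per-configuration ceiling.
```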

When it breaks

  • Scatter-gather workloads. A range predicate such as WHERE shard_key BETWEEN a AND b against a hash-sharded key fans out to every shard. Adding shards under this workload does not increase throughput proportionally: each query still touches every shard, so per-query cost grows with shard count while per-shard capacity stays constant. See concepts/scatter-gather-query and the cost model after this list.
  • Cross-shard transactions. Atomic multi-shard writes require 2PC or equivalent coordination. Adding shards increases coordination overhead. See concepts/atomic-distributed-transaction.
  • Hot shard / skewed shard key. If the shard key concentrates traffic on a subset of shards (e.g. a hot-shard write frontier from monotonic keys), adding shards doesn't help the hot shards. Linear scaling is an average-case property that assumes balanced distribution.
  • VTGate saturation. At extreme shard counts or connection counts, VTGate itself can become the bottleneck. The 1M-connection benchmark (sources/2026-04-21-planetscale-one-million-connections) characterises this ceiling separately.
  • Shard-count-dependent metadata cost. Topology server reads, cluster coordination, and schema-change fan-out all scale with shard count (though sub-linearly). These overheads eat into the scaling headroom at very large shard counts.
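
To make the scatter-gather failure mode concrete: if every query consumes capacity on every shard, per-query work grows with shard count and aggregate throughput stays flat. A back-of-the-envelope model, reusing the derived 26,250-QPS-per-shard figure:

```python
def cluster_qps(num_shards: int, per_shard_qps: float, shards_touched_per_query: int) -> float:
    """Aggregate QPS when each query consumes capacity on `shards_touched_per_query` shards."""
    total_capacity = num_shards * per_shard_qps
    return total_capacity / shards_touched_per_query

# Shard-key-aligned: each query touches 1 shard -> doubling shards doubles QPS.
print(cluster_qps(16, 26_250, 1), cluster_qps(32, 26_250, 1))   # 420,000  840,000

# Scatter-gather: each query touches every shard -> adding shards buys nothing.
print(cluster_qps(16, 26_250, 16), cluster_qps(32, 26_250, 32)) # 26,250   26,250
```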

Shard counts are not constrained to powers of two

Berquist's 40-shard run is explicit: "while we like powers of 2, this isn't a limitation, and we can use other shard counts." Vitess supports arbitrary shard counts; the power-of-2 convention in most tutorial content is operator ergonomics (clean range-based splits when resharding 2 → 4 → 8 → 16) rather than a substrate requirement. The scan in this benchmark went 2 → 4 → 8 → 16 → 32 → 40 to target exactly the 1M-QPS throughput goal.
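
Why powers of two are merely ergonomic: Vitess shard ranges are left-closed hex intervals over the keyspace-ID space, and power-of-2 counts yield short, human-readable boundaries, while any count still divides the space evenly. A sketch assuming a 64-bit keyspace-ID space; the boundary formatting approximates Vitess shard naming rather than reproducing it exactly:

```python
def shard_boundaries(num_shards: int) -> list[str]:
    """Split the 64-bit keyspace-ID space into even, left-closed ranges,
    trimming trailing zero bytes the way Vitess range names do."""
    bounds = []
    for i in range(1, num_shards):
        hexstr = f"{(i * 2**64) // num_shards:016x}"
        while hexstr.endswith("00"):
            hexstr = hexstr[:-2]
        bounds.append(hexstr)
    return bounds

print(shard_boundaries(4))      # ['40', '80', 'c0'] -> shards -40, 40-80, 80-c0, c0-
print(shard_boundaries(40)[0])  # '0666666666666666' -> no tidy short form, still an even split
```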

Companion signal: saturation within a single configuration

Linear scaling across shard counts pairs with concepts/latency-rises-before-throughput-ceiling as the saturation signal within a single shard-count configuration. Once p99 starts rising faster than p50 while QPS is still climbing, the configuration has reached its practical ceiling — adding shards (moving to the next point on the linear curve) is cheaper than continuing to push threads against a saturated configuration.
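
A simple operational check for that knee, assuming you sample p50 and p99 at increasing thread counts within one configuration; the growth threshold here is an arbitrary illustrative choice:

```python
def saturation_point(samples: list[tuple[int, float, float]], ratio_growth: float = 1.5):
    """samples: (threads, p50_ms, p99_ms) in increasing thread order.
    Returns the first thread count where the p99/p50 ratio has grown by
    ratio_growth x over the baseline, i.e. tail latency is outpacing the median."""
    baseline = samples[0][2] / samples[0][1]
    for threads, p50, p99 in samples[1:]:
        if (p99 / p50) > ratio_growth * baseline:
            return threads
    return None

# Illustrative numbers: QPS may still be climbing, but the tail detaches at 512 threads.
print(saturation_point([(64, 2.0, 6.0), (128, 2.1, 6.5), (256, 2.3, 8.0), (512, 2.6, 14.0)]))
```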

Seen in

  • sources/2026-04-21-planetscale-one-million-queries-per-second-with-mysql — Jonah Berquist (PlanetScale, 2022-09-01) canonicalises the property with a three-point scan: 16 shards = 420k QPS, 32 shards = 840k QPS, 40 shards = 1M+ QPS. The 16 → 32 step is a clean 2× on both axes — the canonical wiki instance of "doubling shards doubles QPS" for a shard-key-aligned Vitess workload. The 40-shard data point shows the property extends continuously, not only at power-of-2 boundaries. Caveat: this was a single-tenant, enterprise-sized deployment with non-default timeout tuning; the property is load-bearing, but the absolute numbers demonstrate substrate capability rather than a shared-tenant SLO.