
Single-shard throughput ceiling

The single-shard throughput ceiling is the per-shard operation rate a sharded system can sustain before saturating that one shard, independent of the rest of the fleet. For systems that serialise updates per shard, the ceiling is set by latency, not bandwidth: it is the inverse of the persistence-layer operation latency. "The latency of a database operation limits the maximum theoretical throughput of a single shard" (Source: sources/2026-04-21-planetscale-temporal-workflows-at-scale-sharding-in-production).

The formula

For a shard with serialised updates:

single_shard_throughput_ceiling ≈ 1 / persistence_latency

If a single persistence-layer write takes 5 ms, the shard ceiling is ~200 operations/sec — regardless of whether the backing MySQL primary has 4 cores or 64, 3,000 IOPS or 30,000. You cannot add capacity to one shard; you must add shards.
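The arithmetic is mechanical enough to script. A minimal sketch (function names are illustrative, not from any Temporal tooling):

```python
import math

def shard_ceiling(persistence_latency_s: float) -> float:
    """Ops/sec ceiling of a shard that serialises its writes: 1 / latency."""
    return 1.0 / persistence_latency_s

def shards_needed(target_ops_per_sec: float, persistence_latency_s: float) -> int:
    """Minimum shard count whose combined ceilings meet an aggregate target."""
    return math.ceil(target_ops_per_sec * persistence_latency_s)

# 5 ms per persistence write -> ~200 ops/sec per shard, whatever the instance size.
print(shard_ceiling(0.005))            # 200.0
# A 100,000 ops/sec aggregate target at 5 ms latency needs at least 500 shards.
print(shards_needed(100_000, 0.005))   # 500
```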

Why capacity planning inverts

In a conventional sharded OLTP database, shard capacity scales with the backing instance's CPU/IOPS/bandwidth, and per-shard throughput can rise multiple orders of magnitude by upsizing the instance. Operators size for "how many shards at instance class X"; instance class is a tunable.

In a serialised-per-shard system (Temporal is the canonical example), the equation inverts:

  1. Measure the persistence-layer operation latency under your load shape (p50, p95, p99).
  2. Derive the per-shard throughput ceiling from the latency.
  3. Compute the number of shards needed to exceed your target aggregate throughput.
  4. Over-provision: in many sharded systems the shard count is fixed at deploy time, and you cannot reshard your way out of an undersized count.
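The steps above can be sketched as one function (the crude percentile and the 2x headroom factor are illustrative assumptions, not recommendations):

```python
import math

def plan_shard_count(latencies_s, target_ops_per_sec, headroom=2.0):
    """Steps 2-4: ceiling from measured latency, shard count, over-provision.

    latencies_s: persistence-op latencies measured under the real load shape (step 1).
    headroom: over-provision factor, since shard counts are often immutable.
    """
    xs = sorted(latencies_s)
    p99 = xs[min(len(xs) - 1, int(0.99 * len(xs)))]   # crude p99; plan pessimistically
    ceiling = 1.0 / p99                                # step 2: per-shard ops/sec
    needed = math.ceil(target_ops_per_sec / ceiling)   # step 3: shards for the target
    return math.ceil(needed * headroom)                # step 4: over-provision

# p99 of 8 ms against a 50,000 ops/sec target, with 2x headroom:
print(plan_shard_count([0.005] * 99 + [0.008], 50_000))   # 800
```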

Instance upsizing helps by reducing latency (fewer outliers, tighter p99), and every reduction in per-op latency raises the ceiling, but only until the latency floor is hit (round-trip network time, fsync cost, transaction-commit overhead). Past that floor, the only knob left is shard count.
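A toy model makes the diminishing returns concrete. Assume the per-op latency splits into an irreducible floor plus a component that upsizing can shrink (the 3 ms / 2 ms split below is illustrative):

```python
def ceiling_after_upsize(floor_s: float, reducible_s: float, speedup: float) -> float:
    """Throughput ceiling when upsizing shrinks only the reducible latency.

    floor_s: irreducible latency (network RTT, fsync, commit overhead).
    reducible_s: CPU/IO-bound component that instance upsizing can shrink.
    speedup: factor by which upsizing divides the reducible component.
    """
    return 1.0 / (floor_s + reducible_s / speedup)

# 3 ms floor + 2 ms reducible: infinite upsizing converges on 1/3 ms ~= 333 ops/sec.
for s in (1, 2, 8, 1000):
    print(f"{s:5}x -> {ceiling_after_upsize(0.003, 0.002, s):.0f} ops/sec")
```

The ceiling climbs from 200 toward an asymptote of ~333 ops/sec; past a few-fold speedup, the remaining gains are marginal.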

Contrast: bandwidth-bound vs latency-bound

| Bound | Scaling lever | Example system |
| --- | --- | --- |
| Bandwidth (concurrent, parallel writes) | Bigger instance, more IOPS, more cores | Vanilla MySQL primary serving many concurrent clients |
| Latency (serialised writes) | More shards, lower per-op latency | Temporal History shard |

The critical architectural question: does my system hold a per-shard serialisation lock? If yes, you are in the latency-bound regime, and instance upsizing has rapidly diminishing returns past the latency floor.
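The two regimes can be contrasted with a deterministic toy model (all parameters are illustrative; this is not a model of any specific system):

```python
import math

def throughput(n_ops: int, latency_s: float, workers: int, serialised: bool) -> float:
    """Ops/sec to drain n_ops with `workers` concurrent clients.

    serialised=True  -> per-shard lock: ops run one at a time (latency-bound).
    serialised=False -> ops run `workers`-wide in parallel (bandwidth-bound).
    """
    waves = n_ops if serialised else math.ceil(n_ops / workers)
    return n_ops / (waves * latency_s)

# 1,000 ops at 5 ms each, 16 concurrent clients:
print(throughput(1000, 0.005, 16, serialised=True))   # 200.0 (the lock caps it)
print(throughput(1000, 0.005, 16, serialised=False))  # ~3175 (scales with workers)
```

With the lock held, adding clients changes nothing: the shard drains at 1/latency no matter how wide the offered concurrency.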

Relationship to concepts/latency-rises-before-throughput-ceiling

On a bandwidth-bound shard, p99 latency rises sharply as the shard approaches its throughput ceiling (queueing theory: queue length grows with utilisation ρ and blows up as ρ → 1). On a latency-bound shard, the ceiling is the latency itself; there is no queueing headroom because every operation pays the full serialised cost. The wiki's latency-rises-first diagnostic still applies at the fleet-aggregate level (p99 across all shards rises before total QPS plateaus), but the mechanism is different: you are seeing hot shards saturating their individual latency-bound ceilings, not bandwidth contention.
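The bandwidth-bound half of the contrast is the standard M/M/1 result, mean sojourn time W = S / (1 − ρ); a sketch with an illustrative 5 ms service time:

```python
def mm1_latency(service_s: float, utilisation: float) -> float:
    """M/M/1 mean sojourn time W = S / (1 - rho); blows up as rho -> 1."""
    return service_s / (1.0 - utilisation)

# Bandwidth-bound shard, 5 ms service time: latency explodes near the ceiling.
for rho in (0.5, 0.9, 0.99):
    print(f"rho={rho}: {mm1_latency(0.005, rho) * 1000:.0f} ms")
```

At ρ = 0.5 the mean latency is 10 ms; at ρ = 0.99 it is 500 ms, which is the rises-before-the-ceiling signature. A latency-bound shard shows no such curve: it sits at the serialised cost until it simply stops keeping up.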
