Single-shard throughput ceiling¶
The single-shard throughput ceiling is the per-shard operation rate a sharded system can sustain before saturating that one shard, independent of the rest of the fleet. For systems that serialise updates per shard, the ceiling is latency-bound, not bandwidth-bound: the inverse of the persistence-layer operation latency. "The latency of a database operation limits the maximum theoretical throughput of a single shard" (Source: sources/2026-04-21-planetscale-temporal-workflows-at-scale-sharding-in-production).
The formula¶
For a shard with serialised updates, the ceiling is the inverse of the per-operation persistence latency:

max throughput (ops/sec) ≈ 1 / persistence-layer operation latency (seconds)
If a single persistence-layer write takes 5 ms, the shard ceiling is ~200 operations/sec — regardless of whether the backing MySQL primary has 4 cores or 64, 3,000 IOPS or 30,000. You cannot add capacity to one shard; you must add shards.
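The arithmetic above can be sketched directly; this is a minimal illustration, and the function name is ours, not from any real API:

```python
def shard_ceiling_ops_per_sec(write_latency_ms: float) -> float:
    """Serialised-per-shard ceiling: the inverse of per-op latency.

    1000 ms per second / latency per op = ops per second.
    """
    return 1000.0 / write_latency_ms

print(shard_ceiling_ops_per_sec(5.0))   # 5 ms per write -> 200.0 ops/sec
print(shard_ceiling_ops_per_sec(10.0))  # 10 ms per write -> 100.0 ops/sec
```

Note that instance size appears nowhere in the formula: only latency does.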
Why capacity planning inverts¶
In a conventional sharded OLTP database, shard capacity scales with the backing instance's CPU/IOPS/bandwidth, and per-shard throughput can rise multiple orders of magnitude by upsizing the instance. Operators size for "how many shards at instance class X"; instance class is a tunable.
In a serialised-per-shard system (Temporal is the canonical example), the equation inverts:
- Measure the persistence-layer operation latency under your load shape (p50, p95, p99).
- Derive the per-shard throughput ceiling from the latency.
- Compute the number of shards needed to exceed your target aggregate throughput.
- Over-provision: in many sharded systems the shard count is immutable after deploy, so you cannot reshard your way out of an undersized count.
Instance upsizing helps reduce latency (fewer outliers, tighter p99), and throughput rises in proportion to each latency reduction, but only until the latency floor is reached (network round-trip time, fsync cost, transaction commit overhead). Past that, the only knob is shard count.
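A toy model of the diminishing returns, assuming per-op latency splits into an irreducible floor plus a variable component that upsizing can shrink (the 2 ms floor is an illustrative number, not from the source):

```python
FLOOR_MS = 2.0  # illustrative: network RTT + fsync + commit overhead

def ceiling_after_upsizing(variable_ms: float) -> float:
    """Upsizing shrinks only the variable component; the floor remains."""
    return 1000.0 / (FLOOR_MS + variable_ms)

for variable in (8.0, 4.0, 1.0, 0.0):
    print(f"{variable:4.1f} ms variable -> {ceiling_after_upsizing(variable):6.1f} ops/sec")
# Shrinking the variable part from 8 ms to 0 ms moves the ceiling from
# 100 to 500 ops/sec, and no amount of further upsizing moves it past 500.
```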
Contrast: bandwidth-bound vs latency-bound¶
| Bound | Scaling lever | Example system |
|---|---|---|
| Bandwidth (concurrency-parallel writes) | Bigger instance, more IOPS, more cores | Vanilla MySQL primary serving many concurrent clients |
| Latency (serialised writes) | More shards, lower per-op latency | Temporal History shard |
The critical architectural question: does my system hold a per-shard serialisation lock? If yes, you are in the latency-bound regime, and instance upsizing has rapidly diminishing returns past the latency floor.
Relationship to concepts/latency-rises-before-throughput-ceiling¶
On a bandwidth-bound shard, p99 latency rises sharply as the shard approaches its throughput ceiling (queueing theory: queue length grows with utilisation ρ, blowing up as ρ → 1). On a latency-bound shard, the ceiling is the latency — there is no queuing headroom because every operation pays the full serialised cost. The wiki's latency-rises-first diagnostic still applies at a fleet aggregation altitude (p99 across all shards rises before total QPS plateaus), but the mechanism is different: you are seeing hot shards saturating their individual latency-bound ceilings, not bandwidth contention.
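The bandwidth-bound blow-up can be illustrated with the standard M/M/1 mean-time-in-system formula, W = S / (1 − ρ); this is a textbook queueing result used here as a sketch, not a claim about any specific system's load model:

```python
def mm1_latency_ms(service_ms: float, utilisation: float) -> float:
    """M/M/1 mean time in system: service time inflated by 1/(1 - rho)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_ms / (1.0 - utilisation)

# Bandwidth-bound shard: latency balloons as utilisation approaches 1.
for rho in (0.5, 0.9, 0.99):
    print(f"rho={rho:4.2f} -> {mm1_latency_ms(5.0, rho):7.1f} ms")
```

On a latency-bound shard there is no such curve to observe: throughput simply stops at 1/latency, with the queue forming invisibly behind the serialisation lock.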
Seen in¶
- sources/2026-04-21-planetscale-temporal-workflows-at-scale-sharding-in-production — Longoria names the latency-bound ceiling as the correctness consequence of Temporal's serialise-per-shard discipline, motivating the hard requirement to size `numHistoryShards` high enough for worst-case peak load before the cluster is deployed (see concepts/num-history-shards-immutability).