CONCEPT Cited by 1 source

Write throughput ceiling¶

Write throughput ceiling is the point at which a database primary can no longer absorb incoming writes at the rate the workload demands. It is one of the three canonical sharding-trigger signals — alongside data-size overflow of working-set RAM and read-throughput overflow of read-replica capacity.

Symptom chain¶

Berquist canonicalises the symptom ordering. Replication lag is the leading indicator; IOPS saturation is the lagging indicator:

"When the primary is maxed on IOPS, writes will become less performant. Usually before that, however, replication lag becomes a problem. While there have been significant improvements in replication within MySQL clusters, there will always be a small amount of delay between the time the data is written to the primary and that same data is written to a replica."

The chain:

Replication lag rises. Replicas can't keep up with the primary's write rate. Read-your-writes queries routed to replicas see stale data; downstream consumers (CDC, analytics replicas, backup workers) fall behind.
Application-visible staleness / errors. "when replicas fall behind the primary, this can look like inconsistent or stale data to your users, and may also result in errors if your application expects to be able to read data that it has just written."
Primary IOPS saturation. The primary itself can no longer keep pace; write latency climbs.
Operational-task slowdowns. "Other database operations like schema changes and batch jobs will be slower as well." Schema migrations time out, backup throughput drops, replica bootstrap takes longer.

Why replication lag is the leading indicator¶

Replication is asynchronous (or semi-sync): a replica applies writes after the primary. When the primary's write rate exceeds the replica's replay capacity (common in MySQL, where the replication applier, SQL_THREAD, is single-threaded per source unless parallel workers are configured), the replica falls behind even when the primary's IOPS budget still has headroom. Replication lag therefore degrades before the primary saturates, which makes it a useful early-warning signal.
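The mechanism above can be illustrated with a minimal sketch. It assumes a constant primary write rate and a constant replica apply rate; the function names and all numbers are hypothetical and are not measurements from the source.

```python
def simulate_backlog(write_rate, apply_rate, seconds):
    """Return the replica's backlog (unapplied writes) after each second.

    write_rate: writes per second committed on the primary (assumed constant)
    apply_rate: writes per second the replica can replay (assumed constant)
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += write_rate                # new writes arrive from the primary
        backlog -= min(backlog, apply_rate)  # replica replays as many as it can
        history.append(backlog)
    return history


def lag_seconds(backlog, apply_rate):
    """Estimate how long the replica would need to drain its current backlog."""
    return backlog / apply_rate


# Illustrative numbers only: the primary accepts 5,000 writes/s without
# trouble, but the replica can only replay 4,000 writes/s.
history = simulate_backlog(write_rate=5000, apply_rate=4000, seconds=60)
print(history[-1])                     # backlog after one minute
print(lag_seconds(history[-1], 4000))  # estimated lag in seconds
```

With these illustrative numbers the backlog grows by 1,000 writes every second and reaches 60,000 after one minute, roughly 15 seconds of lag, even though nothing in the model has saturated the primary. When the write rate is below the apply rate, the backlog stays at zero.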

Mitigations — ordered by escalation¶

Vertical scaling — bigger primary, more IOPS headroom.
Faster substrate: direct-attached NVMe (PlanetScale Metal) has no artificial IOPS cap, whereas network-attached storage (RDS, CloudSQL) hits its cap earlier. "With PlanetScale Metal, write throughput is significantly less of a concern … substantially higher IOPS."
Application-side write reduction — batching, buffering, write coalescing; read replicas do not help the primary absorb writes.
Write sharding via horizontal sharding — split the write rate across N primaries. Each shard absorbs roughly total_writes / N.
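As an illustration of the application-side write reduction item above, here is a minimal sketch of write coalescing. `CoalescingBuffer`, `flush_fn`, and `max_keys` are hypothetical names chosen for this example; a real application would pass a function that issues one batched write to the primary.

```python
class CoalescingBuffer:
    """Hold per-key updates in memory and flush only the latest value per key."""

    def __init__(self, flush_fn, max_keys=100):
        self._flush_fn = flush_fn    # called with a dict of key -> latest value
        self._max_keys = max_keys    # flush once this many distinct keys are pending
        self._pending = {}

    def write(self, key, value):
        # A later write to the same key overwrites the earlier pending one,
        # so only one row per key ever reaches the database.
        self._pending[key] = value
        if len(self._pending) >= self._max_keys:
            self.flush()

    def flush(self):
        if self._pending:
            batch, self._pending = self._pending, {}
            self._flush_fn(batch)


# Usage: five logical writes collapse into one batch of three rows.
flushed = []
buf = CoalescingBuffer(flush_fn=flushed.append, max_keys=3)
for key, value in [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 1)]:
    buf.write(key, value)
buf.flush()  # flush anything still pending (nothing is left in this run)
print(flushed)
```

The trade-off is durability: writes held in the buffer are lost if the process dies before a flush, so this pattern only suits data where that is acceptable.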

Note: read replicas do not raise the write ceiling. Adding replicas increases read capacity but every write still goes through the single primary. The only way to scale writes beyond a single primary is to remove the single-primary assumption — either via horizontal sharding or via a multi-primary system.
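A rough sizing sketch for the sharding option, assuming writes spread evenly across shards (hot keys would break this assumption). The function names, the `headroom` parameter, and the example rates are hypothetical and chosen only to show the arithmetic.

```python
import math


def shards_needed(total_writes_per_sec, per_primary_ceiling, headroom=0.7):
    """Smallest shard count that keeps each primary under its usable ceiling.

    headroom is the fraction of the ceiling we are willing to use
    (an assumption of this sketch, not a figure from the source).
    """
    usable = per_primary_ceiling * headroom
    return math.ceil(total_writes_per_sec / usable)


def per_shard_rate(total_writes_per_sec, n_shards):
    """Approximate write rate per shard under an even distribution."""
    return total_writes_per_sec / n_shards


# Illustrative numbers: 40,000 writes/s overall, with each primary assumed
# to sustain 16,000 writes/s before hitting its ceiling.
n = shards_needed(40_000, 16_000)
print(n, per_shard_rate(40_000, n))
```

With these numbers the sketch yields four shards at about 10,000 writes per second each, which stays below the 70% usable threshold of 11,200.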

Distinction from related concepts¶

vs IOPS throttle on network storage: IOPS throttle is the substrate-level cap (e.g. EBS's provisioned IOPS); write-throughput ceiling is the database-level symptom that the substrate + replication topology can no longer absorb the workload. Substrates with higher IOPS headroom push the ceiling further out.
vs replication lag: replication lag is one symptom of the ceiling (the leading one). The ceiling is the underlying capacity limit.

Seen in¶

sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding: Ben Dicken (2024-08-19) canonicalises the cost-cliff expression of the write-throughput ceiling on network-attached storage. Once aggregate write-workload IOPS approaches the gp3 ceiling (16k), the regime-shift to io1/io2 provisioned-IOPS produces a super-linear 11-13× cost jump for an 8× workload. The ceiling isn't a hard failure; it's an economic one. Dicken's mitigation is horizontal sharding: each shard's demand stays under the cheap-tier ceiling. Composes with Berquist's replication-lag-as-leading-indicator framing: sharding is the structural answer to all three ceiling dimensions (data size, write throughput, IOPS cost).
— Berquist's canonical framing: replication lag is the leading indicator, IOPS saturation the lagging; substrate choice (Metal vs RDS/CloudSQL) shifts the ceiling.
