PATTERN

Read replicas for read scaling

Pattern

When a single primary can no longer serve the read workload, add read replicas and split reads from writes at the application layer — writes to the primary, eventually-consistent reads to the replica pool. This is the canonical second rung of the database scaling ladder, and the cheapest incremental lever for read-heavy workloads.

"A tried-and-true method for scaling MySQL or Postgres is using replicas for reads. In addition to setting up the replicas, this involves application changes to split reads and writes to different connection strings. Most web applications are very read-heavy, and this method allows you to continue scaling reads by adding more replicas."

When to apply

  • Read-heavy workload — typical web-app 10:1 or higher read-write ratio.
  • Single-primary write rate is within capacity — the primary can still absorb writes; only read capacity is constrained.
  • Workload tolerates small staleness — the read paths that matter most can accept replication-lag-bound staleness, or the paths that can't (read-your-writes-critical flows) are a clear minority and can be pinned to the primary.

Mechanics

  1. Provision replicas. MySQL: binlog replication configured with CHANGE REPLICATION SOURCE TO (CHANGE MASTER TO before MySQL 8.0.23). Postgres: streaming replication seeded with pg_basebackup and pointed at the primary via primary_conninfo. Size the replica count as read QPS / per-replica capacity.
  2. Expose two connection strings. A primary URL for writes, a replica pool URL (load-balanced) for reads.
  3. Route at the app layer. Each read site picks: primary (strongly consistent) or replica (eventually consistent). Common routing rules:
      • After a write in the same request, read-your-writes-critical reads → primary.
      • Navigation / browsing / analytics-style reads → replica pool.
      • Long-running batch reads → dedicated analytics replica.
  4. Monitor replication lag. Surface per-replica lag; alert on threshold breach. Some apps actively pull a replica out of rotation if its lag exceeds a ceiling.
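
The mechanics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production router: the names `Router`, `Replica`, `replicas_needed`, the 30% headroom default, and the 5-second lag ceiling are all hypothetical, and the per-replica lag values are assumed to be refreshed by a separate monitoring loop that is not shown.

```python
import math
import random
from dataclasses import dataclass, field


def replicas_needed(peak_read_qps: float, per_replica_qps: float,
                    headroom: float = 0.3) -> int:
    """Replica count = read QPS / per-replica capacity, padded by a headroom fraction.

    The headroom default is an assumption, not a figure from the source.
    """
    return math.ceil(peak_read_qps * (1 + headroom) / per_replica_qps)


@dataclass
class Replica:
    url: str
    lag_seconds: float = 0.0  # assumed to be updated by a monitoring loop (not shown)


@dataclass
class Router:
    primary_url: str
    replicas: list[Replica]
    max_lag_seconds: float = 5.0  # hypothetical lag ceiling
    rng: random.Random = field(default_factory=random.Random)

    def healthy_replicas(self) -> list[Replica]:
        # Replicas whose lag exceeds the ceiling are pulled out of rotation.
        return [r for r in self.replicas if r.lag_seconds <= self.max_lag_seconds]

    def url_for(self, *, is_write: bool, wrote_in_request: bool = False) -> str:
        # Writes, and reads that follow a write in the same request, go to the primary.
        if is_write or wrote_in_request:
            return self.primary_url
        pool = self.healthy_replicas()
        # If every replica is too stale, fall back to the primary.
        if not pool:
            return self.primary_url
        return self.rng.choice(pool).url
```

With these assumptions, `replicas_needed(12000, 2500)` returns 7 (12,000 QPS plus 30% headroom, divided by 2,500 QPS per replica, rounded up). Falling back to the primary when no replica is healthy keeps reads correct but shifts load onto the primary; some deployments may prefer to serve stale reads instead.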

Structural costs

  • Replication-lag-visible staleness — every read-path query site must either tolerate it or opt into primary reads.
  • App-level routing logic — often proliferates; routing decisions get made at every query site.
  • Two connection pools — primary pool (writes), replica pool (reads); failover logic must handle primary promotion.
  • Does not scale writes — every write still goes to the single primary. The pattern solves only the read-capacity problem.

When to climb to the next rung

Per Berquist's canonical framing, consider horizontal sharding when:

  • The routing-logic cost is already being paid: every new read path requires a primary-vs-replica decision, so building shard-key routing once may be cheaper than building replica routing now and shard-key routing later.
  • Write throughput is approaching the ceiling: read replicas do nothing for write capacity, so anticipate this trigger before it arrives.
  • Data size and read throughput are both growing: horizontal sharding addresses both, while adding replicas addresses only reads.
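
For contrast with primary-vs-replica routing, shard-key routing can be sketched as below. This is only an illustration of the idea: the connection strings, `SHARD_URLS`, `shard_for`, and `url_for` are hypothetical, and simple hash-modulo placement is an assumption chosen for brevity. It remaps most keys when the shard count changes, so real sharding layers generally use a placement scheme that limits data movement.

```python
import hashlib

# Hypothetical per-shard connection strings; each shard has its own primary.
SHARD_URLS = [
    "mysql://shard-0.example.internal/app",
    "mysql://shard-1.example.internal/app",
    "mysql://shard-2.example.internal/app",
    "mysql://shard-3.example.internal/app",
]


def shard_for(shard_key: str, shard_count: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def url_for(shard_key: str) -> str:
    """Reads and writes for a key use the same connection string."""
    return SHARD_URLS[shard_for(shard_key, len(SHARD_URLS))]
```

The routing decision here depends only on the key, not on whether the statement is a read or a write, which is why the query site no longer has to reason about replication lag.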

"by scaling read capacity through horizontal sharding instead of by using replicas, application code does not need to account for the potential replication lag or that multiple connection strings need to be managed … Plus, sharding at this stage sets you up for future growth and you don't have to come back and shard later when write throughput or data size would otherwise become an issue."

Multi-region extension

The same pattern (read from the replica pool, write to the primary) extends across regions: each regional read replica becomes a pool-of-one in that region. patterns/per-region-read-replica-routing covers the application-side routing choice (an env var keyed to the pod's region selects a per-region connection string). The read-your-writes (RYW) problem gets sharper, because cross-region replication lag is wider-tailed than same-region lag, and typically requires pairing with patterns/session-cookie-for-read-your-writes to preserve each user's visibility of their own writes. Canonical productised instance: systems/planetscale-portals. The launch post measures the latency benefit at ~90 ms → ~3 ms per query for a Frankfurt app that previously dialed a Northern-Virginia primary.
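
A rough sketch of how the two pieces might fit together follows. It is one plausible reading of the routing and cookie patterns named above, not their documented implementation: the `APP_REGION` variable, the `last_write_at` cookie, the 10-second pinning window, and all connection strings are hypothetical.

```python
import os
import time

# Hypothetical connection strings.
PRIMARY_URL = "mysql://primary.us-east.example.internal/app"
REPLICA_URL_BY_REGION = {
    "us-east": "mysql://replica.us-east.example.internal/app",
    "eu-central": "mysql://replica.eu-central.example.internal/app",
}

RYW_WINDOW_SECONDS = 10.0      # assumed upper bound on cross-region lag
COOKIE_NAME = "last_write_at"  # hypothetical cookie name


def regional_replica_url(env=os.environ) -> str:
    """Pick the replica for the pod's region; fall back to the primary if unknown."""
    region = env.get("APP_REGION", "")
    return REPLICA_URL_BY_REGION.get(region, PRIMARY_URL)


def mark_write(cookies: dict, now=None) -> None:
    """Record the time of the session's latest write in its cookie."""
    cookies[COOKIE_NAME] = str(time.time() if now is None else now)


def read_url(cookies: dict, env=os.environ, now=None) -> str:
    """Route a read: primary shortly after the session's own write, else regional replica."""
    now = time.time() if now is None else now
    try:
        last_write = float(cookies.get(COOKIE_NAME, ""))
    except ValueError:
        last_write = None  # missing or malformed cookie: no recent write known
    if last_write is not None and now - last_write < RYW_WINDOW_SECONDS:
        return PRIMARY_URL
    return regional_replica_url(env)
```

A fixed time window is the simplest possible pinning rule and only works if the window comfortably exceeds the tail of replication lag; schemes that compare replication positions instead of wall-clock time avoid that guess at the cost of more machinery.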

Seen in

  • Brian Morrison II (PlanetScale, 2023-11-15) canonicalises the active/passive framing of the pattern: "the replicas can be used to serve up read-only queries, but all writes must be sent to the source. This helps split the load across all replicas, but it is important to note that when using the default asynchronous replication mode …, there may be some delay between when data is written to the source and when it is available on the replica." He pairs the pattern with the explicit contra-option, active/active replication (concepts/active-active-replication), which is warned against for MySQL because it lacks native conflict resolution. He also canonicalises the composition with a mixed sync + async topology: one semi-sync replica as the guaranteed failover candidate plus async replicas for read capacity, so read replicas serve double duty as scaling lever and as failover candidate pool. Under the async-across-regions rule, cross-region read replicas are async-only, so they serve read scaling with higher lag than in-region async replicas.

  • Berquist's canonical "tried-and-true method for scaling MySQL or Postgres" framing and its structural-cost enumeration.

  • Barnett extends the pattern across regions with PlanetScale Portals; motivates the multi-region deployment shape and pairs it with session-cookie RYW preservation.