PlanetScale — MySQL replication: Best practices and considerations¶
Summary¶
Brian Morrison II (PlanetScale, 2023-11-15, re-fetched via RSS 2026-04-21) publishes a best-practices field manual on configuring MySQL replication, framed at the practitioner altitude. Load-bearing claim up front: "At PlanetScale, we support hundreds of thousands of database clusters, all using replication to provide high availability, so we have a little bit of experience in this arena!" The post enumerates eight orthogonal configuration axes operators must decide on — topology (active/passive vs active/active), transaction identity (binlog-position vs GTIDs), replication mode (async vs semi-sync), mixed replication modes, binlog disk placement, monitoring, unplanned-failover procedure, cross-AZ vs cross-region geographic topology — and names PlanetScale's own choices on each to anchor the recommendations.
Canonical PlanetScale posture disclosed on three axes: (1) active/passive only, sharding for horizontal write scale rather than multi-writer MySQL: "We always recommend using an active/passive configuration for replication, and sharding if you need more throughput from your database." (2) Semi-sync with extremely high timeout: "PlanetScale actually uses semi-synchronous replication for our databases within a given region. … We set the timeout value extremely high to ensure that the data for our databases are always consistent." (3) Prometheus for replication monitoring: "At PlanetScale, we use Prometheus to monitor replication, along with other metrics, for the clusters we manage."
Architectural contributions canonicalised for the first time on the wiki:
- Active/passive replication vs active/active replication as explicit named choices — previously implicit across the corpus (leader/follower pattern) without a dedicated definitional page for each side. The post's conflict-resolution argument against active/active MySQL is a durable canonical datum: "when using active/active replication, … conflicts can easily occur as there is no native conflict resolution logic within MySQL. When conflicts do occur, neither node can be considered the source of truth for a rebuild without significant data loss."
- Separate binlog disk from data disk — a previously uncanonicalised operational hygiene primitive: "By default, these logs are stored on the same disk as the database. As you can imagine, busy databases can be bogged down when you consider the amount of throughput being processed by a single disk (manipulating the database + reading binary logs for replication). That said, the better approach would be to store binary logs on a separate disk than the database." Directly saves money on cloud-IOPS premium pricing — ties into the canonical concepts/ebs-iops-burst-bucket and concepts/throughput-vs-iops framings from Dicken's 2024-08-19 sharding-IOPS post.
- Unplanned-failover playbook — canonical four-step procedure verbatim from the post: (1) fence the downed source so it doesn't come back online unexpectedly; (2) identify the replica to promote and unset `read_only`; (3) update the application to direct queries to the new source; (4) re-point remaining replicas to the new source. This becomes the canonical wiki unplanned-failover procedure alongside the already-canonical graceful (planned) path via graceful leader demotion / Vitess PRS.
- Mixed sync + async replication topology as a PlanetScale-recommended composite: "If you want to guarantee that one specific server always contains an up-to-date copy of your database, but also want additional replicas for more resiliency, you could configure one replica with semi-sync and one without. … In a disaster scenario, this can help you easily identify the best candidate to recover from." Canonicalises the heterogeneous-replica-guarantee trick — pay semi-sync latency once (to the one fastest replica that will always receive the write durably) and get extra async replicas free.
- Async replication for cross-region, semi-sync only within region — explicit operational rule: "replicating across regions should be done in asynchronous mode so as to not cause unnecessary delay for the application making requests." AWS-cited cross-AZ latency ("single-digit millisecond latency between availability zones" per AWS docs) vs 60ms+ cross-region (`us-east-1` ↔ `us-west-1` via cloudping.co) is the canonical denominator that drives the rule.
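The separate-binlog-disk and GTID primitives above reduce to a few lines of server configuration. A minimal illustrative `my.cnf` sketch (the mount path and server ID are hypothetical, not from the post):

```ini
[mysqld]
# Hypothetical dedicated volume for binary logs: keeps the binlog's
# sequential-append workload off the data disk's IOPS budget.
log_bin = /var/lib/mysql-binlog/mysql-bin
# GTID-based position tracking, as the post recommends over file+offset.
gtid_mode                = ON
enforce_gtid_consistency = ON
server_id = 1  # must be unique across the replication topology
```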
Scope disposition: on-scope Tier-3 — practitioner-altitude best-practices post with five load-bearing operational rules backed by PlanetScale's own production posture on each. Architecture density ~80%; no production numbers but does anchor each rule to a concrete configuration choice with named trade-offs. Brian Morrison II is a previously-canonicalised PlanetScale named voice (2023-02-09 Postgres-to-MySQL migration + 2023 Declarative schema migrations + Declarative Atlas CLI + MySQL isolation levels). This is his fourth wiki ingest and the first to canonicalise a replication-topology best-practices field manual at his pedagogy voice.
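PlanetScale's disclosed semi-sync posture maps onto MySQL's stock semi-sync plugin roughly as follows (plugin names per the pre-8.0.26 `master`/`slave` naming that matches the post's `rpl_semi_sync_master_timeout`; the timeout value is illustrative, since PlanetScale's actual setting is undisclosed):

```sql
-- On the source: load and enable the semi-sync plugin.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- Default is 10000 ms, after which MySQL silently falls back to async.
-- "Extremely high" per the post; this concrete value is illustrative only.
SET GLOBAL rpl_semi_sync_master_timeout = 4294967295;  -- the variable's max, ~49 days

-- On the one replica carrying the durability guarantee (in the mixed-mode
-- topology the remaining replicas stay plain async, with no plugin enabled):
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```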
Key takeaways¶
- Active/passive is the canonical recommendation; active/active is warned against for MySQL. "We always recommend using an active/passive configuration for replication, and sharding if you need more throughput from your database." (Source: sources/2026-04-21-planetscale-mysql-replication-best-practices-and-considerations). Morrison's argument against active/active: "each server is processing the others query workload, making write distribution more of an illusion." And the hard kill-shot: "conflicts can easily occur as there is no native conflict resolution logic within MySQL. When conflicts do occur, neither node can be considered the source of truth for a rebuild without significant data loss." The canonical wiki framing: MySQL cannot resolve write-write conflicts automatically, so multi-writer topologies are structurally fragile — scaling writes is a sharding problem, not a replication problem.
- GTIDs are the canonical replication-position primitive, replacing binlog file+offset. Morrison's framing: "By default, replicas will read the binary log file on a source database and track the processed records based on the position within that file. … This system is relatively fragile as issues can occur if the source crashes and the logs need to be restored. With GTIDs enabled, each transaction is assigned an ID so replicas can concretely determine if a transaction has been processed or not." Format verbatim: `14a54b2f-2ad0-43b6-b803-72b5d7151d3b:1` (single transaction) or `14a54b2f-2ad0-43b6-b803-72b5d7151d3b:1-10` (range). Each replica stores its GTID set in the `gtid_executed` table. Already canonicalised on the wiki via concepts/gtid-position + concepts/binlog-replication from the 2022-04 consensus-Part-6 + 2026-02 petabyte-scale-migrations + 2026-04 schema-reverts ingests; this post extends those pages with the best-practices-altitude rationale (binlog-position is fragile across crash + restore, GTIDs are portable).
- Async vs semi-sync is the central replication-mode trade-off; PlanetScale uses semi-sync within a region. Morrison names both modes and draws the trade-off axis: "By default, MySQL will be configured with asynchronous replication. … There is no validation from the source that any replica in the environment processes the transaction." Semi-sync in contrast: "the source will wait until at least one replica accepts the transaction before responding to the caller. The benefit is that data consistency is greater since at least two database servers in your environment will have the data, but it does add a bit of overhead in the response time. PlanetScale actually uses semi-synchronous replication for our databases within a given region." Canonical PlanetScale datum — same framing as the 2022 consensus-Part-6 post's concepts/mysql-semi-sync-split-brain disclosure (semi-sync in production + the split-brain hazard it still allows).
- Set `rpl_semi_sync_master_timeout` extremely high if you rely on semi-sync for consistency. Canonical PlanetScale operational tuning: "the primary server will wait 10 seconds for a replica with semi-sync mode enabled to acknowledge the transaction. This value can be modified, and if you rely on semi-sync for data consistency, you should increase this value to be high enough to guarantee consistency. We set the timeout value extremely high to ensure that the data for our databases are always consistent." Canonical wiki framing: the default 10-second timeout trades availability for consistency silently — after 10s, MySQL falls back to async, which defeats the semi-sync durability contract if the primary then crashes. For a system that treats consistency as inviolable, the only safe posture is to set the timeout to a value that dominates any realistic replica round-trip, effectively making the fallback-to-async path unreachable during normal operation.
- Mixed replication modes let you get semi-sync durability on one replica and async fan-out on the rest. Morrison canonicalises a specific composite: "you could configure one replica with semi-sync and one without. This means when data is written to the source, it will always make sure that the one server with semi-sync enabled has received that transaction before responding, and the other replicas in the cluster will catch up when they can. In a disaster scenario (discussed further down this article), this can help you easily identify the best candidate to recover from." The load-bearing property: the semi-sync-flagged replica is the known-good failover candidate. You don't need to inspect replica state at failover time to find the furthest-ahead replica — by construction, the one semi-sync replica has at least every transaction the primary acknowledged, so it is always the safe failover target, regardless of network hiccups on the other async replicas. Canonical new pattern: patterns/mixed-sync-replication-topology.
- Store binlogs on a separate disk from the database for throughput — and to save on cloud IOPS premium. Morrison canonicalises this as a previously-uncanonicalised operational hygiene rule: "By default, these logs are stored on the same disk as the database. As you can imagine, busy databases can be bogged down when you consider the amount of throughput being processed by a single disk (manipulating the database + reading binary logs for replication). That said, the better approach would be to store binary logs on a separate disk than the database. This approach can also save you some money in cloud environments where free volumes have hard IOPS limits." Canonical wiki framing: the binlog is a pure sequential-write workload (every committed transaction appended in order) that competes with the database's random read/write workload (buffer-pool spills, checkpoint flushes, index reads) for the same disk's IOPS budget. Separating them is the classic "don't mix sequential and random I/O" optimisation applied at the volume level — with a direct cloud-economics consequence under AWS EBS's per-volume IOPS caps (canonical via concepts/ebs-iops-burst-bucket + concepts/throughput-vs-iops). Canonical new concept: concepts/separate-binlog-disk.
- Replication must be monitored or it will silently fail. "If left unmonitored, you'd have no idea whether or not your data is actually being replicated once it's configured." Morrison names two monitoring stacks: SolarWinds Database Performance (formerly VividCortex) and Prometheus — the latter as PlanetScale's own choice: "At PlanetScale, we use Prometheus to monitor replication, along with other metrics, for the clusters we manage." The canonical wiki framing is that replication has no intrinsic loud-failure signal — a replica that stops pulling binlog will quietly drift into unbounded lag, and writes will still succeed on the primary. Active monitoring of replication lag is the only way to notice before a failover exposes a drift-deep replica as the "failover candidate" that's actually hours behind. Already-canonical PlanetScale monitoring substrate is systems/planetscale-insights for query-tier observability + Vitess tablet throttler (systems/vitess-throttler) for replication-lag-driven admission control; this post names Prometheus as the fleet-metric tier underneath.
- Unplanned-failover playbook: fence, promote, re-point app, re-point replicas — in that order. Morrison's four-step procedure verbatim: "1. Take measures to ensure the downed source won't come back online. This could cause replication issues if it happens unexpectedly. 2. Identify the replica you want to choose as the new source and unset the `read_only` option. If semi-sync is used, this would be the replica you've configured with the plugin along with the source. 3. Update your application to direct queries to the newly promoted source. 4. Update the other replicas to start replicating from the new source." Canonical wiki framing: the ordering is load-bearing — fencing first prevents split-brain (primary comes back online and starts accepting writes); promoting the semi-sync replica specifically (if the mixed-mode topology is in use) eliminates the "which replica is furthest ahead?" uncertainty. Canonical new concept: concepts/unplanned-failover-playbook. Complements the existing canonical planned-failover path via patterns/graceful-leader-demotion / Vitess PRS (from the 2022-04 consensus-Part-4 post): the two paths are duals on the revoke-and-establish axis — planned uses graceful demotion with query buffering, unplanned uses fencing with brief write unavailability during step 3.
- Cross-region replication adds 60ms+ latency; cross-AZ is single-digit milliseconds. Morrison's numbers verbatim: "AWS claims that they have single-digit millisecond latency between availability zones in the same region. … At the time of this writing, cloudping.co reported that the latency between `us-east-1` and `us-west-1` is over 60ms." The canonical operational rule derived: "replicating across regions should be done in asynchronous mode so as to not cause unnecessary delay for the application making requests." Semi-sync's wait-for-ack latency must be amortised against this network round-trip; 60ms+ is unacceptable overhead for every transaction, so async is the only viable cross-region mode. Canonical new pattern: patterns/async-replication-for-cross-region. Complements the existing canonical concepts/regional-read-replica framing from the 2022-05 Portals post, which deals with the read side of cross-region.
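The four-step playbook above maps onto vanilla MySQL statements roughly as follows (a sketch: the hostname is a placeholder, and the `CHANGE REPLICATION SOURCE TO` / `STOP REPLICA` syntax assumes MySQL 8.0.23+):

```sql
-- Step 1 (fencing) happens outside MySQL: power off the failed source or
-- revoke its network access so it cannot resurface and accept writes.

-- Step 2: on the chosen replica (the semi-sync one, if mixed mode is in use):
STOP REPLICA;
SET GLOBAL read_only = 0;  -- also clears super_read_only if it was set

-- Step 3 (re-pointing the application) is DNS/proxy/config work, not SQL.

-- Step 4: on each remaining replica, re-point at the new source. GTID
-- auto-positioning avoids any binlog file/offset bookkeeping here.
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'new-source.internal',  -- placeholder hostname
  SOURCE_AUTO_POSITION = 1;
START REPLICA;
```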
Systems¶
- MySQL — the underlying RDBMS whose native replication primitives (binlog, GTIDs, semi-sync plugin, `read_only`, `rpl_semi_sync_master_timeout`) the post assumes. All eight configuration axes are MySQL-specific operational knobs.
- PlanetScale — author's employer; PlanetScale's own production posture (active/passive, semi-sync with high timeout, Prometheus monitoring) is cited as the canonical configuration on three of the eight axes.
- Prometheus — PlanetScale's chosen replication-monitoring stack: "At PlanetScale, we use Prometheus to monitor replication, along with other metrics, for the clusters we manage." Named alongside SolarWinds Database Performance (formerly VividCortex) as the two monitoring-system options for replication.
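On the monitoring axis, the raw signal on vanilla MySQL comes from `SHOW REPLICA STATUS` (`SHOW SLAVE STATUS` before 8.0.22); whatever Prometheus exporter is in use is ultimately scraping fields like these:

```sql
SHOW REPLICA STATUS\G
-- Fields worth alerting on (8.0.22+ names):
--   Replica_IO_Running / Replica_SQL_Running  -- either 'No' means replication has stopped
--   Seconds_Behind_Source                     -- the headline lag metric
--   Last_IO_Error / Last_SQL_Error            -- why a thread stopped
--   Executed_Gtid_Set                         -- diff against the source's to detect drift
```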
Concepts¶
- Active/passive replication (new) — single-writer, many-reader topology; the PlanetScale-recommended posture for MySQL clusters.
- Active/active replication (new) — multi-writer topology; warned against for MySQL due to absence of native conflict resolution.
- GTID position — existing page, extended with Morrison's best-practices-altitude rationale (binlog-position is fragile across crash + restore; GTIDs are portable).
- Asynchronous replication — existing page, extended with MySQL default + cross-region-only rationale.
- MySQL semi-sync replication — existing page, extended with the extremely high timeout operational tuning + mixed-mode topology framing.
- Binlog replication — existing page, extended with the separate-disk placement operational hygiene rule.
- Replication lag — existing page, extended with the monitoring-is-mandatory framing and the two named monitoring stacks.
- Separate binlog disk (new) — store binlogs on a separate volume from data; sequential-write workload separation; cloud-IOPS economics consequence.
- Unplanned-failover playbook (new) — canonical four-step procedure for promoting a replica after an unplanned primary failure.
Patterns¶
- Mixed sync + async replication topology (new) — one semi-sync-flagged replica as guaranteed failover candidate + async replicas for extra read capacity.
- Async replication for cross-region, semi-sync within region (new) — latency-driven operational rule; canonical network-topology-dictates-replication-mode primitive.
- Read replicas for read scaling — existing pattern, extended with the active/passive-framing Seen-in entry (Morrison: "the replicas can be used to serve up read-only queries, but all writes must be sent to the source. This helps split the load across all replicas").
Operational numbers¶
- Cross-AZ latency: single-digit milliseconds (per AWS docs cited in-post).
- Cross-region latency: `us-east-1` ↔ `us-west-1` over 60ms (per cloudping.co cited in-post).
- MySQL `rpl_semi_sync_master_timeout` default: 10 seconds (post-cited default; PlanetScale sets "extremely high" above this).
- No numbers disclosed for: PlanetScale's actual semi-sync timeout value; PlanetScale's replica count per cluster; binlog-disk IOPS savings; Prometheus scrape cadence for replication-lag metric; failover wall-clock; gtid_executed table size at scale.
Caveats¶
- Pedagogy voice — no customer retrospective, no measured failover times, no quantified durability loss from the default 10-second timeout, no quantified cost savings from separate binlog disk, no quantified replication-lag distribution from PlanetScale's fleet.
- PlanetScale's "extremely high" timeout is not quantified — the operational rule is given without a concrete value; operators copying this setting need to determine their own threshold.
- Active/active MySQL is framed as universally wrong — the post does not engage with the mitigating strategies (last-write-wins conflict resolution, CRDT overlays, app-tier conflict-avoidance) that some teams do run in production (e.g. Galera, Group Replication). For PlanetScale's Vitess-backed active/passive + sharding architecture the rule is correct, but operators running MySQL Group Replication or Galera may interpret the rule too strongly.
- Mixed-replication-mode caveats not disclosed — the one-semi-sync-replica architecture has a blast-radius concern the post elides: if that specific replica has a network partition or crashes, the primary falls back to async-for-everyone (or blocks until the timeout expires if timeout is extremely high). The post doesn't discuss the operational tension between "extremely high timeout" + "one semi-sync replica" — the two choices can jointly make the primary essentially write-unavailable if the single semi-sync replica is slow.
- Binlog-disk placement mechanics elided — the post says "separate disk" but doesn't discuss `log-bin` configuration semantics, dedicated-filesystem vs dedicated-partition vs dedicated-volume trade-offs, or how to re-home binlogs on a running system without downtime.
- Monitoring naming only, not depth — Prometheus and SolarWinds are named but the actual metrics that matter for replication (seconds-behind-master, GTID-executed delta, binlog-position delta, replica-IO-thread state, replica-SQL-thread state) are not enumerated. Complements systems/planetscale-insights + patterns/heartbeat-based-replication-lag-measurement from prior ingests but doesn't deepen them.
- Unplanned-failover playbook omits load-balancer / application-layer orchestration — step 3 "Update your application to direct queries to the newly promoted source" skips over the hard parts: DNS TTL, connection-pool draining, rolling restart of app tier vs atomic config flip. Vitess's `vtgate` proxy tier + `vtorc` orchestrator (already canonical via systems/vtorc) automate this, but the post is MySQL-vanilla-altitude and doesn't cite the managed-substrate equivalents.
- No discussion of GTID-based vs position-based replication migration path — GTIDs are strongly recommended but the post doesn't cover the operational procedure for enabling them on an existing cluster (rolling `gtid_mode` transition through OFF → OFF_PERMISSIVE → ON_PERMISSIVE → ON, `ENFORCE_GTID_CONSISTENCY`, etc.).
- Publication-date snapshot: 2023-11-15 (pre–PlanetScale Metal, pre-Postgres, pre–Vitess 21). The recommendations are timeless on the MySQL axis but the cloud-IOPS argument for separate binlog disk is sharper in 2024-08-19 (Dicken IOPS-cost-cliff post) and the async-for-cross-region argument has been extended by the 2022-05 Portals regional-read-replica architecture.
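The rolling `gtid_mode` transition named in the caveat is, per MySQL's documented procedure, a staged change applied to every server before advancing to the next step; a statement-level sketch (sequencing and fleet-wide verification omitted):

```sql
-- First, prove the workload is GTID-safe before enforcing consistency.
SET GLOBAL enforce_gtid_consistency = WARN;  -- watch the error log for violations
SET GLOBAL enforce_gtid_consistency = ON;

-- Then walk gtid_mode one step at a time, on every server in the topology,
-- completing each step fleet-wide before issuing the next.
SET GLOBAL gtid_mode = OFF_PERMISSIVE;
SET GLOBAL gtid_mode = ON_PERMISSIVE;
-- Wait for SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count' to reach 0 everywhere.
SET GLOBAL gtid_mode = ON;
```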
Source¶
- Original: https://planetscale.com/blog/mysql-replication-best-practices-and-considerations
- Raw markdown: `raw/planetscale/2026-04-21-mysql-replication-best-practices-and-considerations-2fa1ae3c.md`
Related¶
- MySQL
- PlanetScale
- Prometheus
- Active/passive replication
- Active/active replication
- GTID position
- Asynchronous replication
- MySQL semi-sync replication
- Binlog replication
- Replication lag
- Separate binlog disk
- Unplanned-failover playbook
- Mixed sync + async replication topology
- Async replication for cross-region
- Read replicas for read scaling
- Graceful leader demotion (planned-failover dual)