
CONCEPT Cited by 1 source

MySQL semi-sync replication

Definition

MySQL semi-synchronous replication (semi-sync) is a plugin mechanism layered on top of MySQL's native asynchronous replication that blocks a commit on the primary until a configurable number of replicas have persisted the transaction's binary-log event to their relay logs. It sits between fully-async replication (primary acks immediately, replicas catch up eventually) and fully-synchronous replication (primary waits for all replicas to apply) — hence "semi".

Shlomi Noach's canonical definition:

"Semi-synchronous replication is a mechanism where a commit on the primary does not apply the change onto internal table data and does not respond to the user, until the changelog is guaranteed to have been persisted (though not necessarily applied) on a preconfigured number of replicas." (Source: )

The contract — received, not applied

The critical phrase is "persisted (though not necessarily applied)". When a semi-sync replica acks:

  • ✅ The event is in the replica's relay log (persistent storage — survives crash).
  • ❌ The event is not yet applied to the replica's InnoDB tables.

This is a deliberate optimisation. Waiting for apply would make commits gated on replica-side SQL-thread progress — often far slower than network+disk. Waiting for relay-log persistence is enough to satisfy the durability promise: if the primary crashes, the committed data exists somewhere else, recoverable.
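
The received-versus-applied distinction can be illustrated with a toy model. This is a minimal sketch, not MySQL's implementation: the `Replica` class, its fields, and the tuple-shaped events are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy semi-sync replica: acks on relay-log write, applies later."""
    relay_log: list = field(default_factory=list)  # persisted events, in order
    tables: dict = field(default_factory=dict)     # applied state, what readers see
    applied_upto: int = 0                          # count of relay-log events applied

    def receive(self, event):
        """Persist the event to the relay log and ack with its position."""
        self.relay_log.append(event)
        return len(self.relay_log)

    def apply_next(self):
        """Apply one pending event; in this model it runs independently of acks."""
        if self.applied_upto < len(self.relay_log):
            key, value = self.relay_log[self.applied_upto]
            self.tables[key] = value
            self.applied_upto += 1


replica = Replica()
ack = replica.receive(("user:1", "alice"))          # primary may now commit
acked_but_invisible = "user:1" not in replica.tables
replica.apply_next()                                 # apply catches up later
visible_after_apply = replica.tables.get("user:1")
```

In this sketch the ack is returned before `apply_next` runs, so a reader of `replica.tables` can miss a write that the primary has already reported as committed.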

The configuration surface

Two primary tunables:

Setting | Role
rpl_semi_sync_master_wait_for_slave_count | The number of semi-sync replicas that must ack before the primary commits. Must be ≥ 1 to enable semi-sync.
rpl_semi_sync_master_timeout | Max time (ms) the primary waits for acks before falling back to async. The durability-critical setting; see concepts/semi-sync-timeout-fallback.
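
As a rough illustration of how the two tunables fit together, the helper below renders the `SET GLOBAL` statements a primary might be given. The function name `semi_sync_primary_settings` is hypothetical. `rpl_semi_sync_master_enabled` is not discussed above but is the plugin's on/off switch. Recent MySQL releases also expose these variables under `source`/`replica` names, so check the names against the server version in use.

```python
def semi_sync_primary_settings(wait_for_count=1, timeout_ms=10_000):
    """Render SET GLOBAL statements for a semi-sync primary (illustrative only).

    wait_for_count: replicas that must ack before a commit returns.
    timeout_ms: how long the primary waits for acks before falling back to async.
    """
    if wait_for_count < 1:
        raise ValueError("wait_for_count must be at least 1")
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be non-negative")
    return [
        "SET GLOBAL rpl_semi_sync_master_enabled = 1;",
        f"SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = {wait_for_count};",
        f"SET GLOBAL rpl_semi_sync_master_timeout = {timeout_ms};",
    ]


statements = semi_sync_primary_settings(wait_for_count=1, timeout_ms=10_000)
```

The default of `10_000` mirrors the 10-second timeout quoted later on this page; a deployment that wants to keep the fallback path out of reach would pass a much larger value.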

A replica participates in semi-sync iff it has the semi-sync plugin enabled. Non-semi-sync replicas also pull the binlog, just without acking, and Noach's observation is that they can therefore be ahead of semi-sync replicas, which has operational implications for failover.

The promise the contract actually makes

"If the primary tells you a commit is successful, then the data is durable elsewhere."

That's it. It is not:

  • Not "the commit is visible on replicas" — that's apply, which happens later.
  • Not "a quorum of replicas agrees" — semi-sync is about any k replicas acking, not a structured quorum over the replica set.
  • Not "the commit is consistent with what other readers will see" — different observers reading different replicas may see different states during apply-lag windows.

The last point — semi-sync guarantees durability, not consistency — is the axis canonicalised at concepts/durability-vs-consistency-guarantee.

Why the log is sequential — and why k=1 is enough

"The changelog, the binary log, is sequential. A replica that acknowledges some changelog event, has necessarily received all of its prior events."

This is the property that lets rpl_semi_sync_master_wait_for_slave_count=1 work: a single ack from any replica implies that replica has every prior event. The primary doesn't need to coordinate which replica acks which event. The cost is that when you lose the primary and that one ack'ing replica together, you can't tell what the other replicas received — see concepts/minority-quorum-writeability.
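
The sequential-log argument can be made concrete with a small sketch. It assumes binlog positions are plain integers and that each replica reports only the position of the last event it acked; the names `durable_upto`, `can_commit`, and the replica labels are hypothetical.

```python
def durable_upto(acked_positions, k=1):
    """Highest log position known to be persisted on at least k replicas.

    acked_positions maps a replica name to the position of its latest ack.
    Because the log is sequential, an ack at position p covers every
    event at positions <= p, so one number per replica is enough.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    acks = sorted(acked_positions.values(), reverse=True)
    return acks[k - 1] if len(acks) >= k else 0


def can_commit(event_position, acked_positions, k=1):
    """True once k replicas have acked this event or any later one."""
    return durable_upto(acked_positions, k) >= event_position


acks = {"replica-a": 7, "replica-b": 4, "replica-c": 5}
k1_ok = can_commit(6, acks, k=1)   # replica-a alone covers event 6
k2_ok = can_commit(6, acks, k=2)   # only one replica has reached 6
```

With `k=1` the primary never has to track which replica acked which event: the single highest ack settles the question for every earlier position.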

Failure modes introduced by semi-sync

Semi-sync buys durability at the cost of new failure modes that pure-async doesn't have:

  1. Split-brain on crash-restart (Sugu Sougoumarane's framing): a restarted primary re-applies in-flight requests without re-verifying their acks, which can contradict a newly-promoted primary.
  2. Split-brain on DC isolation (Noach's framing): primary + same-DC ack'ing replica can keep committing inside a partition that remote replicas don't see; when the isolation heals, the DC's writes contradict the remote-promoted primary's writes.
  3. Silent fallback to async on timeout: if acks don't arrive in time, the primary degrades to async replication and commits anyway, so the durability guarantee silently evaporates for the duration.
  4. Write-path latency coupled to replica acks: every commit waits for the k-th ack to arrive, so the tail latency of that ack is the primary's commit-latency floor. Cross-DC semi-sync replicas add RTT to every commit.
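
Failure modes 3 and 4 can be sketched together in a per-commit toy model. This is a simplification under stated assumptions: each replica's ack latency is given up front, `None` marks a replica that never answers, and the function name `commit_outcome` is hypothetical. A real primary stays in async mode after a timeout until a replica catches up, which this sketch does not model.

```python
def commit_outcome(ack_latencies_ms, k=1, timeout_ms=10_000):
    """Return (mode, wait_ms) for a single commit.

    The commit completes when the k-th ack arrives. If that does not
    happen within timeout_ms, the primary commits anyway in async mode.
    """
    arrived = sorted(lat for lat in ack_latencies_ms if lat is not None)
    if len(arrived) >= k and arrived[k - 1] <= timeout_ms:
        return ("semi-sync", arrived[k - 1])
    return ("async-fallback", timeout_ms)


same_dc = commit_outcome([2, 35, 80], k=1)       # fastest replica sets the wait
two_acks = commit_outcome([2, 35, 80], k=2)      # second-fastest sets the wait
degraded = commit_outcome([None, 12_000], k=1)   # no ack inside the timeout
```

The `degraded` case still reports a successful commit to the caller; only the mode differs, which is why the fallback is described as silent.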

Semi-sync is not consensus

"People familiar with Paxos and Raft consensus protocols may find this baffling. However, reliable minority consensus is achievable, and Sugu Sougoumarane's Consensus Algorithms series of posts continues to describe this."

Semi-sync was designed as an optimisation layer on top of async replication, not as a from-scratch consensus protocol. It doesn't have proposal numbers, leader leases, quorum-read paths, or per-request versioning. The sysdesign-wiki's canonical treatment of what a real consensus layer on top of MySQL primitives looks like is the Consensus Algorithms at Scale series; see also patterns/pluggable-durability-rules as the architectural response.

Seen in

  • Canonical reframe of semi-sync as failover-enabler, not just durability primitive. Max Englander (PlanetScale, 2025-07-03) frames semi-sync at the architectural altitude underneath the weekly failover drill: "MySQL semi-sync replication, Postgres synchronous commits. Commits stored durably on at least one replica before primary sends acknowledgment to the client. Enables us to treat replicas as potential primaries, and fail over to them immediately as needed." The load-bearing framing is the second clause: semi-sync isn't just "write survives primary loss", it's "any replica can be promoted immediately without data-loss risk". This is the substrate that makes PlanetScale's weekly always-be-failing-over discipline safe. Without semi-sync, exercising failover weekly would mean risking data loss weekly. Canonical composition: semi-sync (substrate) + query buffering (no client disruption) + Vitess Operator (no human in the loop) + pre-provisioned replica capacity (static stability) = weekly-failover-as-routine-ship-mechanism.
  • Canonical product-tier durability-basis statement for PlanetScale Metal. Richard Crowley (PlanetScale, 2025-03-11) canonicalises semi-sync as "the basis for any distributed system's durability claim" on Metal: "The replication that matters here is semi-synchronous, row-based, MySQL replication from a primary to two replicas distributed across three availability zones within a cloud region. Semi-synchronous replication ensures every write has reached stable storage in two availability zones before it's acknowledged to the client. Row-based replication integrates logically into transaction processing which allows readable replicas and backups." First wiki statement that semi-sync is the load-bearing architectural primitive underneath a shipped local-NVMe cluster-storage product: not a best-practice-for-some-workloads, but the entire durability story for the tier. Crowley's "two availability zones before it's acknowledged" framing interprets semi-sync's one-replica-ack requirement as a cross-AZ durability statement: the primary's AZ + the ack'ing replica's AZ = two AZs. Canonical pairing with tested-restore backups + automated replica replacement as the three-pillar durability envelope.
  • Brian Morrison II (PlanetScale, 2023-11-15) canonicalises three load-bearing PlanetScale postures on semi-sync: (1) Extremely high rpl_semi_sync_master_timeout to prevent the silent fallback-to-async on timeout expiry: "the primary server will wait 10 seconds for a replica with semi-sync mode enabled to acknowledge the transaction. This value can be modified, and if you rely on semi-sync for data consistency, you should increase this value to be high enough to guarantee consistency. We set the timeout value extremely high to ensure that the data for our databases are always consistent." The canonical wiki framing: the default 10-second timeout silently trades durability for availability. After 10s, MySQL falls back to async, which defeats the semi-sync durability contract. Setting the timeout to an effectively-unreachable value makes the fallback path unreachable during normal operation. (2) Mixed sync + async topology: exactly one replica flagged semi-sync + others async, canonicalised as patterns/mixed-sync-replication-topology. The load-bearing property: the semi-sync replica is the deterministic failover candidate because it has every committed transaction; async replicas are read capacity but not failover-safe. (3) Within-region scope: "PlanetScale actually uses semi-synchronous replication for our databases within a given region." Cross-region latency (60ms+ per cloudping.co) makes cross-region semi-sync infeasible, canonicalised as patterns/async-replication-for-cross-region.

  • Canonical wiki introduction of semi-sync as a mechanism: how the ack flow works, what the relay-log/binary-log roles are, the 1-n topology analysis, and the durability-vs-consistency distinction.

  • sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-6-completing-requests — Sugu Sougoumarane's commit-path perspective: semi-sync's apply-on-receive replica behaviour is Gap 1 of the crash-restart split-brain hazard; the generic two-phase completion protocol would close it.
  • Canonical shortest-form marketing statement of the semi-sync contract. Sam Lambert (PlanetScale CEO, 2023-06-28) names semi-sync as the tier-3 layer in PlanetScale's seven-layer data-safety envelope. Verbatim: "Semi-synchronous replication is a mode of replication in which the master waits until at least one replica acknowledges receipt of the transaction before moving on to the next one. This feature ensures that in case of a primary node failure, the replica that has received the transaction is up-to-date and can be promoted as the new primary node without data loss." Canonical wiki framing: semi-sync as a failover-preserving-durability primitive — the replica-ack guarantees that when the primary fails, at least one replica has the transaction and can be promoted losslessly. The without data loss clause is the load-bearing durability statement, aligning with Noach's "durability, not consistency" framing. Lambert's text is the shortest-form customer-facing statement of what semi-sync guarantees; Noach's deep-dive and Englander's extreme-fault-tolerance post supply the mechanism depth underneath.