CONCEPT Cited by 3 sources
MySQL semi-sync replication¶
Definition¶
MySQL semi-synchronous replication (semi-sync) is a plugin mechanism layered on top of MySQL's native asynchronous replication that blocks a commit on the primary until a configurable number of replicas have persisted the transaction's binary-log event to their relay logs. It sits between fully-async replication (primary acks immediately, replicas catch up eventually) and fully-synchronous replication (primary waits for all replicas to apply) — hence "semi".
Shlomi Noach's canonical definition:
"Semi-synchronous replication is a mechanism where a commit on the primary does not apply the change onto internal table data and does not respond to the user, until the changelog is guaranteed to have been persisted (though not necessarily applied) on a preconfigured number of replicas." (Source: sources/2026-04-21-planetscale-mysql-semi-sync-replication-durability-consistency-and-split-brains)
The contract — received, not applied¶
The critical phrase is persisted but not necessarily applied. When a semi-sync replica acks:
- ✅ The event is in the replica's relay log (persistent storage — survives crash).
- ❌ The event is not yet applied to the replica's InnoDB tables.
This is a deliberate optimisation. Waiting for apply would make commits gated on replica-side SQL-thread progress — often far slower than network+disk. Waiting for relay-log persistence is enough to satisfy the durability promise: if the primary crashes, the committed data exists somewhere else, recoverable.
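The received-versus-applied distinction can be illustrated with a toy model. This is a minimal sketch, not MySQL's implementation: the `Replica` class, its fields, and its method names are hypothetical, with a Python list standing in for the on-disk relay log and a dict standing in for InnoDB table data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy semi-sync replica: persisting and applying are separate steps."""
    relay_log: list = field(default_factory=list)  # stands in for the on-disk relay log
    table: dict = field(default_factory=dict)      # stands in for InnoDB table data
    applied: int = 0                               # number of relay-log events applied

    def receive(self, event):
        """Persist the event to the relay log, then ack. Nothing is applied yet."""
        self.relay_log.append(event)
        return True  # the ack the primary is waiting for

    def apply_next(self):
        """Apply the oldest unapplied event, as the SQL thread would do later."""
        key, value = self.relay_log[self.applied]
        self.table[key] = value
        self.applied += 1


replica = Replica()
acked = replica.receive(("balance", 100))
visible_before_apply = replica.table.get("balance")  # acked, but not yet readable
replica.apply_next()
visible_after_apply = replica.table.get("balance")   # readable only after apply
```

Between `receive` and `apply_next` the event is safe in the relay log but invisible to readers of the replica, which is exactly the window the contract allows.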
The configuration surface¶
Two primary tunables:
| Setting | Role |
|---|---|
| `rpl_semi_sync_master_wait_for_slave_count` | The number of semi-sync replicas that must ack before the primary commits. Must be ≥ 1 to enable semi-sync. |
| `rpl_semi_sync_master_timeout` | Max time (ms) the primary waits for acks before falling back to async. The durability-critical setting; see concepts/semi-sync-timeout-fallback. |
A replica participates in semi-sync iff it has the semi-sync plugin enabled. Non-semi-sync replicas also pull binlog, just without acking — and Noach's observation is that they can therefore be ahead of semi-sync replicas, which has operational implications on failover.
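How the two tunables interact on a single commit can be sketched as follows. This is a simplified per-commit model under stated assumptions: `commit_outcome` and its parameters are hypothetical names, ack delays are given up front rather than observed over a network, and the model ignores how a real primary stays in async mode until a replica catches up again.

```python
def commit_outcome(ack_delays_ms, wait_count, timeout_ms):
    """Return (mode, wait_ms) for one commit.

    ack_delays_ms: time for each semi-sync replica's ack to reach the primary.
    wait_count:    models rpl_semi_sync_master_wait_for_slave_count.
    timeout_ms:    models rpl_semi_sync_master_timeout.
    """
    fastest = sorted(ack_delays_ms)
    if len(fastest) >= wait_count and fastest[wait_count - 1] <= timeout_ms:
        # The wait_count-th fastest ack arrived in time: the data is durable elsewhere.
        return ("semi-sync", fastest[wait_count - 1])
    # Not enough acks before the timeout: commit anyway, without the guarantee.
    return ("async-fallback", timeout_ms)


healthy = commit_outcome([2, 5, 70], wait_count=1, timeout_ms=10_000)
two_acks = commit_outcome([2, 5, 70], wait_count=2, timeout_ms=10_000)
stalled = commit_outcome([15_000], wait_count=1, timeout_ms=10_000)
```

In this model a healthy commit waits only for the `wait_count`-th fastest ack, while a stalled replica costs the full timeout and then yields a commit that carries no durability promise.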
The promise the contract actually makes¶
"If the primary tells you a commit is successful, then the data is durable elsewhere."
That's it. It is not:
- Not "the commit is visible on replicas": that's apply, which happens later.
- Not "a quorum of replicas agrees": semi-sync is about any `k` replicas acking, not a structured quorum over the replica set.
- Not "the commit is consistent with what other readers will see": different observers reading different replicas may see different states during apply-lag windows.
The last point — semi-sync guarantees durability, not consistency — is the axis canonicalised at concepts/durability-vs-consistency-guarantee.
Why the log is sequential — and why k=1 is enough¶
"The changelog, the binary log, is sequential. A replica that acknowledges some changelog event, has necessarily received all of its prior events."
This is the property that lets rpl_semi_sync_master_wait_for_slave_count=1 work: a single ack from any replica implies that replica has every prior event. The primary doesn't need to coordinate which replica acks which event. The cost is that when you lose the primary and that one ack'ing replica together, you can't tell what the other replicas received — see concepts/minority-quorum-writeability.
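The sequential-log argument can be made concrete with a small calculation. This is an illustrative sketch only: `durable_position` is a hypothetical helper, binlog positions are simplified to integers, and the replica names are invented for the example.

```python
def durable_position(acked_positions, k):
    """Highest binlog position known to be persisted on at least k replicas.

    acked_positions: the latest position each semi-sync replica has acked.
    Because the log is sequential, acking position p implies holding 1..p.
    Returns 0 if fewer than k replicas have acked anything.
    """
    if len(acked_positions) < k:
        return 0
    # The k-th highest ack is the furthest point that k replicas all cover.
    return sorted(acked_positions, reverse=True)[k - 1]


acks = {"replica-a": 42, "replica-b": 40, "replica-c": 17}
covered_by_one = durable_position(list(acks.values()), k=1)
covered_by_two = durable_position(list(acks.values()), k=2)
```

With `k=1` the primary only needs the single highest ack, and no per-event bookkeeping about which replica holds which event is required.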
Failure modes introduced by semi-sync¶
Semi-sync buys durability at the cost of new failure modes that pure-async doesn't have:
- Split-brain on crash-restart (Sugu Sougoumarane's framing): a restarted primary re-applies in-flight requests without re-verifying their acks, so its state can contradict that of a newly promoted primary.
- Split-brain on DC isolation (Noach's framing): a primary and a same-DC acking replica can keep committing inside a partition that remote replicas don't see; when the isolation ends, the isolated DC's writes contradict those of the remotely promoted primary.
- Silent fallback to async on timeout: if acks don't arrive in time, the primary degrades to async replication and commits anyway, so the durability guarantee silently evaporates for the duration.
- Write-path latency coupled to replica acks: every commit waits for the k-th fastest ack among the semi-sync replicas, so the tail latency of that ack is the primary's commit-latency floor. Cross-DC semi-sync replicas add RTT to every commit.
Semi-sync is not consensus¶
"People familiar with Paxos and Raft consensus protocols may find this baffling. However, reliable minority consensus is achievable, and Sugu Sougoumarane's Consensus Algorithms series of posts continues to describe this."
Semi-sync was designed as an optimisation layer on top of async replication, not as a from-scratch consensus protocol. It doesn't have proposal numbers, leader leases, quorum-read paths, or per-request versioning. The sysdesign-wiki's canonical treatment of what a real consensus layer on top of MySQL primitives looks like is the Consensus Algorithms at Scale series; see also patterns/pluggable-durability-rules as the architectural response.
Seen in¶
- sources/2026-04-21-planetscale-mysql-replication-best-practices-and-considerations: Brian Morrison II (PlanetScale, 2023-11-15) canonicalises three load-bearing PlanetScale postures on semi-sync:
  - (1) Extremely high `rpl_semi_sync_master_timeout` to prevent the silent fallback to async on timeout expiry: "the primary server will wait 10 seconds for a replica with semi-sync mode enabled to acknowledge the transaction. This value can be modified, and if you rely on semi-sync for data consistency, you should increase this value to be high enough to guarantee consistency. We set the timeout value extremely high to ensure that the data for our databases are always consistent." The canonical wiki framing: the default 10-second timeout silently trades durability for availability; after 10s, MySQL falls back to async, which defeats the semi-sync durability contract. Setting the timeout to an effectively unreachable value makes the fallback path unreachable during normal operation.
  - (2) Mixed sync + async topology: exactly one replica flagged semi-sync, the others async; canonicalised as patterns/mixed-sync-replication-topology. The load-bearing property: the semi-sync replica is the deterministic failover candidate because it has every committed transaction; async replicas are read capacity but not failover-safe.
  - (3) Within-region scope: "PlanetScale actually uses semi-synchronous replication for our databases within a given region." Cross-region latency (60ms+ per cloudping.co) makes cross-region semi-sync infeasible; canonicalised as patterns/async-replication-for-cross-region.
- sources/2026-04-21-planetscale-mysql-semi-sync-replication-durability-consistency-and-split-brains: canonical wiki introduction of semi-sync as a mechanism: how the ack flow works, what the relay-log/binary-log roles are, the 1-n topology analysis, and the durability-vs-consistency distinction.
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-6-completing-requests: Sugu Sougoumarane's commit-path perspective: semi-sync's apply-on-receive replica behaviour is Gap 1 of the crash-restart split-brain hazard; the generic two-phase completion protocol would close it.
Related¶
- concepts/asynchronous-replication
- concepts/binlog-replication
- concepts/mysql-semi-sync-split-brain
- concepts/semi-sync-timeout-fallback
- concepts/durability-vs-consistency-guarantee
- concepts/minority-quorum-writeability
- concepts/split-brain
- concepts/no-distributed-consensus
- patterns/cross-dc-semi-sync-for-durability
- patterns/infinite-semi-sync-timeout
- patterns/pluggable-durability-rules
- systems/mysql
- systems/vitess