CONCEPT Cited by 1 source

MySQL semi-sync split-brain¶

Definition¶

A MySQL semi-sync split-brain is a production hazard in which a MySQL primary using semi-synchronous replication completes writes that never became durable, so some observers read data that was never committed across the replica set. The hazard comes from two distinct behavioural gaps in the MySQL semi-sync protocol, both of which the generic two-phase completion protocol is designed to close.

Sugu Sougoumarane's canonical wiki framing: "The MySQL semi-sync protocol does not support this two-phase method of completing requests. When a replica receives a request, it immediately applies it. … A primary that is restarted after a crash completes all in-flight requests without verifying that they received the necessary acks. This could lead to 'split-brain' scenarios." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-6-completing-requests).

The two gaps¶

Gap 1: replicas apply-on-receive¶

In a generic two-phase consensus system, followers hold received writes as tentative: the payload is persisted but its effect is not yet materialised. In MySQL semi-sync, the replica applies the write immediately on receipt. There is no tentative-and-waiting state; the row is already written.

The consequence: if the write never reaches the durability threshold (e.g. the primary crashes before a second replica has acknowledged it), there is no cheap cancellation path. The data is already live on whichever replicas received it and is visible to reads against those replicas.
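A minimal sketch of the difference, assuming a toy key-value replica. The class names `ApplyOnReceiveReplica` and `TentativeReplica` and their methods are hypothetical and exist only for illustration; neither models real MySQL internals.

```python
from dataclasses import dataclass, field


@dataclass
class ApplyOnReceiveReplica:
    """Toy model of the semi-sync behaviour: receiving a write applies it."""
    rows: dict = field(default_factory=dict)

    def receive(self, key, value):
        self.rows[key] = value  # the effect is live immediately
        return "ack"

    def read(self, key):
        return self.rows.get(key)


@dataclass
class TentativeReplica:
    """Toy model of a two-phase follower: persist first, apply on commit."""
    rows: dict = field(default_factory=dict)
    tentative: dict = field(default_factory=dict)

    def receive(self, key, value):
        self.tentative[key] = value  # persisted, but not visible to reads
        return "ack"

    def commit(self, key):
        self.rows[key] = self.tentative.pop(key)

    def cancel(self, key):
        self.tentative.pop(key, None)  # cheap: nothing was materialised

    def read(self, key):
        return self.rows.get(key)


semi = ApplyOnReceiveReplica()
two_phase = TentativeReplica()
semi.receive("x", 1)
two_phase.receive("x", 1)

# The primary crashes before the write becomes durable.
two_phase.cancel("x")

print(semi.read("x"))       # the never-durable write is readable here
print(two_phase.read("x"))  # the cancelled write was never visible here
```

In the tentative model, cancellation only discards a pending entry. In the apply-on-receive model there is nothing comparable to discard, because the write is already part of the readable state.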

Gap 2: restart completes unverified in-flight work¶

This is the worse of the two gaps. When a primary restarts after a crash, it completes all in-flight requests without verifying that they received the necessary acks. The primary treats its own write-ahead log as authoritative and applies everything it finds there, regardless of whether the corresponding replica acks ever arrived.

The split-brain scenario:

1. Primary writes request X to its log.
2. Primary sends X to replica R1, which receives and applies it, but crashes before sending X to R2.
3. Because R1 applied X on receipt, X is visible to reads against R1.
4. Primary restarts and re-applies X on itself from its log (the semi-sync ack from R1 is not a precondition checked on restart).
5. X is now on the primary and R1 but not on R2.
6. A failover promotes R2, which never saw X, and X silently disappears from the cluster's observable state, even though reads that hit the old primary or R1 did see X.

Different observers see different states depending on which replica they read from. Split-brain.
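The scenario can be replayed as a small simulation. This is a sketch under simplifying assumptions: the `Node` class is hypothetical, each node is reduced to a log plus a set of applied requests, and the crash is modelled simply by never delivering X to R2.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []         # requests written locally
        self.applied = set()  # requests visible to reads

    def read(self, request):
        return request in self.applied


primary, r1, r2 = Node("primary"), Node("R1"), Node("R2")

# Primary writes X to its log.
primary.log.append("X")

# X reaches R1, which applies on receipt; the primary crashes before R2.
r1.log.append("X")
r1.applied.add("X")

# The primary restarts and completes everything in its log,
# without checking whether the required acks ever arrived.
for request in primary.log:
    primary.applied.add(request)

# X is now visible on the primary and R1, but not on R2.
visible = {node.name: node.read("X") for node in (primary, r1, r2)}
print(visible)

# Failover promotes R2: the cluster's authoritative state lacks X.
new_primary = r2
print(new_primary.read("X"))
```

The `visible` mapping is the split-brain in miniature: the answer to "does X exist?" depends on which node serves the read.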

Why the two-phase protocol rules this out¶

Under the generic two-phase completion protocol:

Replicas would hold X as tentative, not applied.
On a primary crash, a new elector would enumerate tentative X across replicas and either propagate it (if durable) or cancel it (if not).
A primary that restarts and finds tentative X in its own log would need an explicit durability re-check before completing it; that re-check is exactly the step missing in gap 2.
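The elector's propagate-or-cancel decision might look like the sketch below. It assumes the elector can reach every replica and that durability is a simple count of replicas holding the request; both are simplifications, and `resolve_tentative` and `required_acks` are hypothetical names rather than part of any real system.

```python
def resolve_tentative(replicas, required_acks):
    """Decide the fate of each tentative request found after a primary crash.

    `replicas` maps a replica name to the set of request ids it holds as
    tentative. A request held by at least `required_acks` replicas is treated
    as durable and propagated; anything else is cancelled.
    """
    counts = {}
    for held in replicas.values():
        for request in held:
            counts[request] = counts.get(request, 0) + 1

    propagate = {req for req, n in counts.items() if n >= required_acks}
    cancel = set(counts) - propagate
    return propagate, cancel


# X reached only R1; Y reached both replicas.
replicas = {"R1": {"X", "Y"}, "R2": {"Y"}}
propagate, cancel = resolve_tentative(replicas, required_acks=2)
print(sorted(propagate), sorted(cancel))
```

Because nothing was applied while tentative, cancelling X here has no visible side effects, which is the property the semi-sync apply-on-receive behaviour gives up.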

The hazard is not a MySQL bug; it is a consequence of semi-sync's design origin as an optimisation layer on top of asynchronous replication rather than a from-scratch consensus commit path. Vitess deployments on top of MySQL inherit the hazard and manage it operationally through carefully ordered reparenting via PlannedReparentShard (PRS) and EmergencyReparentShard (ERS), lameduck drain, and query buffering at vtgate.

Mitigation in practice¶

Sugu points readers to his older post Distributed durability in MySQL for the corner-case catalogue. Operational mitigations used in practice:

Never auto-restart a crashed primary: force it through a reparent (e.g. Vitess's EmergencyReparentShard) so the old primary cannot re-apply unverified in-flight work.
Fence the old primary before re-introducing it, at the VIP, load-balancer, or application layer, so any data it carries that did not propagate is observable only to humans during reconciliation, not to end users.
Keep the durability requirement tight: if the semi-sync timeout expires and the primary falls back to asynchronous replication, the hazard window widens.
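The last point can be monitored. The sketch below assumes the variable and status names of the older MySQL semi-sync plugin (`rpl_semi_sync_master_enabled`, `rpl_semi_sync_master_timeout`, `Rpl_semi_sync_master_status`); MySQL 8.0.26 and later also ship a plugin that uses `source` in place of `master`. The function `semi_sync_warnings` and its `min_timeout_ms` threshold are hypothetical policy choices, not MySQL or Vitess defaults.

```python
def semi_sync_warnings(variables, status, min_timeout_ms=60_000):
    """Return human-readable warnings about a primary's semi-sync settings.

    `variables` and `status` are plain dicts, e.g. built from the rows of
    SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS. `min_timeout_ms` is a
    hypothetical policy threshold chosen for this sketch.
    """
    warnings = []

    if variables.get("rpl_semi_sync_master_enabled") != "ON":
        warnings.append("semi-sync is disabled on the primary")
    elif status.get("Rpl_semi_sync_master_status") != "ON":
        # Enabled but not active: the primary has fallen back to async.
        warnings.append("semi-sync has fallen back to asynchronous replication")

    timeout_ms = int(variables.get("rpl_semi_sync_master_timeout", 0))
    if timeout_ms < min_timeout_ms:
        warnings.append(
            f"semi-sync timeout {timeout_ms} ms is below {min_timeout_ms} ms"
        )
    return warnings


healthy = semi_sync_warnings(
    {"rpl_semi_sync_master_enabled": "ON", "rpl_semi_sync_master_timeout": "3600000"},
    {"Rpl_semi_sync_master_status": "ON"},
)
degraded = semi_sync_warnings(
    {"rpl_semi_sync_master_enabled": "ON", "rpl_semi_sync_master_timeout": "10000"},
    {"Rpl_semi_sync_master_status": "OFF"},
)
print(healthy)
print(degraded)
```

The function takes dicts rather than a live connection so that the policy can be tested without a database; wiring it to a real server is left to whatever client library a deployment already uses.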

Seen in¶

sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-6-completing-requests — Sugu canonicalises the hazard in the context of his consensus-algorithms series; the two-phase protocol is presented as the general shape this hazard violates.

concepts/split-brain — the general failure-mode class.
concepts/tentative-request — the marker whose absence produces gap 1.
concepts/durable-request — the implicit stage whose re-verification is skipped in gap 2.
concepts/two-phase-completion-protocol — the generic shape that would rule out the hazard.
systems/mysql — the host system of the hazard.
systems/vitess — the layer that manages the hazard operationally via PRS / ERS.