
CONCEPT Cited by 1 source

Minority-quorum writeability

Definition

Minority-quorum writeability is the property of a replication system in which a strict minority of the nodes can still satisfy the write-acknowledgement rule, so writes can proceed even when a majority of the cluster is absent. It is the feature that distinguishes MySQL semi-sync from a consensus protocol like Paxos or Raft, and the reason semi-sync admits split-brain even when the majority of sites remain online.

Noach's canonical observation

Shlomi Noach names the property directly in the semi-sync 1-n deployment:

"Of special interest is that in our '1-n' scenario, we have a quorum of two servers out of five or more. The primary, with a single additional replica, are able to form a quorum and to accept writes. That's how we got to have a split brain. While R2, R3, R4 form a majority of the servers, writes took place without their agreement." (Source: sources/2026-04-21-planetscale-mysql-semi-sync-replication-durability-consistency-and-split-brains)

Take a five-node cluster with rpl_semi_sync_master_wait_for_slave_count=1. The primary plus one replica is 2 nodes, a strict minority, and writes still commit. If the primary's partition contains only those two nodes and the other three form a majority that promotes its own primary, both sides can make progress independently: the classic split-brain topology.
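The arithmetic can be checked with a minimal sketch. The function names are illustrative and not part of MySQL; the only real identifier is the wait_for_slave_count setting, modelled here as a plain integer.

```python
def can_commit(acking_replicas: int, wait_for_slave_count: int) -> bool:
    """Model of the semi-sync ack rule: commit once enough replicas have acked."""
    return acking_replicas >= wait_for_slave_count


def is_strict_minority(group_size: int, cluster_size: int) -> bool:
    """True when the group holds fewer than half of the cluster's nodes."""
    return 2 * group_size < cluster_size


CLUSTER_SIZE = 5
WAIT_FOR_SLAVE_COUNT = 1

# The primary's side of the partition: the primary itself plus one replica.
writing_group_size = 1 + 1

writes_allowed = can_commit(acking_replicas=1,
                            wait_for_slave_count=WAIT_FOR_SLAVE_COUNT)
minority_writes = writes_allowed and is_strict_minority(writing_group_size,
                                                        CLUSTER_SIZE)
print(writes_allowed, minority_writes)
```

Both values come out true: the ack rule is satisfied by a group that is smaller than half the cluster.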

Why semi-sync is structured this way

The design decision is a performance trade-off. MySQL semi-sync is layered on top of asynchronous replication as an optimisation, not rebuilt as a consensus protocol. The ack rule is the minimum number of replicas the primary must hear from — tuned low (typically 1) to keep write latency small and tolerate replica-side lag without blocking. It was never designed to prevent concurrent leadership; that was assumed to be a failover-layer concern.

A majority-quorum rule would require every write to reach ⌊N/2⌋+1 nodes before it commits: 3 of 5, or 4 of 7. Counting the primary as one of them, that means rpl_semi_sync_master_wait_for_slave_count ≥ ⌊N/2⌋ (2 replica acks per write for a 5-node cluster, 3 for a 7-node cluster), and every write would incur the latency of the slowest replica in the fastest majority. The cost on the hot path is substantial, and it still doesn't prevent split-brain by itself: you'd also need per-write proposal numbers, leader leases, and quorum reads.
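As a rough check of the quorum sizes, the sketch below computes the majority size for a cluster and the number of replica acks that implies. It assumes the primary counts as one member of the quorum, matching the "primary + one replica = 2 nodes" accounting used earlier; the helper names are hypothetical.

```python
def majority_size(cluster_size: int) -> int:
    """Smallest group that is strictly more than half of the cluster."""
    return cluster_size // 2 + 1


def replica_acks_for_majority(cluster_size: int) -> int:
    """Replica acks needed when the primary itself is one quorum member.

    Assumption: the primary's own copy of the write counts toward the majority.
    """
    return majority_size(cluster_size) - 1


for n in (5, 7):
    print(n, majority_size(n), replica_acks_for_majority(n))
```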

The structural split-brain story

With minority-quorum writeability, a partition that leaves the primary with at least wait_for_slave_count replicas on its side keeps that side writable. If a failover mechanism on the other side (a majority, but with no visibility into acks on the primary's side) promotes a new primary, the two sides make independent progress:

  1. Partition separates {primary, R1} from {R2, R3, R4}.
  2. Old primary + R1 keep committing writes (satisfies wait_for_slave_count=1).
  3. Majority side detects primary-unreachable; promotes R2.
  4. New primary + R3/R4 also commit writes.
  5. Two independent write streams → divergent data.
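The five steps can be played out in a toy model. This is a sketch under simplifying assumptions (no real replication, no network, one write per side); PartitionSide and its fields are hypothetical names, not anything from MySQL or a failover tool.

```python
from dataclasses import dataclass, field
from typing import List

WAIT_FOR_SLAVE_COUNT = 1


@dataclass
class PartitionSide:
    primary: str
    replicas: List[str]
    committed: List[str] = field(default_factory=list)

    def write(self, value: str) -> bool:
        # The ack rule only counts replicas reachable on this side.
        if len(self.replicas) < WAIT_FOR_SLAVE_COUNT:
            return False
        self.committed.append(value)
        return True


# Step 1: the partition separates {primary, R1} from {R2, R3, R4}.
old_side = PartitionSide(primary="P", replicas=["R1"])

# Step 3: the majority side cannot reach P and promotes R2.
new_side = PartitionSide(primary="R2", replicas=["R3", "R4"])

# Steps 2 and 4: both sides satisfy the ack rule and commit.
old_ok = old_side.write("balance=100")
new_ok = new_side.write("balance=250")

# Step 5: the committed histories no longer agree.
diverged = old_side.committed != new_side.committed
print(old_ok, new_ok, diverged)
```

Neither write is rejected, because each side evaluates the ack rule against its own reachable replicas only.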

Noach's point is that the majority of the cluster being up does not prevent this, because the write rule never consulted the majority.

Contrast with majority-quorum protocols

Under Paxos/Raft:

  • Writes require a majority ack — impossible from a minority partition.
  • Leader election requires a majority vote — a minority partition can't elect.
  • Split-brain is precluded at the protocol level (the minority partition simply cannot make progress).

The cost is exactly the per-write majority-round-trip latency that semi-sync avoided.
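The difference between the two rules can be shown by enumerating every two-way partition of a five-node cluster and counting those in which both sides could accept writes. This is an illustrative sketch: it assumes that any side with enough nodes could end up with a primary (through failover), and the function names are made up for the example.

```python
from itertools import combinations

NODES = ["P", "R1", "R2", "R3", "R4"]
WAIT_FOR_SLAVE_COUNT = 1


def majority_can_write(group: set) -> bool:
    """Paxos/Raft-style rule: the group must be more than half the cluster."""
    return len(group) > len(NODES) // 2


def semi_sync_can_write(group: set) -> bool:
    """Semi-sync-style rule: a primary plus wait_for_slave_count replicas."""
    return len(group) >= 1 + WAIT_FOR_SLAVE_COUNT


def two_way_partitions():
    """Yield each unordered split once by pinning P to the left side."""
    others = NODES[1:]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            left = {"P", *extra}
            yield left, set(NODES) - left


def count_dual_writable(rule) -> int:
    return sum(1 for left, right in two_way_partitions()
               if rule(left) and rule(right))


dual_majority = count_dual_writable(majority_can_write)
dual_semi_sync = count_dual_writable(semi_sync_can_write)
print(dual_majority, dual_semi_sync)
```

Under the majority rule no split leaves both sides writable, since two disjoint groups cannot both exceed half the cluster. Under the semi-sync rule every 2/3 split does.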

The reconciliation alternative: pluggable durability

The modern architectural response is pluggable durability rules (FlexPaxos-inspired): instead of a single wait_for_slave_count integer, express durability as a predicate over the replica set: "at least one ack from ≥2 zones", "at least N acks with at least M from region X", etc. The plugin can be configured to make minority-quorum writeability topologically impossible while still expressing common deployment shapes. MySQL semi-sync does not offer this expressiveness.
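One way to picture a durability predicate is as a function from the set of replicas that have acked to a yes/no answer. The sketch below is hypothetical throughout: the DurabilityRule type, the rule constructors, and the zone map are invented for illustration and do not correspond to any specific plugin API.

```python
from typing import Callable, Dict, FrozenSet

# A durability rule decides whether a given set of acking replicas is enough.
DurabilityRule = Callable[[FrozenSet[str]], bool]

# Assumed topology for the example.
ZONE_OF: Dict[str, str] = {
    "R1": "zone-a", "R2": "zone-a", "R3": "zone-b", "R4": "zone-c",
}


def fixed_count(n: int) -> DurabilityRule:
    """What a single wait_for_slave_count integer can express."""
    return lambda acked: len(acked) >= n


def acks_from_zones(min_zones: int) -> DurabilityRule:
    """Acks must come from at least min_zones distinct zones."""
    return lambda acked: len({ZONE_OF[r] for r in acked}) >= min_zones


def majority_of(cluster_size: int) -> DurabilityRule:
    """Primary plus acking replicas must exceed half the cluster."""
    return lambda acked: 1 + len(acked) > cluster_size // 2


same_zone = frozenset({"R1", "R2"})
cross_zone = frozenset({"R1", "R3"})
print(acks_from_zones(2)(same_zone), acks_from_zones(2)(cross_zone))
```

A rule such as majority_of(5) rejects the primary-plus-one-replica group that fixed_count(1) accepts, which is the sense in which a predicate can rule out minority writes.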

Operational counter-measures

Operators running semi-sync at multi-DC scale mitigate minority-quorum writeability operationally:

  • Fence the old primary at the network / VIP layer before promoting a new one, so the minority-writeable side can't keep committing.
  • Accept the outage rather than promote when the primary partition might still be serving — wait for the partition to heal.
  • Force reparenting through an anti-flapping controller (Orchestrator / VTOrc) that imposes dwell-time between leadership changes so rapid partition cycles can't produce interleaved writes.

None of these are protocol-level guarantees; they are empirical engineering controls that work because operators have tuned them against observed partition shapes.
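The first and third counter-measures can be combined into a simple gate in front of promotion. This is a minimal sketch, not how Orchestrator or VTOrc implement it: PromotionGate, its parameters, and the 600-second dwell time are assumptions chosen for the example.

```python
from typing import Optional


class PromotionGate:
    """Allow a promotion only if the old primary is fenced and the
    dwell time since the previous promotion has elapsed."""

    def __init__(self, dwell_seconds: float) -> None:
        self.dwell_seconds = dwell_seconds
        self.last_promotion: Optional[float] = None

    def may_promote(self, now: float, old_primary_fenced: bool) -> bool:
        # Never promote while the old primary could still be committing.
        if not old_primary_fenced:
            return False
        # Anti-flapping: enforce a minimum gap between leadership changes.
        if (self.last_promotion is not None
                and now - self.last_promotion < self.dwell_seconds):
            return False
        return True

    def record_promotion(self, now: float) -> None:
        self.last_promotion = now


gate = PromotionGate(dwell_seconds=600)
unfenced = gate.may_promote(now=0, old_primary_fenced=False)
first = gate.may_promote(now=0, old_primary_fenced=True)
gate.record_promotion(now=0)
too_soon = gate.may_promote(now=100, old_primary_fenced=True)
later = gate.may_promote(now=600, old_primary_fenced=True)
print(unfenced, first, too_soon, later)
```

The gate only encodes the operator's policy; whether fencing actually succeeded still has to be established outside it.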
