CONCEPT

Durability as use-case dependent

Definition

Durability as use-case dependent is the architectural stance that the durability requirement of a consensus system — how many acknowledgments, from which nodes, must be collected before a write is considered durable — is a property of the workload and deployment topology, not a constant baked into the protocol. Different workloads and different topologies want different durability rules; the consensus protocol should accept the rule as input and fulfil it without imposing opinions of its own.

Sugu Sougoumarane's canonical wiki framing in Part 3 of the consensus-at-scale series:

"Durability is use-case dependent, we made it an abstract requirement requiring the consensus algorithms to assume nothing about the durability requirements." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-3-use-cases)

Why this matters

Traditional Paxos and Raft hard-code majority quorum — a write is durable when ⌊N/2⌋ + 1 nodes have acked it. This choice is simple, symmetric, and easy to reason about, but it is a single fixed rule applied to every deployment. When the deployment shape doesn't match (four zones with one node each, fifteen primaries across three DCs, two regions with two zones each), majority quorum expresses the durability requirement poorly or not at all.

The four production shapes Part 3 enumerates are all "uncomfortable for a majority based consensus system":

  1. Many-replica, fifteen-primary, three-DC — operator's durability model: "don't expect two nodes to fail at once; DC maintenance is planned."
  2. Four zones, one node each — operator's durability model: "any node can fail; a zone can go down without notice."
  3. Six nodes, three zones — operator's durability model: same as #2 but with more nodes per zone.
  4. Two regions, two zones per region — operator's durability model: "one zone at a time; region maintenance triggers proactive write transfer."

None of these fit "⌊N/2⌋ + 1 acks" naturally. The structural response is to move the durability rule from protocol-constant to configuration.

The observed upper bound: k = 2

Part 3's operational aside: "I have not seen anyone ask for a durability requirement of more than two nodes. But this may be due to difficulties dealing with corner cases that MySQL introduces due to its semi-sync behavior. On the other hand, these settings have served the users well so far. So, why become more conservative?"

Canonical wiki datum on the production upper bound for durability rules. k = 2 is empirically enough; k > 2 seems unnecessary in practice. Sugu offers two hypotheses: (a) it reflects MySQL semi-sync's corner-case-handling cost (a historical artefact), or (b) it reflects a genuine engineering equilibrium where larger k loses on tail latency without earning meaningful durability. Either way, the upper bound is empirical, not fundamental.

How "assumes nothing" is achieved

The mechanism that makes durability use-case-dependent without breaking safety is the intersecting-quorum generalisation:

  • The operator specifies a predicate P over node sets (e.g., "k nodes", "k nodes with at least one per zone", "at least one node in each of ≥2 zones").
  • The Request algorithm acknowledges the client once P is satisfied.
  • The Election algorithm reaches a node set that intersects every P-satisfying set on ≥ 1 node.

As long as these two algorithms share P, they can operate independently. "There is a way to reason about why this flexibility is possible. This is because the two cooperating algorithms (Request and Election) share a common view of the durability requirements, but can otherwise operate independently."
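The shared-predicate contract can be sketched in a few lines. This is an illustrative model, not the series' implementation: the predicate constructors, `request_is_durable`, and the brute-force `election_quorum_ok` are invented names, and the election check simply enumerates subsets to verify the intersection property the text describes.

```python
from itertools import combinations

def p_k_nodes(k):
    """P: 'any k nodes have acked' (hypothetical predicate constructor)."""
    return lambda acked: len(acked) >= k

def p_k_with_zone_spread(k, zone_of):
    """P: 'k nodes spanning at least two zones'."""
    return lambda acked: len(acked) >= k and len({zone_of[n] for n in acked}) >= 2

def request_is_durable(acked, predicate):
    # Request side: the leader may acknowledge the client once P holds.
    return predicate(frozenset(acked))

def election_quorum_ok(reached, all_nodes, predicate):
    # Election side: the reached set is safe iff it intersects every
    # P-satisfying set in at least one node. Brute force over all subsets.
    reached = frozenset(reached)
    for r in range(1, len(all_nodes) + 1):
        for s in combinations(all_nodes, r):
            if predicate(frozenset(s)) and not reached & frozenset(s):
                return False  # a durable write could hide entirely outside `reached`
    return True
```

The two sides never coordinate beyond agreeing on `predicate`: Request evaluates it against acks collected so far, while Election evaluates its reach against every set the predicate could have accepted.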

Worked example: 2-of-5

For N = 5 and k = 2:

  • Request: the leader acknowledges the client after 2 replica acks.
  • Election: must reach 4 nodes (N − k + 1 = 4) to intersect every possible 2-node write set on ≥ 1 node.
  • Any time two nodes fail simultaneously, the system exceeds its failure-tolerance envelope and the operator faces an availability-vs-data-loss compromise.

Contrast with the majority-quorum special case (k = 3, election = 3): both write and read/election quorums are 3; symmetric, simpler to reason about, but with a higher write-path cost.
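The 2-of-5 arithmetic can be checked exhaustively. A minimal sketch (names invented) that confirms N − k + 1 = 4 is both sufficient and minimal for intersection:

```python
from itertools import combinations

N, k = 5, 2
nodes = range(N)
election_size = N - k + 1  # = 4

def intersects_all_write_sets(reached):
    # True iff `reached` shares at least one node with every k-node write set.
    return all(set(reached) & set(w) for w in combinations(nodes, k))

# Every 4-node election set covers every possible 2-node write set...
assert all(intersects_all_write_sets(e) for e in combinations(nodes, election_size))
# ...while some 3-node set misses one, so 4 is the minimum reach.
assert not all(intersects_all_write_sets(e) for e in combinations(nodes, election_size - 1))
print(election_size)  # 4
```

Setting k = 3 in the same sketch yields an election size of 3, recovering the symmetric majority-quorum special case.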

Why lower k is preferred in practice

Part 3's bias-toward-minimum disclosure: "This is the reason why we have a bias towards reducing the durability settings to the bare minimum. Expanding this number can adversely affect performance, especially the tail latency."

With k > 1 ackers, the request path waits for the slowest of the k acking replicas, so the request's p99 sits deeper in each replica's latency tail than the replica's own p99. For hundreds-of-requests-per-second workloads where tail latency is operationally load-bearing, minimising k is preferred.
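The slowest-of-k effect is easy to quantify under a simplifying assumption not made in the source: if replica latencies were i.i.d. with CDF F, the max of k has CDF F(x)^k, so hitting a p99 target on the max requires each replica to hit its 0.99^(1/k) quantile. Illustrated with exponential latencies:

```python
import math

def quantile_exp(q, mean=1.0):
    # q-quantile of an exponential latency distribution with the given mean.
    return -mean * math.log(1.0 - q)

def p99_of_max(k, mean=1.0):
    # p99 of the slowest of k i.i.d. exponential replicas:
    # P(max <= x) = F(x)**k, so solve F(x) = 0.99 ** (1/k).
    return quantile_exp(0.99 ** (1.0 / k), mean)

for k in (1, 2, 3):
    print(k, round(p99_of_max(k), 2))
```

Each increment of k pushes the request's p99 further out, which is the quantitative content of "expanding this number can adversely affect performance, especially the tail latency."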

The counterbalancing pressure is correlated failure risk: k = 1 is vulnerable to any single-node failure after ack but before propagation; k = 2 is only vulnerable to the rare sequence [leader accepts → leader sends → one recipient acks → leader returns success → leader + recipient both crash]. In practice, the k = 2 loss sequence is much rarer than the single-node failure that defeats k = 1.

Cross-boundary durability as predicate tightening

A subtle consequence of the use-case-dependent framing: you can improve the durability rule without raising k. Part 3's cross-boundary-durability construction:

"This type of failure can happen if the leader and the recipient node are network partitioned from the rest of the cluster. We can mitigate this failure by requiring the ackers to live across network boundaries. The likelihood of a replica node in one cell failing after an acknowledgment, and a master node failing in the other cell after returning success, is much lower."

Same k (= 2); different predicate ("2 nodes across network boundaries" instead of "any 2 nodes"). The rarity of cross-boundary correlated failure makes the latter meaningfully more durable without paying the tail-latency cost of k = 3.
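Predicate tightening is a one-line change in the shared-predicate model. A hypothetical sketch (the cell assignments and function names are invented for illustration):

```python
# Same k = 2, but the two ackers must span network boundaries (cells).
cell_of = {"a": "cell1", "b": "cell1", "c": "cell2", "d": "cell2"}

def p_any_two(acked):
    # "any 2 nodes" — vulnerable to two co-located ackers failing together.
    return len(acked) >= 2

def p_two_cross_cell(acked):
    # "2 nodes across network boundaries" — same ack count, tighter rule.
    return len(acked) >= 2 and len({cell_of[n] for n in acked}) >= 2

print(p_any_two({"a", "b"}))         # two ackers in the same cell suffice
print(p_two_cross_cell({"a", "b"}))  # rejected: both ackers in cell1
print(p_two_cross_cell({"a", "c"}))  # accepted: ackers span cells
```

The write path still waits for exactly two acks; only which pairs count as durable changes, which is why the extra durability costs no additional tail latency.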

This is the core power of use-case-dependent durability: the operator can encode operational knowledge about failure correlation into the durability predicate, something majority-quorum cannot express.

Structural consequences

  1. Deployment shapes decouple from protocol. Adding nodes in an already-covered zone doesn't change the protocol's cost profile (pluggable durability → topology elasticity).
  2. The operator owns availability-vs-data-loss trade-offs. Out-of-envelope failures can't be protocol-resolved — they require operator judgement. The protocol exposes the knob.
  3. YouTube's k = 1 + wide election reach is legitimised. Not an abuse of consensus — just a valid point in the design space.
