PATTERN Cited by 1 source
Separate revoke from establish in leader election
Problem
Traditional majority-quorum consensus algorithms (Paxos, Raft) perform revocation of the previous leader and establishment of the new leader as a single atomic action — pushing a new proposal number to a majority of followers simultaneously invalidates the old leader (followers reject its stale number) and enables the new one. This conflation is elegant inside the protocol, but it hides the fact that the two concerns could be separated (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-4-establishment-and-revocation).
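The fusion described above can be made concrete with a toy model. The sketch below is not Paxos or Raft; it is a minimal illustration, assuming a hypothetical `Follower` class that tracks only the highest proposal number it has promised, of how one "push a higher number" step both revokes the old leader and enables the new one.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical follower that remembers the highest number it promised."""
    promised: int = 0

    def prepare(self, number: int) -> bool:
        # Promising a higher number implicitly rejects every lower one.
        if number > self.promised:
            self.promised = number
            return True
        return False

    def accept(self, number: int) -> bool:
        # Requests from a leader holding a stale number are refused.
        return number >= self.promised


def has_majority(votes: list[bool]) -> bool:
    return sum(votes) > len(votes) // 2


followers = [Follower() for _ in range(3)]

# Leader A establishes itself with proposal number 1.
assert has_majority([f.prepare(1) for f in followers])
assert has_majority([f.accept(1) for f in followers])

# Leader B reaches only two of the three followers with number 2.
# This single step revokes A and establishes B at the same time.
assert has_majority([f.prepare(2) for f in followers[:2]] + [False])

a_can_write = has_majority([f.accept(1) for f in followers])
b_can_write = has_majority([f.accept(2) for f in followers])
```

In this model there is no point at which A has been revoked but B is not yet established, which is the property the rest of this page proposes to relax.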
Once you are forced to accommodate practical scenarios — different failure-mode classes (planned software rollout vs. unplanned crash), different reachability assumptions (leader alive and cooperative vs. unreachable), different durability models — the single-action fusion becomes a straitjacket. You cannot optimise for the common case (graceful rollout) without re-paying the cost of the uncommon case (emergency fence) on every transition.
Sougoumarane's framing: "Traditional algorithms like Paxos and Raft try to do too many things at once. The cleverness of those approaches is commendable. However, such implementations are too rigid, and you cannot make modifications to specific parts of the algorithm without breaking something else. What we are going to do now is separate those concerns, and talk about how to address them individually."
Solution
Treat revocation and establishment as independent steps with independently-swappable mechanisms:
- Define leadership as a set of conditions (proposal number + follower acceptance, or replication source + ack, or equivalent). Sougoumarane's definition: "Leadership is established when all the parameters are in place for a leader to successfully complete requests. Any change that invalidates this condition is a revocation."
- Revocation is any mechanism that invalidates the leadership conditions for the old leader. Multiple mechanisms are valid: proposal-number push, replication-source change, graceful step-down, follower fencing, physical isolation.
- Establishment is any mechanism that satisfies the leadership conditions for the new leader. It is separate from revocation and happens after revocation is complete.
- Revocation must precede establishment; otherwise two nodes satisfy the leadership conditions simultaneously and the at-most-one-leader property breaks.
- Mechanisms can be selected per transition based on failure-mode class and reachability. Planned transitions (leader reachable) can use graceful step-down; emergency transitions (leader unreachable) must fence followers.
- Mechanisms are interchangeable across rounds: "Once revocation is complete, both algorithms have to make conditions A and B true for the new leader, which will allow for subsequent rounds to use any method of revocation."
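One way to picture the steps above is an orchestrator that takes the revocation and establishment mechanisms as parameters and enforces the ordering itself. This is a minimal sketch under stated assumptions: `Cluster`, `transition`, and the three mechanism functions are hypothetical names invented for illustration, and the "cluster" is a single in-memory object rather than a real replicated system.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Cluster:
    """Toy state: `leader` is whoever currently meets the leadership conditions."""
    leader: Optional[str] = "a"
    log: list[str] = field(default_factory=list)


# Hypothetical mechanism types: any callable with these shapes can be plugged in.
Revoke = Callable[[Cluster], None]
Establish = Callable[[Cluster, str], None]


def graceful_step_down(cluster: Cluster) -> None:
    cluster.log.append(f"step-down:{cluster.leader}")
    cluster.leader = None


def fence_followers(cluster: Cluster) -> None:
    cluster.log.append(f"fence:{cluster.leader}")
    cluster.leader = None


def promote(cluster: Cluster, candidate: str) -> None:
    cluster.log.append(f"promote:{candidate}")
    cluster.leader = candidate


def transition(cluster: Cluster, revoke: Revoke,
               establish: Establish, candidate: str) -> None:
    revoke(cluster)  # step 1: invalidate the old leader's conditions
    # The ordering invariant lives here, outside any single mechanism.
    if cluster.leader is not None:
        raise RuntimeError("revocation must complete before establishment")
    establish(cluster, candidate)  # step 2: satisfy them for the new leader


cluster = Cluster()
transition(cluster, graceful_step_down, promote, "b")  # planned round
transition(cluster, fence_followers, promote, "c")     # emergency round
```

The two rounds use different revocation mechanisms with the same establishment step, which is the interchangeability the quote describes. A revocation callable that fails to clear the leader makes `transition` raise instead of producing two leaders.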
Canonical instance: Vitess PRS + ERS
Vitess exposes the separation as two user-facing shard-level operations:
- PlannedReparentShard (PRS): graceful demotion. Revocation mechanism: ask the current primary to step down. Establishment: promote the chosen replica. Composed with vttablet lameduck + vtgate query buffering so the application sees no errors. See patterns/graceful-leader-demotion.
- EmergencyReparentShard (ERS): follower fencing. Used when the current primary is unreachable (crash / partition). Revocation mechanism: tell the surviving replicas to stop accepting writes from the old primary. Establishment: promote a replica. Application will see errors during the detection + fence window; the design goal is correctness (no split-brain) rather than zero-error UX.
The two operations exist as separate commands because the right mechanism for each step depends on the failure-mode class, which is exactly the flexibility the pattern unlocks.
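The split between a planned and an emergency path can be sketched as a choice of revocation mechanism driven by whether the old primary is reachable. The code below is not the Vitess implementation and uses none of its APIs; `Node` and `reparent` are hypothetical names, and the model assumes a primary's writes only count once a replica is still replicating from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    reachable: bool = True
    accepts_writes: bool = False   # True only on a serving primary
    source: Optional[str] = None   # node this replica replicates from


def reparent(primary: Node, replicas: list[Node], candidate: Node) -> str:
    """Pick the revocation mechanism from reachability, then establish."""
    if primary.reachable:
        # Planned path: the old primary cooperates and steps down itself.
        primary.accepts_writes = False
        path = "planned"
    else:
        # Emergency path: the old primary cannot be asked, so fence replicas.
        for r in replicas:
            if r.reachable:
                r.source = None
        path = "emergency"
    # Establishment is the same in both paths.
    candidate.accepts_writes = True
    candidate.source = None
    for r in replicas:
        if r is not candidate and r.reachable:
            r.source = candidate.name
    return path


old = Node("p1", reachable=False, accepts_writes=True)
r1, r2 = Node("r1", source="p1"), Node("r2", source="p1")
path = reparent(old, [r1, r2], r1)
```

In the emergency run, `old.accepts_writes` is still `True` afterwards: the unreachable primary was never told anything. It is revoked only because no replica replicates from it any longer, which is why this path prioritises correctness over an error-free experience.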
When to use
- You run a single-leader replicated system with distinct planned and unplanned failover needs.
- Planned transitions are frequent (daily software rollouts) and you can afford to invest in zero-error graceful mechanisms for the common case.
- Unplanned transitions happen but are rarer (monthly crashes / partitions) and can use a correctness-first mechanism.
- Your protocol doesn't strictly require majority-quorum atomicity for every transition — you can accept the added design complexity in exchange for per-axis optimisation.
When not to use
- Pure Paxos / Raft deployments where the protocol's atomic revocation + establishment is considered the correctness foundation and you have no practical need to vary mechanisms per failure class. The conflation is load-bearing for the protocol's proofs.
- Systems where any leadership transition is rare (monthly or slower): the engineering investment in two separate paths does not pay off.
- Systems where consensus is the wrong primitive entirely (WAN-scale state distribution, eventual-consistency-tolerant workloads) — separating the steps solves a problem you don't have.
Trade-offs
- Adds design complexity — two code paths to maintain + a reasoned invariant to preserve across both. Test surface grows.
- Mechanism interchangeability is a freedom, not a mandate — most production systems pick one planned-path + one emergency-path and don't hot-swap revocation mechanisms per round. The architectural freedom is what matters.
- Revocation-before-establishment invariant must be enforced outside the per-mechanism logic — typically by the higher-level orchestrator (Vitess's reparent command, Kubernetes's leader-election library, etc.).
- Fence-the-followers emergency path still has a window between detection and fence completion during which the old leader might complete writes. ERS-class paths cannot be zero-error; they are correctness-first.
Seen in
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-4-establishment-and-revocation: the source that introduces this pattern. Sougoumarane (Vitess co-creator) frames revocation and establishment as separable concerns, with Vitess PRS + ERS as the worked instance.
Related
- patterns/graceful-leader-demotion — the specific planned-path shape composed on top of this pattern.
- patterns/zero-downtime-reparent-on-degradation — PlanetScale's production deployment of graceful reparent; this pattern is the algorithmic foundation.
- concepts/leader-revocation — the revoke step as a first-class concern.
- concepts/leader-establishment — the establish step as a first-class concern.
- concepts/lameduck-mode — drain primitive used during graceful revocation.
- concepts/no-distributed-consensus — the structural alternative for workloads where consensus itself is the wrong primitive.
- systems/vitess — canonical production instance.
- systems/mysql — replication primitives Vitess composes on.