
PATTERN Cited by 1 source

Lock-based leader election

Pattern

Resolve races between multiple electors by making the first elector that successfully acquires an exclusive lock the sole winner, blocking all later electors from making any change until the lock is released. A lock-based protocol has four serial steps:

  1. Acquire lock (race-winner picker).
  2. Revoke prior leader (ensure no stale writes can commit).
  3. Establish new leader (commit the new primary).
  4. Release lock (explicit on success, or auto-release via time-component on failure).
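The four serial steps can be sketched as a single control flow. This is a minimal illustration, not Vitess's or any library's API: `ExclusiveLock`, `Cluster`, and `elect` are hypothetical names. The lock here has no TTL, so the auto-release half of step 4 is deliberately left out to keep the sequence visible.

```python
import threading
from typing import Optional, Set


class ExclusiveLock:
    """Hypothetical stand-in for any lock primitive (quorum, etcd, a human).

    A production lock would also carry a TTL so that a crashed holder's lock
    auto-releases (the failure half of step 4); that is omitted here.
    """

    def __init__(self) -> None:
        self._mu = threading.Lock()
        self.holder: Optional[str] = None

    def try_acquire(self, elector: str) -> bool:
        with self._mu:
            if self.holder is None:
                self.holder = elector
                return True
            return False  # the first elector already won; later ones are blocked

    def release(self, elector: str) -> None:
        with self._mu:
            if self.holder == elector:
                self.holder = None


class Cluster:
    """Hypothetical cluster state: the current primary plus revoked ex-primaries."""

    def __init__(self, primary: str) -> None:
        self.primary: Optional[str] = primary
        self.revoked: Set[str] = set()


def elect(lock: ExclusiveLock, cluster: Cluster, elector: str, candidate: str) -> bool:
    # 1. Acquire lock: the race-winner picker.
    if not lock.try_acquire(elector):
        return False
    try:
        # 2. Revoke the prior leader so no stale writes can commit.
        if cluster.primary is not None:
            cluster.revoked.add(cluster.primary)
            cluster.primary = None
        # 3. Establish the new leader.
        cluster.primary = candidate
        return True
    finally:
        # 4. Release explicitly. If the process dies mid-election this never
        #    runs, which is exactly why real locks need a TTL.
        lock.release(elector)
```

While another elector holds the lock, `elect` returns `False` without touching the cluster; once the lock is free, the same call revokes the old primary and installs the new one.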

Lock-based is one of the two race-resolution families Sugu Sougoumarane names in Part 5 of his Consensus Algorithms at Scale series; the other is lock-free ("newest elector wins").

Canonical framing

Sugu's taxonomic move in Part 5:

"There are two approaches to resolving races: either the first agent wins, or the last one wins. The determination of who is first or last can vary depending on the approach used… An approach that makes the first agent win essentially prevents later agents from succeeding. This is equivalent to obtaining a lock. We will therefore call this approach lock-based." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-5-handling-races)

Why the lock is often implicit

Sugu's load-bearing observation: Raft is lock-based even though it doesn't name the lock. The term-number push to a majority IS a distributed lock — it's fused with revocation and establishment into a single majority-quorum round-trip:

"One may argue that a system like Raft does not obtain a lock even though it makes the first elector win. This is because the act of obtaining a lock is shadowed by other actions it takes. If you subtract out the other actions (revoke, establish, and propagate) in the code that performs an election, it will be evident that what is left is the act of obtaining a distributed lock."

The fusion is why Raft is perceived as not using a lock. Making the lock explicit (Vitess's etcd-backed design — see patterns/external-coordinator-for-leadership-lock) is the architectural unlock that lets you tune each of the four steps independently.
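To see the lock hiding inside a Raft election, strip RequestVote down to its term and vote bookkeeping. The sketch below is a simplification under stated assumptions: `Voter` and `try_win_term` are hypothetical names, and Raft's log up-to-date check and replication (the establish/propagate parts) are omitted. What remains behaves like lock acquisition: each voter grants at most one candidate per term, so two candidates reaching overlapping majorities cannot both win the same term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voter:
    """Simplified Raft voter: only the term/vote state that acts as the lock."""
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate: str) -> bool:
        if term < self.current_term:
            return False               # stale term: refuse
        if term > self.current_term:
            self.current_term = term   # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate # at most one grant per term
            return True
        return False


def try_win_term(reachable: List[Voter], cluster_size: int, term: int, candidate: str) -> bool:
    """True if `candidate` collects a strict majority of grants for `term`."""
    grants = sum(v.request_vote(term, candidate) for v in reachable)
    return grants > cluster_size // 2
```

With five voters, a candidate that reaches voters 0 to 2 wins term 1; a rival reaching voters 2 to 4 in the same term is refused by the shared voter 2 and falls short. The rival can only proceed by starting a higher term, which Raft gates behind an election timeout: the time component discussed below.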

How lock acquisition works

The invariant: the set of nodes each elector reaches must intersect with those of the others. Any mechanism achieving this intersection is a valid lock primitive. Sugu's enumeration:

  1. Majority quorum over the replica set itself — Raft's implicit design. Side-effect of the revocation step; no separate lock infrastructure required.
  2. Simple-majority lock layered over a pluggable durability predicate — lock uses a majority; revoke/establish uses the application's chosen intersecting-quorum predicate. Composable with patterns/pluggable-durability-rules.
  3. External consensus system — Vitess uses etcd. "The decision to rely on another consensus system to implement our own may seem odd. But the difference is that Vitess itself can complete a massive number of requests per second. Its usage of etcd is only for changing leadership, which is in the order of once per day or week." Canonical instance: patterns/external-coordinator-for-leadership-lock.
  4. Human authorization: "Humans could decide to manually authorize an elector to perform a leadership change, essentially giving it a 'lock'." Used for rare high-stakes transitions where auto-failover is riskier than a paged operator.
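For the quorum-based mechanisms (1 and 2), the intersection invariant can be checked directly. This brute-force sketch uses a hypothetical helper name and enumerates every pair of quorums; for counting quorums over n nodes it agrees with the standard closed form: every size-a quorum meets every size-b quorum exactly when a + b > n. Mechanisms 3 and 4 sidestep the check by funnelling every elector through one arbiter.

```python
from itertools import combinations
from typing import Iterable


def quorum_families_intersect(nodes: Iterable[str], size_a: int, size_b: int) -> bool:
    """Brute force: does every size_a subset of nodes overlap every size_b subset?"""
    nodes = list(nodes)
    family_a = [set(q) for q in combinations(nodes, size_a)]
    family_b = [set(q) for q in combinations(nodes, size_b)]
    # An empty intersection is falsy, so all() fails on the first disjoint pair.
    return all(a & b for a in family_a for b in family_b)
```

Two simple majorities of five (3 and 3) always overlap, so a majority lock is a valid primitive; "any 2 of 5" is not. A lock quorum of 4 paired with a revoke/establish quorum of 2 also intersects, which is the kind of asymmetric pairing mechanism 2 allows.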

Forward progress: the time-component cost

Lock-based systems pay a structural cost: forward progress must be manufactured via a time component, because a crashed lock-holder would otherwise block the system forever.

"An elector may successfully obtain a lock and then crash or become partitioned out of the rest of the system. This will prevent all other electors from ever being able to repair this situation. To resolve this, lock-based systems must introduce a time component: any elector that obtains a lock must complete its task within a certain period of time, after which the lock is automatically released."

The time-component must be tuned between two failure modes:

  • Auto-release too fast → spurious leadership changes: slow-but-alive electors get pre-empted unfairly.
  • Auto-release too slow → a long outage window when the lock-holder has genuinely crashed.

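Sugu's rule of thumb: "Clock skews are typically in the milliseconds. In general, it is advisable to use 'many seconds' of granularity to sequence events." Sequencing at many-second granularity against millisecond-scale skew leaves a safety margin of roughly three orders of magnitude (~10³), which covers commodity clock drift.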
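A lease-style lock makes the time component concrete. This is a sketch under assumptions, not etcd's lease API: `LeaseLock`, `ttl`, and `skew_margin` are hypothetical names, and one injected clock stands in for both the holder's and the coordinator's clocks, which in practice differ by up to the skew bound. The lock auto-releases at `expires_at`; the holder stops treating itself as the holder `skew_margin` earlier, so a slightly fast coordinator clock cannot hand the lock to a rival while the old holder still believes it is in charge.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LeaseLock:
    ttl: float                   # hypothetical: "many seconds", e.g. 10.0
    clock: Callable[[], float]   # injected so the example is deterministic
    skew_margin: float = 0.0     # hypothetical bound on clock skew, in seconds
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, elector: str) -> bool:
        now = self.clock()
        # Free, or the previous holder's deadline has passed: auto-release.
        if self.holder is None or now >= self.expires_at:
            self.holder, self.expires_at = elector, now + self.ttl
            return True
        return False

    def still_held(self, elector: str) -> bool:
        # The holder checks this before each step and stops early by skew_margin.
        return self.holder == elector and self.clock() < self.expires_at - self.skew_margin

    def release(self, elector: str) -> None:
        # Explicit release on success; a no-op if the lease already moved on.
        if self.holder == elector:
            self.holder = None
```

With `ttl=10.0` and `skew_margin=0.5`, the holder gives up at t = 9.5 while rivals stay blocked until t = 10.0; that half-second gap is where the "many seconds vs. milliseconds" margin is spent.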

Four scale-level advantages (Part 8 enumeration)

The time-component that pays for forward progress also enables four operational advantages that together tip the scale-level verdict — see patterns/lock-based-over-lock-free-at-scale:

  1. Graceful leadership changes — stable leader identity + external coordinator = the current leader can be asked to step down cleanly. See patterns/graceful-leader-demotion.
  2. Easier node addition / removal — membership changes serialise through the lock.
  3. Direct-to-leader consistent reads via leader lease — the same time-component doubles as the lease; reads go local without quorum.
  4. Natural anti-flapping enforcement — the lock-holder is the natural place to rate-limit leadership changes. See concepts/anti-flapping.
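Advantages 3 and 4 both reduce to small time checks on the lock-holder's side. The sketch below is illustrative only: `LeaderLease`, `FlapGuard`, `max_skew`, and `min_interval` are hypothetical names and values, not taken from Vitess or any other system. The leader answers reads locally only while its lease is valid with a skew margin to spare; the elector refuses a new leadership change until a minimum interval has passed since the last one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LeaderLease:
    clock: Callable[[], float]
    lease_expires_at: float
    max_skew: float  # hypothetical bound on clock skew, in seconds

    def can_serve_local_read(self) -> bool:
        # No quorum round-trip: safe only while no other leader can exist yet.
        return self.clock() < self.lease_expires_at - self.max_skew


@dataclass
class FlapGuard:
    clock: Callable[[], float]
    min_interval: float  # hypothetical minimum seconds between leadership changes
    last_change: float = float("-inf")

    def allow_change(self) -> bool:
        now = self.clock()
        if now - self.last_change < self.min_interval:
            return False  # too soon after the previous change: rate-limited
        self.last_change = now
        return True
```

Because both checks live with the single lock-holder, neither needs cross-elector coordination; a lock-free scheme has no single place to enforce them.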

Faster convergence in the common case

Sugu's observation:

"A lock-based approach generally converges faster than approaches that are lock-free. This is because the first node that attempts a leadership change is likely to have made the most progress towards completing the task. Under most circumstances, giving the first elector the chance to succeed will complete the leadership change with the least disruption."

First-attempter-usually-wins is a structural common-case optimisation — the elector that detects the failure first is also the one best positioned to complete the repair, so "first wins" aligns with "most-ready-to-succeed wins" more often than not.

Canonical production instances

  • Vitess + VTOrc + etcd — explicit lock via etcd; revoke/establish via Vitess-topology + MySQL semi-sync. "In Vitess, the current leader for each cluster is published through its topology, and a large number of workflows rely on this information to perform various tasks. Any operation that does not want the leader to change just has to obtain a lock before doing its work."
  • MySQL + Orchestrator — pre-Vitess generation of the same pattern; Orchestrator holds the lock and issues failover primitives.
  • Raft — lock-based in substance despite the implicit lock framing.
  • Chubby / ZooKeeper leader election — external-coordinator shape over a Paxos / Zab substrate; lock is the point of the service.

Composition

When to choose lock-free instead

See patterns/lock-free-leader-election. Lock-free can be the better fit for:

  • Very small clusters where the time-component overhead exceeds the lock's value.
  • One-shot decisions (DNS updates, cert distribution) where no stable-leader state exists to be preserved.
  • Environments without a reliable clock or external coordinator.

For most large-scale production consensus systems (databases, control planes, storage), Sugu's verdict is lock-based — see patterns/lock-based-over-lock-free-at-scale.

Seen in
