
CONCEPT Cited by 2 sources

Anti-flapping

Definition

Anti-flapping is the operational discipline of rate-limiting leadership changes in a consensus system: once a leadership change completes, any further change is blocked for a minimum dwell-time window. Sugu Sougoumarane presents it as both a stability feature and, serendipitously, a mitigation for propagation races:

"Most large-scale systems have anti-flapping rules that prevent a leadership from changing as soon as one was performed. This is because such an occurrence is usually due to a deeper underlying problem, and performing another leadership change will likely not fix it. And in most cases, it would aggravate the underlying problem." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests)
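
A minimal sketch of the dwell-time rule, assuming a single process makes all leadership decisions and a monotonic clock is available. `AntiFlappingGuard`, `try_change` and `window_s` are hypothetical names chosen for illustration, not Orchestrator's or any other system's API. The window is measured from the moment a change is recorded as complete, matching the definition above.

```python
import time
from typing import Callable, List, Optional


class AntiFlappingGuard:
    """Hypothetical dwell-time guard: refuses a new leadership change until
    window_s seconds have passed since the last completed one."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self._clock = clock  # injectable so the window can be tested deterministically
        self._last_change: Optional[float] = None

    def may_change(self) -> bool:
        # No change has completed yet, so there is no window to respect.
        if self._last_change is None:
            return True
        return self._clock() - self._last_change >= self.window_s

    def record_change(self) -> None:
        # Call once a leadership change has completed; the window starts here.
        self._last_change = self._clock()

    def try_change(self, do_change: Callable[[], None]) -> bool:
        """Run do_change only if the dwell-time window has elapsed."""
        if not self.may_change():
            return False  # blocked: leave the underlying problem to an operator
        do_change()
        self.record_change()
        return True


# Usage with a fake clock so the timings are explicit.
now = [0.0]
guard = AntiFlappingGuard(window_s=30.0, clock=lambda: now[0])
changes: List[str] = []
print(guard.try_change(lambda: changes.append("A->B")))  # True: first change
now[0] = 10.0
print(guard.try_change(lambda: changes.append("B->C")))  # False: 10 s < 30 s window
now[0] = 45.0
print(guard.try_change(lambda: changes.append("B->C")))  # True: window has elapsed
print(changes)  # ['A->B', 'B->C']
```

Injecting the clock keeps the guard itself free of timing assumptions; a real deployment would also need the last-change timestamp to survive a restart of the process that holds it.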

Primary purpose: failure-loop avoidance

Rapid back-to-back leadership changes are usually a symptom of a persistent failure mode that a new election will not fix. Sugu's canonical anecdote:

"In one of the systems that I knew of, the payload of the request was so big that it was causing the transmission to timeout. This resulted in a failure being detected and caused a leadership change. However, the new leader was also incapable of completing the request due to the same underlying problem. The problem was ultimately remedied by increasing the timeout."

Without anti-flapping, the system loops: leader times out → failover → new leader times out on the same request → failover → … Each failover adds coordination overhead and widens the window for split-brain races. Anti-flapping breaks the loop, forcing operators to diagnose the real problem.
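
To make the loop concrete, here is a toy simulation of the oversized-request anecdote, assuming the fault is persistent so that every new leader is declared failed a few seconds after taking over. `simulate`, the detection times and the 300-second window are hypothetical values chosen for illustration. Without a window, every detection becomes a failover; with one, only the first goes through and the rest are held back for diagnosis.

```python
from typing import List, Optional, Tuple


def simulate(detections: List[float], window_s: Optional[float]) -> Tuple[int, int]:
    """Hypothetical model: count (failovers, blocked) for a fault that no new
    leader can fix. `detections` are the times (s) at which the current leader
    is declared failed; window_s=None means no anti-flapping rule."""
    failovers = 0
    blocked = 0
    last_change: Optional[float] = None
    for t in detections:
        in_window = (window_s is not None and last_change is not None
                     and t - last_change < window_s)
        if in_window:
            blocked += 1  # suppressed: the fault is left for an operator to diagnose
            continue
        failovers += 1  # a new leader is elected and hits the same fault
        last_change = t
    return failovers, blocked


# The persistent fault trips the detector every 5 s for a minute.
detections = [5.0 * i for i in range(12)]
print(simulate(detections, window_s=None))   # (12, 0): failover loop
print(simulate(detections, window_s=300.0))  # (1, 11): loop broken after one change
```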

Serendipitous second-order effect: propagation-race mitigation

Sugu's framing:

"Serendipitously, anti-flapping rules also mitigate the failure modes described above. Versioning of in-flight requests is less important for such systems."

The seven propagation failure modes described earlier in Part 7 all involve multiple electors acting in short succession: an elector that misses an incomplete request, then another elector that races with it, then a third elector that sees conflicting discoveries. Anti-flapping spaces leadership changes far enough apart that an in-flight elector usually has time to finish before its successor starts. As a result, the per-request versioning rule (which Part 7 establishes as the formal fix) becomes less critical in practice for systems with strong anti-flapping, though not redundant: the spacing makes races unlikely rather than impossible.
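
The mitigation can be checked with a toy model, assuming each elector's change takes effect the moment it is admitted while its request keeps propagating to lagging nodes for `lag_s` seconds. `count_races`, `lag_s` and the attempt times are hypothetical. An admitted elector counts as racing if it starts while its predecessor's request is still in flight. The model illustrates why this is a mitigation rather than a guarantee: races disappear only when the window is at least as long as the propagation lag.

```python
from typing import List, Optional, Tuple


def count_races(attempts: List[float], window_s: float, lag_s: float) -> Tuple[int, int]:
    """Hypothetical model: return (admitted, racing) for election attempt times.

    An admitted elector's request stays in flight for lag_s seconds; a later
    elector admitted inside that interval races with it.
    """
    admitted = racing = 0
    last_change: Optional[float] = None
    for t in sorted(attempts):
        if last_change is not None and t - last_change < window_s:
            continue  # blocked by the anti-flapping window
        if last_change is not None and t < last_change + lag_s:
            racing += 1  # predecessor's request is still propagating
        admitted += 1
        last_change = t
    return admitted, racing


attempts = [2.0 * i for i in range(10)]  # an elector tries every 2 s
print(count_races(attempts, window_s=0.0, lag_s=8.0))   # (10, 9): no anti-flapping
print(count_races(attempts, window_s=5.0, lag_s=8.0))   # (4, 3): window shorter than lag
print(count_races(attempts, window_s=10.0, lag_s=8.0))  # (2, 0): window covers the lag
```

Because every admitted elector starts at least `window_s` after its predecessor, choosing `window_s >= lag_s` makes the racing count zero by construction in this model; real propagation lag is not bounded so neatly, which is why versioning remains the formal fix.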

Canonical production instances

  • MySQL + Orchestrator: Orchestrator's built-in anti-flapping rules are why large-scale MySQL deployments avoid split-brain, even though MySQL binlog replication propagates each transaction with its original GTID and so does not follow the strict rule of assigning a new version on propagation. "The Orchestrator, which is the most popular leadership management system for MySQL, has built-in anti-flapping rules. These rules mitigate the above failure modes. This is the reason why organizations have been able to avoid split-brain scenarios while running MySQL at a massive scale."
  • Vitess + VTOrc: VTOrc is a customised fork of Orchestrator that inherits the same anti-flapping safeguards.

Trade-off

Anti-flapping lengthens recovery from genuine failures: a window of N seconds after the last failover blocks the next one even if the new leader has genuinely crashed. Choosing the window length is therefore a trade-off between worst-case recovery delay and how widely successive leadership changes are spaced (and hence how narrow the opening for races becomes). Sugu does not give a numerical recommendation; the right window is deployment-specific, which is why systems such as Orchestrator expose it as a configurable setting.
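
The recovery cost can be written down directly. Assuming, hypothetically, that failure detection takes `detect_s` seconds and that the window is measured from the last completed change, a failover for a crash at `crash_s` cannot start before the later of the detection time and the end of the window. `earliest_failover` and its parameters are illustrative names only.

```python
def earliest_failover(last_change_s: float, crash_s: float,
                      detect_s: float, window_s: float) -> float:
    """Hypothetical model: earliest time (s) a failover may start after the
    leader crashes at crash_s, given detection delay and anti-flapping window."""
    detected = crash_s + detect_s            # when the crash is noticed
    window_ends = last_change_s + window_s   # when anti-flapping allows a change
    return max(detected, window_ends)


# Last failover at t=0, 60 s window, 10 s detection delay.
for crash in (5.0, 100.0):
    start = earliest_failover(0.0, crash, detect_s=10.0, window_s=60.0)
    print(crash, start, start - crash)
# 5.0 60.0 55.0    -> crash inside the window: 55 s until failover can begin
# 100.0 110.0 10.0 -> crash after the window: only the detection delay
```

A crash that lands just after a failover pays almost the full window; a longer window narrows races further but makes that worst case longer.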

