CONCEPT Cited by 2 sources
Anti-flapping¶
Definition¶
Anti-flapping is the operational discipline of rate-limiting leadership changes in a consensus system: after a leadership change completes, any subsequent change is blocked for a minimum dwell-time window. Sugu Sougoumarane presents it as a stability feature that also, serendipitously, mitigates propagation races:
"Most large-scale systems have anti-flapping rules that prevent a leadership from changing as soon as one was performed. This is because such an occurrence is usually due to a deeper underlying problem, and performing another leadership change will likely not fix it. And in most cases, it would aggravate the underlying problem." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests)
Primary purpose: failure-loop avoidance¶
Rapid back-to-back leadership changes are almost always a symptom of a persistent failure mode that a new election won't fix. Sugu's canonical anecdote:
"In one of the systems that I knew of, the payload of the request was so big that it was causing the transmission to timeout. This resulted in a failure being detected and caused a leadership change. However, the new leader was also incapable of completing the request due to the same underlying problem. The problem was ultimately remedied by increasing the timeout."
Without anti-flapping, the system loops: leader times out → failover → new leader times out on the same request → failover → … Each failover adds coordination overhead and widens the window for split-brain races. Anti-flapping breaks the loop, forcing operators to diagnose the real problem.
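The loop above can be made concrete with a toy simulation, assuming every leader times out on the same oversized request. All parameter names and values (`request_timeout`, `failover_cost`, `min_dwell`, the 600-second horizon) are illustrative assumptions, not figures from the source.

```python
def count_failovers(duration: float, request_timeout: float,
                    failover_cost: float, min_dwell: float) -> int:
    """Count failovers within `duration` seconds when every leader hits the same timeout.

    All arguments are in seconds and are illustrative assumptions.
    min_dwell=0 models a system with no anti-flapping rule.
    """
    now = 0.0
    failovers = 0
    last_failover_done = None
    while True:
        now += request_timeout              # current leader times out on the request
        if now >= duration:
            break
        if last_failover_done is not None and now - last_failover_done < min_dwell:
            # Anti-flapping: the next change must wait until the window expires.
            now = last_failover_done + min_dwell
            if now >= duration:
                break
        now += failover_cost                # perform the leadership change
        failovers += 1
        last_failover_done = now
    return failovers


without_rule = count_failovers(600, request_timeout=5, failover_cost=5, min_dwell=0)
with_rule = count_failovers(600, request_timeout=5, failover_cost=5, min_dwell=300)
```

Under these assumed numbers the unguarded system performs 60 failovers in ten minutes and the guarded one performs 2. Neither run fixes the underlying timeout; the guarded run simply stops churning while someone investigates.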
Serendipitous second-order effect: propagation-race mitigation¶
Sugu's framing:
"Serendipitously, anti-flapping rules also mitigate the failure modes described above. Versioning of in-flight requests is less important for such systems."
The seven propagation failure modes that Part 7 describes before this passage all involve multiple electors racing in quick succession: one elector misses an incomplete request, a second races with it, and a third sees conflicting discoveries. Anti-flapping spaces leadership changes far enough apart that the in-flight elector has time to finish before a successor starts. As a result, the per-request versioning rule (which Part 7 establishes as the formal fix) becomes far less critical in practice for systems with strong anti-flapping.
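A rough way to see the serialising effect is to model each election as a time interval and count how often a new election starts while the previous one is still running. This sketch assumes a fixed election duration and a single shared view of the last completed change; `run_elections`, `count_overlaps`, and the trigger times are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) of one election, in seconds


def run_elections(triggers: List[float], election_duration: float,
                  min_dwell: Optional[float]) -> List[Interval]:
    """Return the elections that actually run.

    min_dwell=None models no anti-flapping: every trigger starts an elector.
    Otherwise a trigger is dropped until the previous election has completed
    and the dwell window after it has elapsed.
    """
    accepted: List[Interval] = []
    for t in sorted(triggers):
        if min_dwell is not None and accepted:
            prev_end = accepted[-1][1]
            if t < prev_end + min_dwell:
                continue  # blocked by the anti-flapping rule
        accepted.append((t, t + election_duration))
    return accepted


def count_overlaps(intervals: List[Interval]) -> int:
    # Consecutive elections race when the later one starts before the earlier one ends.
    return sum(1 for (_, end), (start, _) in zip(intervals, intervals[1:])
               if start < end)


triggers = [0.0, 2.0, 3.0, 120.0]  # illustrative failure-detection times
racy = run_elections(triggers, election_duration=5.0, min_dwell=None)
calm = run_elections(triggers, election_duration=5.0, min_dwell=60.0)
```

With these assumed triggers, the unguarded run has two pairs of racing electors and the guarded run has none, because the triggers at 2 s and 3 s fall inside the window and are dropped.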
Canonical production instance¶
- MySQL + Orchestrator: Orchestrator's built-in anti-flapping is why large-scale MySQL deployments avoid split-brain even though MySQL's binlog replication propagates GTIDs faithfully, which breaks the strict rule of assigning a new version on every propagation. "The Orchestrator, which is the most popular leadership management system for MySQL, has built-in anti-flapping rules. These rules mitigate the above failure modes. This is the reason why organizations have been able to avoid split-brain scenarios while running MySQL at a massive scale."
- Vitess + VTOrc: VTOrc is a customised fork of Orchestrator that inherits the same anti-flapping safeties.
Trade-off¶
Anti-flapping increases recovery time for genuine failures: a window of N seconds after the last failover blocks the next one even if the new leader has genuinely crashed. Tuning the window is therefore a trade-off: a longer window narrows the opportunity for races and failover loops, while a shorter one reduces the delay before a genuine second failure can be handled. Sugu does not give a numerical recommendation, and values vary widely in practice; Orchestrator, for example, exposes the window as its RecoveryPeriodBlockSeconds setting, whose default is 3600 seconds.
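The cost side of the trade-off can be worked through with simple arithmetic. The sketch below assumes that no operator overrides the block and that detection and failover take fixed times; the function `time_to_recover` and every number in it are illustrative, not measurements from any system.

```python
def time_to_recover(crash_after: float, min_dwell: float,
                    detection_delay: float, failover_duration: float) -> float:
    """Seconds from the new leader's crash until a replacement is in place.

    crash_after: seconds between completion of the previous failover and the crash.
    Assumes no manual override of the anti-flapping block (an assumption).
    """
    detected_at = crash_after + detection_delay
    # The next failover may start only once the crash is detected AND
    # the dwell window since the previous failover has elapsed.
    allowed_at = max(detected_at, min_dwell)
    return allowed_at + failover_duration - crash_after


# New leader crashes 5 s after taking over.
blocked_case = time_to_recover(crash_after=5, min_dwell=300,
                               detection_delay=10, failover_duration=15)
unguarded_case = time_to_recover(crash_after=5, min_dwell=0,
                                 detection_delay=10, failover_duration=15)
# New leader crashes long after the window has expired.
late_crash = time_to_recover(crash_after=1000, min_dwell=300,
                             detection_delay=10, failover_duration=15)
```

Under these assumptions an early crash costs 310 seconds of unavailability with a 300-second window, against 25 seconds with no window. A crash after the window has expired costs the same 25 seconds either way, so the penalty applies only to failures that closely follow a previous failover.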
Seen in¶
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-8-closing-thoughts: the Part 8 capstone lists anti-flapping as one of the four advantages that lock-based approaches hold over lock-free ones at scale. "With lock-based approaches, you can: … (4) You can implement anti-flapping rules." Anti-flapping is structurally easier to enforce in lock-based systems because a stable leader plus an external coordinator provide a natural enforcement point; lock-free systems can implement it but have no natural enforcer, only per-elector clock checks that are easy to violate under clock skew or network partition. Vitess inherits its anti-flapping from Orchestrator via the VTOrc fork, a canonical production instance of the patterns/lock-based-over-lock-free-at-scale pattern.
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests: the wiki's canonical introduction of anti-flapping as both a stability discipline and a propagation-race mitigation, with Orchestrator and VTOrc as worked production instances and large-scale MySQL deployments as the framing.