Orchestrator¶
Orchestrator is the most popular open-source leadership-management system for MySQL, originally authored by Shlomi Noach at openark. It discovers replication topology, detects primary failures, and runs active failovers, playing the role of the elector in the framing of Sugu Sougoumarane's Consensus algorithms at scale, Part 7:
"The Orchestrator, which is the most popular leadership management system for MySQL, has built-in anti-flapping rules. These rules mitigate the above failure modes. This is the reason why organizations have been able to avoid split-brain scenarios while running MySQL at a massive scale." (Source: sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests)
Role in MySQL consensus¶
MySQL on its own doesn't implement a consensus algorithm — the replication substrate is leader-based but failover is out-of-band. Orchestrator provides:
- Topology discovery: continuously scans the replica set to build a current view of who replicates from whom.
- Failure detection: heartbeat + replication-lag-based signals for primary unavailability.
- Active failover: picks a new primary from the replica set, rewires replication topology, signals the application tier.
- Anti-flapping rules: rate-limits leadership changes so that rapid back-to-back failovers don't loop into split-brain territory.
The anti-flapping layer is what makes Orchestrator production-safe. MySQL's binlog faithfully propagates the original leader's GTID and timestamp to replicas, so propagated transactions are not re-stamped with a new version. This formally violates Sugu's per-request-new-version rule (see patterns/version-per-request-to-resolve-conflicts). Anti-flapping compensates by making the resulting race window empirically rare.
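The rate-limiting idea behind anti-flapping can be sketched as a per-cluster block period. This is a minimal illustration, not Orchestrator's implementation: `AntiFlapGuard`, `try_begin_failover`, and the one-hour default are hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AntiFlapGuard:
    """Refuses a new failover on a cluster until a block period has elapsed.

    Hypothetical helper: the names and the default period are assumptions.
    """
    block_period_s: float = 3600.0
    clock: Callable[[], float] = time.monotonic
    _last_failover: Dict[str, float] = field(default_factory=dict, init=False)

    def try_begin_failover(self, cluster: str) -> bool:
        now = self.clock()
        last = self._last_failover.get(cluster)
        if last is not None and now - last < self.block_period_s:
            # Too soon after the previous failover: refuse and leave it to a human.
            return False
        # Record only failovers that were actually allowed to start.
        self._last_failover[cluster] = now
        return True


# Usage with an injected fake clock so the behaviour is deterministic.
fake_now = [0.0]
guard = AntiFlapGuard(block_period_s=60.0, clock=lambda: fake_now[0])
first = guard.try_begin_failover("shard-0")    # allowed: no earlier failover
fake_now[0] = 30.0
second = guard.try_begin_failover("shard-0")   # refused: inside the block period
```

Tracking the timestamp per cluster means a failover on one cluster does not block recovery on another, while back-to-back leadership changes on the same cluster are refused.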
Why this matters for the consensus framework¶
Orchestrator is the canonical production instance of the patterns/external-metadata-for-conflict-resolution pattern: rather than re-stamp every transaction with a new version on propagation, it relies on MySQL's existing GTID + timestamp metadata and closes the formal correctness gap through anti-flapping rules. The combination is empirically sufficient for large-scale MySQL deployments to avoid split-brain — validated at organisations running MySQL "at a massive scale" (Sugu's framing).
Relationship to VTOrc¶
VTOrc is a customised fork of Orchestrator used by Vitess:
"In Vitess, we use VTorc, which is a customized version of the Orchestrator, and we inherit the same safeties. But we also intend to tighten some of these corner cases to minimize the need for humans to intervene if complex failures ever happen to occur."
VTOrc inherits Orchestrator's anti-flapping + topology-aware failover, and adds Vitess-specific integrations: etcd-backed leadership locks, VReplication workflow awareness, Vitess topology publication, per-shard elector instances.
Failure detection mechanism¶
Orchestrator's failure detection is holistic — it triangulates its own observation of the primary with the replicas' existing connection state. See concepts/holistic-failure-detection-via-replicas for the full model. Shlomi Noach:
"orchestrator asks: Am I failing to communicate with the primary? And, are all replicas failing to communicate with the primary? … orchestrator doesn't do check intervals and a number of tests. It needs a single observation to act. Behind the scenes, orchestrator relies on the replicas themselves to run retries in intervals; that's how MySQL replication works anyhow, and orchestrator utilizes that." (Source: )
When the triangulation yields an asymmetric signal (for example, Orchestrator cannot reach the primary but the replicas still can), Orchestrator fires emergency probes to accelerate resolution. One specific emergency probe, patterns/replication-restart-as-liveness-probe, restarts replication on all replicas to force a TCP reconnect when the primary is suspected to be locked or at its connection limit.
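The triangulation described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the `Diagnosis` enum and `diagnose` function are hypothetical names, and treating every disagreement as a reason to probe is a simplification of the real analysis.

```python
from enum import Enum
from typing import Sequence


class Diagnosis(Enum):
    HEALTHY = "healthy"
    DEAD_PRIMARY = "dead_primary"
    NEEDS_EMERGENCY_PROBE = "needs_emergency_probe"


def diagnose(orchestrator_sees_primary: bool,
             replicas_see_primary: Sequence[bool]) -> Diagnosis:
    """Combine the detector's own view with each replica's connection state."""
    replicas = list(replicas_see_primary)
    if orchestrator_sees_primary and all(replicas):
        return Diagnosis.HEALTHY
    # Declare the primary dead only when at least one replica exists
    # and every replica agrees that the primary is unreachable.
    if not orchestrator_sees_primary and replicas and not any(replicas):
        return Diagnosis.DEAD_PRIMARY
    # Any disagreement (or no replicas to corroborate) is an asymmetric signal.
    return Diagnosis.NEEDS_EMERGENCY_PROBE
```

A single call is enough to reach a verdict because, as the quote notes, the retrying is delegated to the replicas' own replication connections rather than to repeated checks by the detector.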
Goal-oriented mode (Vitess integration)¶
Before the integration, Orchestrator and Vitess collaborated via pre- and post-recovery hook scripts, a structurally fragile arrangement in which a single dropped event could produce split or co-primary states. The Vitess-integrated fork (which evolved into VTOrc) introduced goal-oriented behaviour: Orchestrator reads MySQL metadata directly from the Vitess topology server via vttablet and converges the observed topology to Vitess's declared intent, not just to "what's visibly broken".
This unlocks new recovery modes: standalone-replica reconnect, writable-replica flip to read-only, read-only-primary flip to writable, multi-primary demotion to single primary, and, most importantly, graceful takeover when a functional cluster has the wrong server as primary (e.g. from a prematurely terminated earlier failover).
Operations either fail or converge — partial topology states are not an accepted intermediate. (Source: )
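Goal-oriented convergence can be pictured as comparing the declared primary with the observed state of each server and planning the corrective action. The sketch below is illustrative only: `Tablet`, `plan_actions`, and the action names are hypothetical, and it covers a subset of the recovery modes listed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tablet:
    name: str
    replicating_from: Optional[str]  # None: not replicating from any server
    read_only: bool


def plan_actions(intended_primary: str,
                 observed: List[Tablet]) -> List[Tuple[str, str]]:
    """Return (action, tablet) pairs that move the observed state toward intent."""
    by_name = {t.name: t for t in observed}
    primary = by_name.get(intended_primary)
    if primary is not None and primary.replicating_from is not None:
        # A working cluster has the wrong server as primary: plan one takeover.
        return [("graceful_takeover", intended_primary)]

    actions: List[Tuple[str, str]] = []
    for t in observed:
        if t.name == intended_primary:
            if t.read_only:
                actions.append(("set_writable", t.name))
            continue
        if t.replicating_from != intended_primary:
            actions.append(("reconnect_to_primary", t.name))
        if not t.read_only:
            actions.append(("set_read_only", t.name))
    return actions
```

An empty plan means the observed topology already matches the declared intent. A caller that re-observes and re-plans until the plan is empty, and reports failure otherwise, would mirror the fail-or-converge behaviour described above.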
Seen in¶
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-8-closing-thoughts — Part 8 capstone frames Orchestrator as the canonical lock-based-at-scale production instance for MySQL, and positions Vitess+VTOrc as "more powerful than other existing implementations" of the durability-plugin idea — an implicit comparative claim against upstream Orchestrator. Orchestrator provides three of the four lock-based advantages natively (graceful demotion, direct-to-leader read routing, anti-flapping); VTOrc adds pluggable durability (patterns/pluggable-durability-rules) as the fourth.
- sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests — canonical wiki introduction of Orchestrator as the external elector + anti-flapping layer that compensates for MySQL's faithful-GTID-propagation; the split-brain-avoidance-at-massive-scale framing; the VTOrc-inherits-from-Orchestrator lineage.
- — Shlomi Noach's 2020 canonical disclosure of Orchestrator's holistic failure detection (triangulation via replicas, single-observation-per-agent, MySQL-replication's own retries as the low-pass filter), the three-scenario emergency probe taxonomy, the replication-restart-as-liveness-probe manoeuvre for locked-primary / too-many-connections detection, and the goal-oriented transition in the Vitess-integrated fork (cluster-aware convergence to Vitess topology-server intent, fail-or-converge invariant, graceful takeover for wrong-primary scenarios). Historical post, canonical for Orchestrator's detection mechanism and the Vitess-integration rationale that led to VTOrc.
Related¶
- systems/mysql
- systems/vitess
- systems/vtorc
- systems/vttablet
- concepts/elector
- concepts/anti-flapping
- concepts/request-propagation
- concepts/split-brain
- concepts/holistic-failure-detection-via-replicas
- concepts/emergency-failure-probe
- concepts/goal-oriented-orchestrator
- concepts/vitess-topo-server
- patterns/external-metadata-for-conflict-resolution
- patterns/lock-based-over-lock-free-at-scale
- patterns/pluggable-durability-rules
- patterns/replication-restart-as-liveness-probe
- patterns/multi-endpoint-quorum-health-check