Multi-endpoint quorum health check

Pattern

Deploy health-check probes at multiple endpoints (typically spread across availability zones), and act on a failure signal only when a majority (a quorum) of endpoints agree the target is unreachable. This reduces the false-positive rate a single probe suffers when the network path between probe and target is itself flaky.

"Some monitoring solutions run health checks from multiple endpoints, and require a quorum, an agreement of the majority of check endpoints that there is indeed a problem. This kind of setup must be used with care; the placement of the endpoints in different availability zones is critical to achieve sensible quorum results. Once that's done, though, the triangulation is powerful and useful." (Source: sources/2026-04-21-planetscale-orchestrator-failure-detection-and-recovery-new-beginnings)

Why quorum

A single health-check endpoint has the same failure surface as any other single network path:

  • Transient packet drop → false positive.
  • Local network issue → false positive.
  • Endpoint host itself partitioned from target → false positive.

A majority-of-N agreement suppresses these single-path failures: a real target-side outage is visible to all N endpoints at once, whereas ⌊N/2⌋ + 1 endpoints all suffering uncorrelated network issues simultaneously is far less likely.
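To put numbers on that claim, here is a back-of-envelope calculation assuming each probe false-fires independently with probability p (an idealization that only holds when probes sit in separate failure domains):

```python
# P(quorum false positive) = P(at least floor(n/2)+1 of n independent
# probes falsely report failure), a binomial tail.
from math import comb

def quorum_false_positive(n: int, p: float) -> float:
    q = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(q, n + 1))

p = 0.01  # assumed 1% chance a single probe's path is flaky during a check
print(f"single probe : {p:.2e}")                            # 1.00e-02
print(f"2-of-3 quorum: {quorum_false_positive(3, p):.2e}")  # ~2.98e-04
print(f"3-of-5 quorum: {quorum_false_positive(5, p):.2e}")  # ~9.85e-06
```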

The AZ-placement trap

The post flags a load-bearing caveat:

"The placement of the endpoints in different availability zones is critical to achieve sensible quorum results."

If all probes sit in a single AZ, a quorum of probes failing carries no more information than a single probe failing: one AZ-level or shared-path problem takes out every probe at once and satisfies the quorum regardless of the target's actual state. Quorum only buys independence if the probes are in independent failure domains.

Common failure: probe mesh deployed in one AZ for operational convenience, target in another. When the probe AZ has an outage, all probes fail, quorum fires, unnecessary failover triggers.
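One cheap guard is a deploy-time check that the probe mesh actually spans distinct failure domains. A sketch, with illustrative function and parameter names:

```python
# Refuse to deploy a probe mesh whose members do not span enough
# distinct availability zones for quorum to mean anything.
def validate_probe_placement(probe_azs: list[str], min_distinct_azs: int = 3) -> None:
    distinct = set(probe_azs)
    if len(distinct) < min_distinct_azs:
        raise ValueError(
            f"probe mesh spans only {sorted(distinct)}; "
            f"need >= {min_distinct_azs} distinct AZs for quorum independence"
        )

validate_probe_placement(["us-east-1a", "us-east-1b", "us-east-1c"])  # ok
# validate_probe_placement(["us-east-1a"] * 3)  # raises: probes are correlated
```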

Comparison with holistic detection (Orchestrator's approach)

Multi-endpoint quorum requires deploying a probe mesh. Orchestrator's holistic detection reuses the replicas that already exist for other reasons — the cluster already has replicas pulling binlog, so the extra observation points come for free. Noach presents holistic detection as a different take on triangulation:

"Orchestrator uses a different take on triangulation. It recognizes that there are more players in the field: the replicas."

Trade-offs:

| Dimension | Multi-endpoint quorum | Holistic detection (via replicas) |
| --- | --- | --- |
| Probe infrastructure | Separate probe fleet | Reuses existing replicas |
| Placement concern | Cross-AZ placement critical | Replicas already distributed per HA design |
| Failure independence | Must be engineered | Inherited from replica placement |
| Retry requirement | Per-probe retry logic | Delegated to MySQL replication's built-in retries |
| Generalises to chained replication | No (single target) | Yes (each node's replicas observe it) |

For MySQL-specific deployments, holistic detection dominates. For targets without an equivalent "already-connected observer" population (e.g. a standalone service), multi-endpoint quorum remains the go-to pattern.
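For contrast, the holistic decision rule can be paraphrased in a few lines. This is a sketch of the idea described in the quoted post, not Orchestrator's actual code; the type and field names are illustrative:

```python
# Holistic detection, paraphrased: the detector's own failed probe is
# necessary but not sufficient. The replicas, already connected to the
# master for replication, must corroborate the failure.
from dataclasses import dataclass

@dataclass
class ReplicaView:
    host: str
    replication_broken: bool  # replica lost its connection to the master

def master_is_dead(my_probe_failed: bool, replicas: list[ReplicaView]) -> bool:
    """Declare failure only when every observer agrees: the detector's
    probe failed AND all replicas report broken replication."""
    return (
        my_probe_failed
        and len(replicas) > 0
        and all(r.replication_broken for r in replicas)
    )
```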

When to use

  • Stateless services with no persistent-connection consumer population to observe liveness.
  • Multi-region DNS / CDN / load-balancer health checks where the probe mesh is already part of the routing layer.
  • Any service where the target-connectedness signal you care about is "can a fresh connection reach it" and not "is an existing connection still alive".

When not to use

  • MySQL clusters with replicas — prefer holistic detection.
  • Databases with stable connection-pooling layers (app-side) that already provide rich liveness signal.
  • Systems where AZ-independence for probes is hard to guarantee (single-datacenter deployments) — quorum gives no benefit over a single probe.
