
Majority-consensus replication

Definition

Majority-consensus replication is a replication scheme in which a write becomes durable once a strict majority of replicas acknowledge it. The scheme tolerates up to ⌊(N−1)/2⌋ simultaneous replica failures without data loss. Concretely: a 3-replica fleet tolerates 1 failure, a 5-replica fleet tolerates 2, and a 7-replica fleet tolerates 3.
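
A minimal sketch of the quorum arithmetic (Go, illustrative only; the fleet sizes are the ones from the examples above):

```go
package main

import "fmt"

// quorum is the minimum number of acknowledgements needed to commit a
// write in a fleet of n replicas: a strict majority, ⌊n/2⌋+1.
func quorum(n int) int { return n/2 + 1 }

// tolerated is how many simultaneous replica failures the fleet survives
// while still being able to assemble a quorum: ⌊(n−1)/2⌋.
func tolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("N=%d: quorum=%d, tolerates %d failure(s)\n",
			n, quorum(n), tolerated(n))
	}
}
```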

The load-bearing property is split-brain impossibility under correct implementation: because any two majorities must intersect in at least one replica (see intersecting quorums), two competing writes cannot both commit independently. At least one replica must serialise them.
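
The intersection argument is small enough to verify exhaustively. A sketch (Go, illustrative; the 5-replica fleet size is arbitrary): enumerate every pair of majority-sized subsets and confirm they always share a member, which is the pigeonhole fact that |Q1| + |Q2| ≥ 2(⌊N/2⌋+1) > N.

```go
package main

import (
	"fmt"
	"math/bits"
)

func main() {
	const n = 5         // fleet size; any small n works
	majority := n/2 + 1 // strict majority, ⌊n/2⌋+1
	// Quorums are bitmasks over n replicas; a set bit means "this
	// replica acknowledged".
	for a := 0; a < 1<<n; a++ {
		if bits.OnesCount(uint(a)) < majority {
			continue // not a majority
		}
		for b := 0; b < 1<<n; b++ {
			if bits.OnesCount(uint(b)) < majority {
				continue
			}
			if a&b == 0 { // disjoint majorities would permit split brain
				fmt.Println("disjoint majorities found") // never reached
				return
			}
		}
	}
	fmt.Printf("every pair of majorities over %d replicas intersects\n", n)
}
```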

The two canonical implementations are Paxos and Raft; many production systems (Spanner, etcd, ZooKeeper, Kafka with KRaft, CockroachDB, Aurora DSQL) use one of these or close variants.

Properties

  • Tolerates minority failure — the remaining majority can continue accepting writes. A 5-replica fleet (quorum of 3) keeps accepting writes after 2 failures.
  • Loses availability on majority failure — a 3-replica fleet stops accepting writes as soon as 2 fail. This is a feature (preserves consistency), not a bug.
  • Latency is gated by the slowest replica in the majority — a request is not acknowledged until the (⌊N/2⌋+1)-th replica responds, so tail latency is a real concern in geographically distributed fleets (a sibling of the straggler problem in fan-out; see the sketch after this list).
  • Reads can be strongly consistent via quorum reads (see concepts/quorum-read) or weakly consistent for latency-sensitive paths.
  • Membership changes are operationally tricky — adding or removing replicas must itself go through the consensus layer to preserve the invariant.
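
A sketch of the write path behind the latency bullet above (Go, illustrative; the per-replica round-trip times are made up): the coordinator fans the write out to all replicas and unblocks on the (⌊N/2⌋+1)-th acknowledgement, so that ack, not the fastest replica and not the overall slowest, sets the observed latency.

```go
package main

import (
	"fmt"
	"time"
)

// replicate fans a write out to every replica and returns once a strict
// majority has acknowledged. Stragglers' acks still land in the buffered
// channel but no longer gate the caller.
func replicate(rtts []time.Duration) time.Duration {
	quorum := len(rtts)/2 + 1
	acks := make(chan time.Duration, len(rtts))
	for _, rtt := range rtts {
		go func(rtt time.Duration) {
			time.Sleep(rtt) // stand-in for one replica's round trip
			acks <- rtt
		}(rtt)
	}
	var last time.Duration
	for i := 0; i < quorum; i++ {
		last = <-acks // acks arrive in completion order
	}
	return last // the ack that completed the majority
}

func main() {
	// Hypothetical round trips for a geo-distributed 5-replica fleet:
	// the 200 ms straggler never gates the write, but the 40 ms replica
	// (3rd fastest of 5) always does.
	rtts := []time.Duration{
		5 * time.Millisecond, 7 * time.Millisecond, 40 * time.Millisecond,
		90 * time.Millisecond, 200 * time.Millisecond,
	}
	fmt.Println("write acknowledged after ≈", replicate(rtts))
}
```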

In the 2026-05-01 Meta E2EE-backup post

Meta names majority-consensus replication explicitly as the fault-tolerance mechanism underneath the HSM-based Backup Key Vault:

"The vault is deployed as a geographically distributed fleet across multiple datacenters, providing resilience through majority-consensus replication."

The property being bought in the HSM context: no single datacenter failure takes backups offline, and no single datacenter compromise lets an attacker reconstruct or corrupt key material alone, because majority consensus forces key operations to involve a majority of HSM fleet members across independent datacenters.

This is a different load profile from the OLTP Paxos/Raft context: HSM operations are relatively rare (per-user recovery-code-to-key derivation events, not continuous transaction streams), so latency pressure is far lower than in Spanner-class OLTP. What matters is failure tolerance plus single-datacenter-compromise tolerance, both of which majority consensus delivers naturally.

Why not single-primary replication

A single-primary replication scheme (one replica is authoritative, the others are read-only followers) would be simpler, but it concentrates the "operator cannot read the keys" invariant on one HSM. If that single primary is compromised, the attacker has a one-stop shop. Majority consensus forces an attacker to simultaneously compromise a majority of geographically independent HSMs, which is precisely the threat the HSM deployment is designed against.
