Skip to content

CONCEPT Cited by 4 sources

No distributed consensus

What it is

"No distributed consensus" is a deliberate design choice: build the state-distribution system without relying on a consensus protocol (Raft, Paxos, Multi-Paxos, Zab, viewstamped replication) as the coordination primitive. Instead, accept eventual consistency and resolve conflicts with CRDTs, last-write-wins, or domain-specific merge rules.

Why WAN-scale consensus is a trap

Consensus protocols are designed for correctness under partition, but their latency and throughput degrade sharply once round-trip times grow past a few tens of milliseconds. At global-WAN scale (São Paulo ↔ Virginia ↔ Sydney), a consensus protocol pays an RTT tax on every write, and leader-election latencies become user-visible.

Fly.io's canonical framing (from sources/2025-10-22-flyio-corrosion):

"Like an unattended turkey deep frying on the patio, truly global distributed consensus promises deliciousness while yielding only immolation. Consensus protocols like Raft break down over long distances."

Their prior attempt was HashiCorp Consul as a global routing store. Consul ran on "the biggest iron we could buy" and "wasted time guaranteeing consensus for updates that couldn't conflict in the first place."

When consensus is overkill

The orchestration model determines whether consensus is needed. Fly.io's model inverts the Kubernetes norm:

  • Kubernetes → central API server is the source of truth for workload placement ⇒ consensus-family KV store (etcd).
  • Fly.io → individual workers are the source of truth for their workloads ⇒ gossip + CRDT (Corrosion).

When each node owns a disjoint slice of the key-space, cross-node writes almost never conflict by construction. Consensus protocols spend cost optimising for conflicts that don't occur. Gossip + CRDT spend no cost on those conflicts and handle the rare actual conflict with a deterministic merge rule.

Canonical wiki instance — Fly.io Corrosion

Fly.io's Corrosion explicitly rejects the consensus-family:

"It doesn't rely on distributed consensus, like Consul, Zookeeper, Etcd, Raft, or rqlite (which we came very close to using)."

The mental model is OSPF: workers flood their own state; each consumer computes its own view; no single node has to agree with any other before making local decisions. Convergence in seconds across thousands of workers globally.

When to still use consensus

  • Linearizable writes required (bank transfers, unique IDs, idempotency keys at a granularity the CRDT can't handle).
  • Single-region deployment — consensus-family latencies are fine at LAN RTTs, and the CAP cost is lower.
  • Small, structurally-coordinated data — config, service discovery for a bounded fleet, leader elections. etcd / ZooKeeper are load-bearing exactly here.

The design heuristic: consensus inside a region, gossip across regions. Fly.io's regionalization project applies this split to Corrosion itself.

Seen in

  • sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-3-use-cases — Sugu's Use cases instalment sits on the yes-consensus end of the axis but frames four production deployment shapes ("all uncomfortable for a majority based consensus system") where the flexibility to vary the durability predicate matters more than the specific consensus protocol. The structural contrast with Fly.io Corrosion's no-consensus posture: Sugu's essay argues "within the yes-consensus camp, pluggable durability gets you most of the flexibility people reach for gossip+CRDT to obtain" — if the reason you were going to skip consensus is that majority-quorum doesn't fit your topology, pluggable durability under intersecting quorums may be the better answer. The no-consensus shape remains the right answer when the workload itself doesn't need linearizability (WAN-scale state distribution, non-conflicting partitioned writes).
  • sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-8-closing-thoughts — Sugu Sougoumarane's capstone to the Consensus algorithms at scale series sits at the yes-consensus end of the axis, but explicitly admits intellectual humility on the framework's uniqueness: "It is possible that consensus could be generalized using a different set of rules. But I personally find the approach presented in this series to be the easiest to reason about." The two architectural recommendations the series makes — patterns/pluggable-durability-rules (durability as a plugin over node-set predicates) and patterns/lock-based-over-lock-free-at-scale (four- advantage argument for lock-based leader election) — are within-the-yes-consensus-camp positions. The structural contrast with Fly.io Corrosion's no-consensus position remains the cleanest canonical wiki framing of the "how do you maintain a shared invariant across nodes?" design axis.
  • sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-4-establishment-and-revocation — the structural contrast: Sougoumarane's series operates at the yes-consensus end of the spectrum with explicit separation of revocation from establishment — see patterns/separate-revoke-from-establish. Canonical wiki cross-reference: some workloads need consensus but can exploit the revoke / establish separation for better UX (Vitess PRS / ERS); other workloads (WAN-scale state distribution) skip consensus entirely in favour of gossip
  • CRDT (Fly.io Corrosion). The two canonical wiki positions on the "how do you maintain a shared invariant across nodes?" axis.
  • sources/2025-10-22-flyio-corrosion — canonical primary source; the "face-seeking rake" framing is the wiki's headline statement of the WAN-consensus trap.
Last updated · 542 distilled / 1,571 read