CONCEPT Cited by 1 source
No distributed consensus
What it is
"No distributed consensus" is a deliberate design choice: build the state-distribution system without relying on a consensus protocol (Raft, Paxos, Multi-Paxos, Zab, viewstamped replication) as the coordination primitive. Instead, accept eventual consistency and resolve conflicts with CRDTs, last-write-wins, or domain-specific merge rules.
Why WAN-scale consensus is a trap
Consensus protocols are designed to stay correct under partition, but every committed write waits on at least one round trip to a quorum, so latency and throughput degrade sharply once round-trip times grow past a few tens of milliseconds. At global-WAN scale (São Paulo ↔ Virginia ↔ Sydney), that RTT tax lands on every write, and leader elections take long enough to become user-visible.
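The RTT tax can be made concrete with a little arithmetic. The sketch below assumes a leader-based protocol in which a write commits once a majority of replicas (the leader included) has acknowledged it, so the leader waits for its `n // 2`-th fastest follower. The function name and the RTT figures are hypothetical round numbers chosen for illustration, not measurements from the source.

```python
def commit_latency_ms(follower_rtts_ms: list[float]) -> float:
    """Rough lower bound on commit latency for one write.

    Assumes a leader-based protocol where the leader counts itself toward
    the majority and then waits for follower acknowledgements.
    """
    n = len(follower_rtts_ms) + 1   # replicas, including the leader
    acks_needed = n // 2            # follower acks that complete a majority
    if acks_needed == 0:
        return 0.0                  # single replica: nothing to wait for
    # The write commits when the acks_needed-th fastest follower answers.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical RTTs in milliseconds (illustrative only).
scenarios = {
    "single region (LAN)": [1.0, 1.0],
    "leader in Virginia": [120.0, 200.0],   # followers: São Paulo, Sydney
    "leader in Sydney": [200.0, 310.0],     # followers: Virginia, São Paulo
}

for name, rtts in scenarios.items():
    print(f"{name}: at least {commit_latency_ms(rtts):.0f} ms per committed write")
```

Under these assumed numbers the same three-replica cluster goes from about a millisecond per write on a LAN to a hundred milliseconds or more across continents, before any leader election or retry is counted.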
Fly.io's canonical framing (from sources/2025-10-22-flyio-corrosion):
"Like an unattended turkey deep frying on the patio, truly global distributed consensus promises deliciousness while yielding only immolation. Consensus protocols like Raft break down over long distances."
Their prior attempt used HashiCorp Consul as a global routing store. Consul ran on "the biggest iron we could buy" and "wasted time guaranteeing consensus for updates that couldn't conflict in the first place."
When consensus is overkill
The orchestration model determines whether consensus is needed. Fly.io's model inverts the Kubernetes norm:
- Kubernetes → central API server is the source of truth for workload placement ⇒ consensus-family KV store (etcd).
- Fly.io → individual workers are the source of truth for their workloads ⇒ gossip + CRDT (Corrosion).
When each node owns a disjoint slice of the key-space, cross-node writes almost never conflict by construction. A consensus protocol still pays a coordination round on every write to guard against conflicts that don't occur. Gossip + CRDT pays nothing up front and handles the rare actual conflict with a deterministic merge rule.
Canonical wiki instance: Fly.io Corrosion
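A minimal sketch of why ownership makes merging cheap, under stated assumptions: each entry is keyed by a hypothetical `(owner, name)` pair, only the owner writes it, and the owner bumps a version counter on every write. The `merge` function and the data layout are illustrative and are not Corrosion's actual schema or CRDT.

```python
Entry = tuple[int, str]                 # (version, value)
Store = dict[tuple[str, str], Entry]    # (owner, name) -> entry


def merge(left: Store, right: Store) -> tuple[Store, int]:
    """Merge two replicas; return the merged store and the conflict count."""
    merged = dict(left)
    conflicts = 0
    for key, entry in right.items():
        current = merged.get(key)
        if current is None or entry[0] > current[0]:
            # Newer version from the single owner: just take it.
            merged[key] = entry
        elif entry[0] == current[0] and entry != current:
            # Same version, different value: only possible if ownership
            # was violated. Resolve deterministically so replicas agree.
            conflicts += 1
            merged[key] = max(current, entry)
    return merged, conflicts


# Each replica is fresh for its own keys and stale for the other's.
view_a: Store = {("worker-a", "vm-1"): (3, "running"),
                 ("worker-b", "vm-9"): (1, "starting")}
view_b: Store = {("worker-a", "vm-1"): (2, "starting"),
                 ("worker-b", "vm-9"): (2, "running")}

ab, conflicts_ab = merge(view_a, view_b)
ba, conflicts_ba = merge(view_b, view_a)
assert ab == ba                          # merge order does not matter
assert conflicts_ab == conflicts_ba == 0
print(ab)
```

With disjoint ownership the conflict branch never runs: staleness is resolved by the version counter alone, and the tie-break exists only as a safety net.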
Fly.io's Corrosion explicitly rejects the consensus-family:
"It doesn't rely on distributed consensus, like Consul, Zookeeper, Etcd, Raft, or rqlite (which we came very close to using)."
The mental model is OSPF: workers flood their own state; each consumer computes its own view; no single node has to agree with any other before making local decisions. State converges in seconds across thousands of workers globally.
When to still use consensus
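The flooding model can be sketched as a toy simulation. This is a simplified, synchronous illustration of link-state-style flooding, not Corrosion's actual gossip protocol: the topology, node names, and the `flood_round` / `rounds_to_converge` helpers are all hypothetical. Each node starts out knowing only its own state, tagged with a sequence number, and forwards everything it knows to its neighbors each round.

```python
View = dict[str, tuple[int, str]]   # origin -> (sequence number, state)


def flood_round(views: dict[str, View], neighbors: dict[str, list[str]]) -> None:
    """One synchronous round: every node sends its view to its neighbors."""
    snapshot = {node: dict(view) for node, view in views.items()}
    for node, peers in neighbors.items():
        for peer in peers:
            for origin, (seq, state) in snapshot[node].items():
                known = views[peer].get(origin)
                # Accept only newer information about each origin.
                if known is None or seq > known[0]:
                    views[peer][origin] = (seq, state)


def rounds_to_converge(neighbors, initial):
    """Flood until every node has heard from every origin."""
    views = {node: {node: initial[node]} for node in neighbors}
    for rounds in range(len(neighbors)):
        if all(len(view) == len(neighbors) for view in views.values()):
            return rounds, views
        flood_round(views, neighbors)
    raise ValueError("topology is not connected")


# Four workers in a line: a - b - c - d.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
initial = {"a": (1, "up"), "b": (1, "up"), "c": (1, "draining"), "d": (1, "up")}

rounds, views = rounds_to_converge(neighbors, initial)

# Each consumer derives its own answer locally; nobody votes on it.
routable_from_a = sorted(o for o, (_, state) in views["a"].items() if state == "up")
print(rounds, routable_from_a)
```

No node waits for agreement here: a worker can act on a partial view at any round, and the views simply become identical once the flood has crossed the network.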
- Linearizable writes required (bank transfers, unique IDs, idempotency keys at a granularity the CRDT can't handle).
- Single-region deployment: consensus-family latencies are fine at LAN RTTs, and partitions inside a region are rarer, so less availability is given up in exchange for consistency.
- Small, structurally-coordinated data: config, service discovery for a bounded fleet, leader election. etcd / ZooKeeper are load-bearing exactly here.
The design heuristic: consensus inside a region, gossip across regions. Fly.io's regionalization project applies this split to Corrosion itself.
Seen in
- sources/2025-10-22-flyio-corrosion — canonical primary source; the "face-seeking rake" framing is the wiki's headline statement of the WAN-consensus trap.
Related
- concepts/gossip-protocol — the mechanism that replaces consensus.
- concepts/crdt — the conflict-resolution primitive that replaces linearizability.
- concepts/link-state-routing-protocol — the mental model.
- systems/corrosion-swim — canonical production instance.
- systems/consul — the rejected alternative Fly.io explicitly moved off of.