

SWIM protocol

SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) is a gossip-based cluster-membership and failure-detection algorithm introduced by Das, Gupta, and Motivala (Cornell, 2002). Canonical paper (PDF).

SWIM is the specific gossip variant called out by sources/2023-07-16-highscalability-gossip-protocol-explained as the one Consul (via HashiCorp's Serf / memberlist libraries) uses in production, and also the one Fly.io's Corrosion is built on.

What SWIM adds over generic gossip

  1. Indirect ping for failure detection. A node that doesn't respond to a direct ping is not immediately suspected. Instead, the prober sends a ping-req to k random peers, each of which pings the target on the prober's behalf and relays any ack. If any indirect probe yields an ack, the target is considered alive (only the prober-to-target path was broken). This sharply reduces false-positive failure detections compared with pure heartbeat-based gossip.
  2. Gossip piggyback — SWIM carries membership updates piggybacked on the ping/ack/ping-req messages it would send anyway, so the failure-detection channel is the membership-propagation channel.
  3. Infection-style rumor-mongering — each update (join/leave/suspect/alive) is gossiped a fixed number of times and then retired. Keeps per-node bandwidth bounded.
  4. Suspicion mechanism — a state between alive and dead: a suspected node is gossiped as "suspect" for a timeout; if it or anyone else refutes the suspicion with a fresher alive message, the node returns to alive. Otherwise it's promoted to confirmed-dead. This handles the asymmetric-latency case where a node is slow to respond, not actually dead.
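
The direct-then-indirect probe from item 1 can be sketched as a small simulation. This is a minimal illustration, not any library's API: the `probe` function, the `can_reach` callback, and the node names are hypothetical, and a real implementation would send UDP messages with timeouts instead of calling a reachability function.

```python
import random
from typing import Callable, Iterable, Optional

# Hypothetical transport hook: can_reach(src, dst) is True when a ping from
# src to dst would be acknowledged. Stands in for real network I/O.
Reach = Callable[[str, str], bool]

def probe(prober: str, target: str, peers: Iterable[str], can_reach: Reach,
          k: int = 3, rng: Optional[random.Random] = None) -> str:
    """Run one probe of `target` and return "alive" or "suspect"."""
    rng = rng or random.Random()
    # Step 1: direct ping.
    if can_reach(prober, target):
        return "alive"
    # Step 2: send ping-req to up to k random helpers (never prober or target).
    candidates = [p for p in peers if p not in (prober, target)]
    helpers = rng.sample(candidates, min(k, len(candidates)))
    for helper in helpers:
        # The helper pings the target and relays the ack, so both the
        # prober-helper leg and the helper-target leg must work.
        if can_reach(prober, helper) and can_reach(helper, target):
            return "alive"
    # Step 3: no ack by any route, so the target becomes suspect (not dead).
    return "suspect"

# Only the a<->b link is down; c and d can still reach b.
broken_links = {frozenset(("a", "b"))}

def link_ok(src: str, dst: str) -> bool:
    return frozenset((src, dst)) not in broken_links

print(probe("a", "b", ["a", "b", "c", "d"], link_ok, k=2))  # prints "alive"
```

With k at least as large as the number of eligible helpers, as here, the result does not depend on the random choice. If the target were unreachable from every node, the same call would return "suspect", which is where the suspicion timeout from item 4 takes over.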

State machine

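Each node's membership table tracks every other member in one of three states (alive, suspect, or confirmed-dead); "refuted" in the diagram is the transition from suspect back to alive, not a separate state: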

alive → suspect → confirmed-dead → (GC after T timeouts)
  ^       |
  |       v
  +---- refuted

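Transitions propagate via piggybacked gossip. Each update carries the member's incarnation number, a version counter that only the member itself increments. Within one incarnation the precedence alive < suspect < confirmed-dead is monotone, and an alive message overrides a suspicion only if it carries a higher incarnation.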
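
One way to picture the ordering rule is as a comparison function applied whenever a gossiped update arrives. The sketch below is illustrative only: the `Update` type, the `supersedes` and `refute` names, and the decision to treat confirmed-dead as final until garbage collection are assumptions made for this example, not taken from a specific implementation.

```python
from dataclasses import dataclass

# Precedence of states within a single incarnation (higher rank wins).
RANK = {"alive": 0, "suspect": 1, "dead": 2}

@dataclass(frozen=True)
class Update:
    member: str
    state: str        # "alive", "suspect", or "dead"
    incarnation: int  # bumped only by the member itself

def supersedes(new: Update, old: Update) -> bool:
    """Return True if `new` should replace `old` in the membership table."""
    if old.state == "dead":
        return False  # assumption: dead is final until the entry is GC'd
    if new.state == "dead":
        return True   # a confirmation overrides alive and suspect
    if new.incarnation != old.incarnation:
        return new.incarnation > old.incarnation
    # Same incarnation: suspect beats alive, never the reverse.
    return RANK[new.state] > RANK[old.state]

def refute(suspicion: Update) -> Update:
    """Built by the suspected member: an alive with a bumped incarnation."""
    return Update(suspicion.member, "alive", suspicion.incarnation + 1)

suspicion = Update("node-7", "suspect", 3)
print(supersedes(Update("node-7", "alive", 3), suspicion))  # prints False
print(supersedes(refute(suspicion), suspicion))             # prints True
```

The two printed cases show why the incarnation bump matters: a stale alive at the same incarnation cannot clear a suspicion, but the refutation issued by the member itself can.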

Canonical implementations

  • Consul / Serf / memberlist (Go) — HashiCorp's SWIM library, used by Consul, Nomad, Vault, and many third-party systems. Probably the most widely-deployed SWIM implementation in production.
  • Fly.io Corrosion (Rust) — SWIM membership + QUIC update reconciliation + cr-sqlite CRDT. Canonical wiki primary source.
  • Weave Mesh — SWIM-style membership in the Weave Net overlay.
  • Cassandra's gossip is SWIM-like but predates and deviates from the canonical SWIM paper (closer to the Dynamo-style heartbeat-plus-reconciliation model).

Tuning parameters

  • Probe interval (T) — how often each node pings one random peer. Typical values 200 ms – 1 s.
  • Indirect-probe count (k) — number of peers to ask on indirect probe; typically 3.
  • Suspicion timeout — how long a node stays suspect before being declared dead; depends on cluster size (longer for larger clusters).
  • Gossip messages per round — how many piggybacked updates per ping.
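
As a rough illustration of how these knobs interact, the sketch below derives a suspicion timeout and a per-update retransmit limit from cluster size, then uses the limit to pick which updates ride on the next ping. The logarithmic scaling, both multipliers, and every name here (`SwimConfig`, `suspicion_timeout_s`, `retransmit_limit`, `pick_piggyback`) are assumptions chosen for the example; real implementations expose their own parameters and formulas.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SwimConfig:
    probe_interval_s: float = 1.0  # T: each node probes one peer per interval
    indirect_probes: int = 3       # k: helpers asked on an indirect probe
    suspicion_mult: int = 4        # hypothetical multiplier
    retransmit_mult: int = 4       # hypothetical multiplier
    updates_per_ping: int = 2      # piggybacked updates per message

def suspicion_timeout_s(cfg: SwimConfig, n: int) -> float:
    """Suspicion timeout that grows with log10 of the cluster size n."""
    scale = max(1.0, math.log10(max(1, n)))
    return cfg.suspicion_mult * scale * cfg.probe_interval_s

def retransmit_limit(cfg: SwimConfig, n: int) -> int:
    """How many times one update is gossiped before it is retired."""
    return cfg.retransmit_mult * math.ceil(math.log10(n + 1))

def pick_piggyback(sent_counts: Dict[str, int], per_message: int,
                   limit: int) -> List[str]:
    """Choose updates for the next ping, least-gossiped first.

    `sent_counts` maps an update id to how often it has been sent; an entry
    is deleted once it has been sent `limit` times.
    """
    chosen = sorted(sent_counts, key=sent_counts.get)[:per_message]
    for uid in chosen:
        sent_counts[uid] += 1
        if sent_counts[uid] >= limit:
            del sent_counts[uid]
    return chosen

cfg = SwimConfig()
print(retransmit_limit(cfg, 10))    # prints 8
print(retransmit_limit(cfg, 1000))  # prints 16
```

Sending the least-gossiped updates first gives fresh news priority while older updates drain out, which keeps per-message overhead bounded regardless of how many changes are in flight.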

Known trade-offs

  • Partitions are still invisible — if a sub-group is internally connected but severed from the rest, SWIM inside each sub-group stays happy. Inherited from gossip generally; see concepts/gossip-protocol.
  • Byzantine peers — SWIM is not byzantine-tolerant. A malicious peer can mark live nodes dead, or refuse indirect probes for others. Pair with authentication/authorisation for any untrusted-peer deployment.
