CONCEPT Cited by 1 source
Heartbeat counter (gossip)¶
The heartbeat counter is the local liveness signal that a gossip node attaches to every state exchange. It is the data-plane half of gossip-based failure detection: a node whose counter stops advancing is suspected dead.
Two numbers, not one¶
Real gossip stacks almost never ship just a heartbeat counter. They ship a pair:
- Generation clock — monotonically-increasing integer, bumped every time the process restarts. Unmesh Joshi's Generation Clock pattern is the canonical writeup.
- Version number — monotonically-increasing integer within a generation, bumped every time state changes (or on every successful gossip exchange for pure liveness).
The (generation, version) pair is the ordering key: peers compare pairs lexicographically (generation first, then version) and keep the higher value. The generation field is what makes the protocol correct across restarts: a restarted node whose version has reset to 1 is not mistaken for a stale one, because its generation is now higher.
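The lexicographic comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular gossip stack; the names `HeartbeatVersion` and `newer` are hypothetical.

```python
from typing import NamedTuple


class HeartbeatVersion(NamedTuple):
    """Hypothetical (generation, version) pair for one node."""
    generation: int  # bumped on every process restart
    version: int     # bumped within a generation


def newer(a: HeartbeatVersion, b: HeartbeatVersion) -> HeartbeatVersion:
    # Named tuples compare field by field, so max() is the
    # lexicographic "keep the higher value" rule.
    return max(a, b)


# A restart resets the version but raises the generation,
# so the restarted node's state still wins the comparison.
before_restart = HeartbeatVersion(generation=5, version=761)
after_restart = HeartbeatVersion(generation=6, version=1)
winner = newer(before_restart, after_restart)
```

Comparing on version alone would keep `before_restart` here, which is exactly the failure the generation field prevents.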
Cassandra EndPointState example¶
From the post, the shape of a Cassandra heartbeat payload:
EndPointState: 10.0.1.42
HeartBeatState: generation: 1259904231, version: 761
ApplicationState: "average-load": 2.4, generation: 1659909691, version: 42
ApplicationState: "bootstrapping": pxLpassF9XD8Kymj, generation: 1259909615, version: 90
The generation is a Unix timestamp here, a common implementation choice: it increases across restarts as long as the machine's wall clock does not move backwards.
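As a rough sketch of how a node might maintain its own pair under this choice, the hypothetical `LocalHeartbeat` class below takes the generation from the wall clock at process start and bumps the version on each beat. The class and method names are assumptions for illustration, not Cassandra's API.

```python
import time
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class LocalHeartbeat:
    """Hypothetical local heartbeat state for one process lifetime."""
    # Assumption: generation is the Unix time at process start.
    generation: int = field(default_factory=lambda: int(time.time()))
    version: int = 0

    def beat(self) -> Tuple[int, int]:
        # Bump the version and return the pair to attach to gossip.
        self.version += 1
        return (self.generation, self.version)


# Fixed generations are passed in here only to keep the example
# deterministic; a real process would use the default.
first_run = LocalHeartbeat(generation=1259904231)
first_run.beat()
last_before_crash = first_run.beat()

second_run = LocalHeartbeat(generation=1259904300)
first_after_restart = second_run.beat()
```

If the wall clock were stepped backwards between the two runs, `second_run` would start with a lower generation and its updates would be ignored, which is why some designs persist the last generation to disk instead.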
How the "stuck heartbeat = dead" logic works¶
From sources/2023-07-16-highscalability-gossip-protocol-explained:
"The node is labeled healthy when the heartbeat counter keeps incrementing. On the other hand, the node is considered to be unhealthy when the heartbeat counter has not changed for an extended period due to a network partition or node failure."
With a single observer this is unreliable, since the observer itself may be partitioned. Production gossip stacks (Cassandra, Dynamo) require multiple independent peers to confirm the liveness judgment before declaring a node dead, and they usually judge staleness with a phi-accrual probabilistic detector rather than a fixed timeout.
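The sketch below shows one way the "counter has not changed" check could feed a suspicion score. It is a simplified phi-style calculation that assumes exponentially distributed gaps between heartbeat advances, so phi = elapsed / (mean gap × ln 10). It is not the implementation used by any specific system, and `HeartbeatObserver`, `suspected_by_quorum`, the threshold of 8.0, and the quorum of 2 are all hypothetical.

```python
import math
from typing import List, Optional, Tuple


class HeartbeatObserver:
    """Tracks when one peer's (generation, version) last advanced."""

    def __init__(self) -> None:
        self.last_seen: Optional[Tuple[int, int]] = None
        self.last_change: Optional[float] = None
        self.intervals: List[float] = []  # gaps between advances

    def observe(self, heartbeat: Tuple[int, int], now: float) -> None:
        if self.last_seen is not None and heartbeat <= self.last_seen:
            return  # counter did not advance; not a liveness signal
        if self.last_change is not None:
            self.intervals.append(now - self.last_change)
        self.last_seen = heartbeat
        self.last_change = now

    def phi(self, now: float) -> float:
        # Suspicion grows with time since the counter last advanced.
        if not self.intervals or self.last_change is None:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        return (now - self.last_change) / (mean * math.log(10))


def suspected_by_quorum(phis: List[float], threshold: float = 8.0,
                        quorum: int = 2) -> bool:
    # Require several independent observers to agree before
    # treating the node as dead.
    return sum(1 for p in phis if p >= threshold) >= quorum


observer = HeartbeatObserver()
for t in range(5):
    observer.observe((1, t + 1), now=float(t))  # one advance per second
observer.observe((1, 5), now=10.0)  # repeated counter, ignored
```

With a one-second mean gap, the score stays well below the threshold one second after the last advance and crosses it after roughly 18.4 seconds of silence; a single observer crossing the threshold is still not enough for `suspected_by_quorum` to return true.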
Incremental payload¶
A HeartBeatState plus a set of ApplicationState records each with their own (generation, version) lets the receiver apply incremental gossip — only fetching changed sub-keys — rather than transferring the full node state each round. The local in-memory version number lets a node send "only incremental updates" on each exchange (Source: sources/2023-07-16-highscalability-gossip-protocol-explained §Gossip Protocol Implementation).
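A minimal sketch of the incremental exchange follows, assuming the sender keeps each sub-key with its version and the receiver reports the generation and highest version it already holds. For brevity it uses a single generation per node rather than one per record, and the function name `delta_since` and the dictionary layout are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: sub-key -> (value, version)
State = Dict[str, Tuple[Any, int]]


def delta_since(state: State, local_generation: int,
                peer_generation: int, peer_max_version: int) -> State:
    """Return only the entries the peer has not seen yet."""
    if peer_generation != local_generation:
        # The peer knows a different generation (for example, from
        # before a restart), so send the full state.
        return dict(state)
    return {
        key: (value, version)
        for key, (value, version) in state.items()
        if version > peer_max_version
    }


# Values borrowed from the EndPointState example above.
state: State = {
    "heartbeat": (None, 761),
    "average-load": (2.4, 42),
    "bootstrapping": ("pxLpassF9XD8Kymj", 90),
}
generation = 1259904231

partial = delta_since(state, generation, generation, peer_max_version=50)
full = delta_since(state, generation, generation - 1, peer_max_version=50)
nothing = delta_since(state, generation, generation, peer_max_version=761)
```

Here `partial` carries only the two entries with versions above 50, while a peer holding an older generation receives everything.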
Seen in¶
- sources/2023-07-16-highscalability-gossip-protocol-explained: definitional source + Cassandra EndPointState example.
- sources/2025-10-22-flyio-corrosion: implicit in SWIM's ping/ack counter + cr-sqlite logical timestamp for state entries.
Related¶
- concepts/gossip-protocol
- concepts/peer-sampling-service
- concepts/last-write-wins — the conflict-resolution rule heartbeat-versioned state falls back on.
- systems/apache-cassandra
- systems/swim-protocol