
CONCEPT Cited by 1 source

Heartbeat counter (gossip)

The heartbeat counter is the local liveness signal that a gossip node attaches to every state exchange. It is the data-plane half of gossip-based failure detection: a counter that stops advancing marks its node as suspected dead.

Two numbers, not one

Real gossip stacks almost never ship just a heartbeat counter. They ship a pair:

  • Generation clock — monotonically-increasing integer, bumped every time the process restarts. Unmesh Joshi's Generation Clock pattern is the canonical writeup.
  • Version number — monotonically-increasing integer within a generation, bumped every time state changes (or on every successful gossip exchange for pure liveness).

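The (generation, version) pair is the ordering key for a node's state: peers compare pairs lexicographically (generation first, then version) and keep the higher one. For a single node this is a total order, so any two copies of its state can always be ranked. The generation field is what makes the protocol correct across restarts: a restarted node whose version has reset to 1 is not mistaken for a stale copy, because its generation is now higher.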
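The comparison rule above can be sketched in a few lines of Python. This is a minimal illustration, not any particular system's code: the `HeartBeat` class and `merge` function are hypothetical names, and the generation values 1 and 2 are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class HeartBeat:
    # Field order matters: order=True compares generation first, then version.
    generation: int
    version: int


def merge(local: HeartBeat, remote: HeartBeat) -> HeartBeat:
    """Keep whichever copy is newer under lexicographic ordering."""
    return max(local, remote)


before_restart = HeartBeat(generation=1, version=761)
after_restart = HeartBeat(generation=2, version=1)

# The restarted node wins despite its much lower version.
assert merge(before_restart, after_restart) == after_restart
# Within one generation, the higher version wins.
assert merge(HeartBeat(2, 5), HeartBeat(2, 9)) == HeartBeat(2, 9)
```

Comparing on version alone would have kept `before_restart` and discarded every update from the restarted process until its version passed 761.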

Cassandra EndPointState example

From the post, the shape of a Cassandra heartbeat payload:

EndPointState: 10.0.1.42
HeartBeatState: generation: 1259904231, version: 761
ApplicationState: "average-load": 2.4, generation: 1659909691, version: 42
ApplicationState: "bootstrapping": pxLpassF9XD8Kymj, generation: 1259909615, version: 90

The generation here is a Unix timestamp taken at process start, a common implementation choice: it increases across restarts without any persisted state, provided the wall clock never steps backward.

How the "stuck heartbeat = dead" logic works

From sources/2023-07-16-highscalability-gossip-protocol-explained:

"The node is labeled healthy when the heartbeat counter keeps incrementing. On the other hand, the node is considered to be unhealthy when the heartbeat counter has not changed for an extended period due to a network partition or node failure."

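With a single observer and a fixed timeout this is unreliable: the observer itself may be partitioned, or gossip may simply be slow. Gossip softens the first problem because a node's counter reaches an observer through many peers rather than over one direct link, so a fresh counter learned from any peer counts as evidence of liveness. Production gossip stacks such as Cassandra also replace the fixed timeout with a phi-accrual detector, which turns the time since the last heartbeat into a continuous suspicion level.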
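A rough sketch of how an observer might turn "the counter has stopped moving" into a suspicion score, in the spirit of a phi-accrual detector. Several things here are assumptions made for illustration: the `HeartbeatMonitor` class is hypothetical, gaps between heartbeat advances are modelled as exponentially distributed (a common simplification), the counter is a plain integer rather than a (generation, version) pair, and the threshold of 8 in the usage note is just an example value.

```python
import math
from collections import deque


class HeartbeatMonitor:
    """Tracks one peer's heartbeat counter and scores how suspicious its silence is."""

    def __init__(self, window: int = 100):
        self.last_counter = None
        self.last_change = None                # time the counter last advanced
        self.intervals = deque(maxlen=window)  # recent gaps between advances

    def observe(self, counter: int, now: float) -> None:
        """Record a gossiped counter; only a strictly higher value counts as progress."""
        if self.last_counter is not None and counter <= self.last_counter:
            return  # stale or duplicate gossip is not evidence of liveness
        if self.last_change is not None:
            self.intervals.append(now - self.last_change)
        self.last_counter = counter
        self.last_change = now

    def phi(self, now: float) -> float:
        """Suspicion level: -log10 of P(gap >= elapsed), assuming exponential gaps."""
        if not self.intervals:
            return 0.0  # not enough history to judge
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_change
        return elapsed / (mean * math.log(10))


monitor = HeartbeatMonitor()
for t in range(5):                 # counter advances once per second
    monitor.observe(counter=t, now=float(t))

assert monitor.phi(now=5.0) < 1.0   # one second of silence: unremarkable
assert monitor.phi(now=24.0) > 8.0  # twenty seconds of silence: highly suspicious
```

The caller picks the threshold (8 in this example) above which the peer is treated as down. Because phi scales with the observed mean gap, a peer that normally gossips slowly is given proportionally more slack than one that normally gossips fast.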

Incremental payload

Because the HeartBeatState and each ApplicationState record carry their own (generation, version), a receiver can tell which entries are newer than its copy and fetch only those, rather than transferring the full node state each round. On the sending side, the locally maintained in-memory version number is what lets a node send "only incremental updates" on each exchange (Source: sources/2023-07-16-highscalability-gossip-protocol-explained §Gossip Protocol Implementation).
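One way to picture the incremental exchange is sketched below. It assumes that all of a node's entries draw their versions from a single per-node counter and belong to the same generation, so the highest version a peer holds summarizes everything it has seen. The names `Versioned`, `max_version`, `delta_since` and `apply_delta` are hypothetical, and the keys and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: object
    version: int


def max_version(state: dict[str, Versioned]) -> int:
    """Highest version held for a node; a compact digest of what is known."""
    return max((v.version for v in state.values()), default=0)


def delta_since(state: dict[str, Versioned], peer_max: int) -> dict[str, Versioned]:
    """Entries the peer has not seen yet, given the digest it advertised."""
    return {k: v for k, v in state.items() if v.version > peer_max}


def apply_delta(state: dict[str, Versioned], delta: dict[str, Versioned]) -> None:
    """Merge in newer entries, ignoring anything older than what is already held."""
    for key, incoming in delta.items():
        current = state.get(key)
        if current is None or incoming.version > current.version:
            state[key] = incoming


sender = {
    "load": Versioned(2.4, 42),
    "status": Versioned("NORMAL", 90),
    "heartbeat": Versioned(None, 761),
}
receiver = {"load": Versioned(2.4, 42), "status": Versioned("NORMAL", 90)}

delta = delta_since(sender, max_version(receiver))
assert set(delta) == {"heartbeat"}  # only the changed entry crosses the wire
apply_delta(receiver, delta)
assert delta_since(sender, max_version(receiver)) == {}
```

A full implementation would also compare generations before versions, so that a restarted sender's state replaces the receiver's copy wholesale.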
