CONCEPT Cited by 1 source
Replication heartbeat¶
Definition¶
A replication heartbeat is a timestamp row periodically written to a dedicated table on the replication primary. The row arrives on each replica via normal replication, and that replica's lag is computed as now() - heartbeat_ts, evaluated on the replica itself.
"The most reliable way to evaluate replication lag is by injecting timestamps on a dedicated table on the Primary server, then reading the replicated value on a replica, comparing it with the system time on said replica."
— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2
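A minimal Python sketch of the measurement side, with an in-memory dict standing in for the replicated heartbeat table (column names are illustrative, not a real schema):

```python
from datetime import datetime, timedelta, timezone

# Simulated heartbeat row as it would appear on a replica. In MySQL
# this is a single row (e.g. server_id, ts) that the primary
# overwrites every interval and replication carries downstream.
heartbeat_row = {"server_id": 1, "ts": datetime.now(timezone.utc) - timedelta(seconds=3)}

def replica_lag(row, now=None):
    """Lag as seen on the replica: local clock minus replicated timestamp."""
    now = now or datetime.now(timezone.utc)
    return (now - row["ts"]).total_seconds()

lag = replica_lag(heartbeat_row)
# lag is roughly 3 seconds here: the heartbeat was written 3 s ago and
# no newer row has replicated to supersede it.
```

Note the measurement needs nothing from the replication machinery itself, only the replicated row and the replica's clock.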
Why this technique dominates¶
Heartbeats work correctly in every failure mode that competing lag-measurement techniques break on:
- Replica working well — heartbeat timestamp stays close to now.
- Replica lagging — timestamp gap grows exactly to the lag magnitude.
- Replication stopped — timestamp gap grows unbounded, directly surfacing the outage.
- Replication broken / misconfigured — same as stopped from the lag-measurement angle.
Alternative techniques (e.g. SHOW REPLICA STATUS /
Seconds_Behind_Source) fail or report misleading zeros in
several of these cases.
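Because the gap needs no cooperation from the replication threads, a single threshold check covers all four modes. A toy classifier (threshold values are illustrative):

```python
def classify(gap_seconds, max_lag=1.0, outage=60.0):
    """Interpret the heartbeat gap observed on a replica.

    Unlike Seconds_Behind_Source, the gap cannot report a misleading
    zero when replication is stopped or broken: no new heartbeat rows
    arrive, so the gap simply keeps growing.
    """
    if gap_seconds <= max_lag:
        return "healthy"
    if gap_seconds <= outage:
        return "lagging"
    return "stopped-or-broken"

assert classify(0.2) == "healthy"
assert classify(5.0) == "lagging"
assert classify(3600.0) == "stopped-or-broken"
```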
Canonical tool: pt-heartbeat¶
pt-heartbeat
is the Percona-Toolkit daemon that performs heartbeat
injection + measurement. Deployment requirements:
- Writes happen on the primary only. Running pt-heartbeat in write mode on a replica corrupts the measurement for every downstream replica.
- Failover must move the writer. On primary promotion, the heartbeat writer must follow the new primary; this should be automated in the failover orchestration.
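Conceptually, the injection side is a simple timed loop. A hedged sketch of what an injection daemon does (this is not pt-heartbeat's actual implementation; the dict again stands in for the heartbeat table):

```python
import threading
from datetime import datetime, timezone

class HeartbeatWriter:
    """Sketch of a heartbeat injection loop, i.e. what pt-heartbeat's
    update mode does conceptually. Must run against the primary only:
    a write performed on a replica replicates nowhere and poisons the
    measurement for everything downstream of it."""

    def __init__(self, table, interval=1.0):
        self.table = table          # stand-in for the heartbeat table
        self.interval = interval    # injection interval in seconds
        self._stop = threading.Event()

    def run(self):
        # wait() returns False on timeout, True once stop() is called
        while not self._stop.wait(self.interval):
            self.table["ts"] = datetime.now(timezone.utc)

    def stop(self):
        self._stop.set()

table = {}
writer = HeartbeatWriter(table, interval=0.05)
t = threading.Thread(target=writer.run, daemon=True)
t.start()
import time; time.sleep(0.2)   # let a few heartbeats land
writer.stop()
t.join()
# table["ts"] now holds the most recent heartbeat timestamp
```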
The injection-interval trade-off¶
The interval between heartbeat writes is the lag-metric granularity. Write every 100 ms → you can measure sub-second lag. Write every 10 s → you can't see lag below 10 s.
Finer granularity costs more:
- Write rate on the primary — linear in 1/interval.
- Binlog volume. This is the dominant cost axis at scale: "The heartbeat events are persisted in the binary logs, which are then re-written on the replicas. For some users, the introduction of heartbeats causes a significant increase in binlog generation."
- Storage. "With more binlog events having to be persisted, more binary log files are generated per given period of time. These consume more disk space. It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set."
- Backup and retention cost. Binlogs are typically retained + backed up for recovery / audit — the heartbeat tax compounds across the retention window.
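The linearity of the cost axes can be made concrete with back-of-envelope arithmetic (the per-event size below is an assumed illustrative figure, not a measured one; real sizes vary with binlog format and row-image settings):

```python
def daily_binlog_bytes(interval_s, event_bytes=300):
    """Heartbeat binlog volume per primary per day.

    event_bytes: assumed rough size of one heartbeat transaction in
    the binary log (illustrative only).
    """
    writes_per_day = 86_400 / interval_s
    return writes_per_day * event_bytes

coarse = daily_binlog_bytes(10.0)   #   8,640 writes/day -> ~2.6 MB/day
fine = daily_binlog_bytes(0.1)      # 864,000 writes/day -> ~259 MB/day
assert fine == 100 * coarse         # cost is linear in 1/interval
```

And because binlogs are re-written on every replica, then retained and backed up, each of those bytes is paid several times over.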
Hibernation fits naturally¶
Because the cost is high, it makes sense to generate heartbeats only when lag measurement is needed — i.e. when a throttler is actively serving requests. Throttler hibernation extends to the heartbeat generator: during idle periods, stop or slow heartbeat injection; re-ignite on first client request.
The cost of re-ignition is a short window where heartbeats are stale and the throttler will conservatively reject — see patterns/idle-state-throttler-hibernation.
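A minimal sketch of the hibernation coupling (illustrative names and thresholds, not the PlanetScale implementation): injection runs only while the throttler is non-idle, and after re-ignition the throttler rejects until a fresh heartbeat lands.

```python
class HibernatingHeartbeat:
    """Heartbeat generation tied to throttler activity: stop injecting
    while idle, re-ignite on the first client request, and reject
    conservatively while the last heartbeat is stale."""

    def __init__(self, idle_after=10.0, stale_after=2.0):
        self.idle_after = idle_after    # stop injecting after this much idleness
        self.stale_after = stale_after  # reject while heartbeat older than this
        self.last_request = None
        self.last_heartbeat = None

    def maybe_inject(self, now):
        """Called by the injection loop; writes only while non-idle."""
        active = (self.last_request is not None
                  and now - self.last_request < self.idle_after)
        if active:
            self.last_heartbeat = now

    def check(self, now):
        """Client throttle check; the request itself re-ignites injection."""
        self.last_request = now
        stale = (self.last_heartbeat is None
                 or now - self.last_heartbeat > self.stale_after)
        return "reject" if stale else "ok"

hb = HibernatingHeartbeat()
assert hb.check(now=0.0) == "reject"  # waking from hibernation: stale, reject
hb.maybe_inject(now=0.1)              # injection re-ignited by that request
assert hb.check(now=0.2) == "ok"      # fresh heartbeat -> normal serving
```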
Seen in¶
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2
— canonical wiki introduction. Shlomi Noach frames the technique as the de-facto lag-measurement primitive in the MySQL world, names pt-heartbeat as the canonical tool, and highlights binlog-size growth as the principal production cost.