CONCEPT Cited by 1 source

Dummy-write heartbeat kludge

Definition

The dummy-write heartbeat kludge is the industry workaround for runaway WAL growth on a Postgres primary feeding logical replication. A scheduled job periodically writes a dummy row into the low-traffic table being replicated, solely to force the replication slot's restart_lsn to advance so that Postgres can reclaim older WAL.

It is a kludge in the technical sense: it works, but it addresses the symptom rather than the cause, and it introduces operational overhead that lives on alongside every CDC stream set up this way.
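The mechanism can be illustrated with a toy model. This is a sketch under stated assumptions, not Postgres behaviour in detail: it collapses restart_lsn and the consumer's confirmed position into a single number, and it assumes the consumer confirms a position only when it receives a change on a published table. The class ToyPrimary and the table name "blue" are hypothetical; "pink" borrows the idle table from Zalando's wording.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Toy model: one WAL stream shared by all tables, one logical slot."""
    published: set[str]  # tables included in the (hypothetical) publication
    wal_lsn: int = 0     # current end of WAL, in bytes
    slot_lsn: int = 0    # position the slot's consumer has confirmed

    def write(self, table: str, nbytes: int) -> None:
        # Every write appends to the shared WAL, whichever table it touches.
        self.wal_lsn += nbytes
        # Assumption: the consumer confirms a position only when it receives
        # a change, and it only receives changes to published tables.
        if table in self.published:
            self.slot_lsn = self.wal_lsn

    @property
    def retained_wal(self) -> int:
        # WAL the primary must keep because the slot has not moved past it.
        return self.wal_lsn - self.slot_lsn


primary = ToyPrimary(published={"pink"})
for _ in range(1000):
    primary.write("blue", 8192)          # busy table outside the publication
idle_retained = primary.retained_wal     # WAL pinned while "pink" stays idle

primary.write("pink", 64)                # the scheduled dummy write
after_heartbeat = primary.retained_wal   # the slot has caught up
```

In this model the dummy write does nothing to shrink the WAL produced by "blue"; it only gives the slot a reason to move, which is the symptom-versus-cause point made above.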

Why it's distinct from general replication heartbeats

The adjacent MySQL-world primitive, the replication heartbeat (e.g. pt-heartbeat), exists to measure lag. The dummy-write kludge exists to advance a replication slot. They share the "write rows periodically to the primary" implementation shape, but the load-bearing reason differs:

  • MySQL heartbeats: a measurement technique; the correctness property is that the timestamp arrives on the replica.
  • Postgres dummy-write kludge: a slot-advancement technique; the correctness property is that restart_lsn moves forward.
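The two correctness properties can be written as two different checks. The following is a minimal sketch; the function names are hypothetical, and the only Postgres-specific fact it relies on is the textual pg_lsn format of two hexadecimal numbers separated by a slash.

```python
from datetime import datetime, timedelta, timezone


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replica_lag(written_at: datetime, observed_at: datetime) -> timedelta:
    # Lag heartbeat: the payload (a timestamp) is the point of the write.
    # It is read back on the replica and compared with the observation time.
    return observed_at - written_at


def slot_advanced(restart_lsn_before: str, restart_lsn_after: str) -> bool:
    # Dummy-write kludge: the payload is irrelevant. The only question is
    # whether the slot's restart_lsn moved forward after the write.
    return parse_lsn(restart_lsn_after) > parse_lsn(restart_lsn_before)


written = datetime(2023, 11, 8, 12, 0, 0, tzinfo=timezone.utc)
seen = written + timedelta(seconds=3)
lag = replica_lag(written, seen)
moved = slot_advanced("16/B374D848", "16/B374D900")
```

The lag check needs the row's content to arrive intact on the replica; the slot check never looks at the row at all.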

Who endorsed it before the pure fix

Zalando's 2023-11 post names two industry blog posts that documented the kludge before anyone upstreamed the pure fix.

Wolfman's post even alluded to the pure-solution shape — "the pure solution of advancing the cursor" — before retreating to the kludge. Zalando's 2023-11 post framed this as "abandoning the prospect in favour of the same kludge".

Zalando itself had used the kludge historically before upstreaming the pure fix to pgjdbc. Verbatim:

"This solution had been used in Zalando in the past but it's a kludge that doesn't address the real issue at the heart of the problem and mandates a constant extra workload overhead from now and forever more when setting up Postgres logical replication." (Source: sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver)

Operational costs

  • Every stream needs its own dummy-write cron. Setting up a new CDC stream is no longer a one-step "create slot"; operators must provision and maintain a heartbeat writer.
  • Write traffic tax. Dummy writes generate WAL like any other writes — the technique doesn't reduce WAL volume, only allows it to be reclaimed by keeping the slot in motion.
  • Schema pollution. Dummy writes typically go into a dedicated heartbeat table or column, adding schema surface that exists solely for this workaround.
  • Failure-mode opacity. If the heartbeat writer fails silently, runaway WAL growth returns — diagnosing "why didn't the cron run?" is a permanent operational burden.
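One way to reduce the failure-mode opacity is to watch the slot rather than the cron. The sketch below assumes a monitoring job that runs the query in SLOT_RETENTION_SQL (which uses the standard pg_replication_slots view and pg_wal_lsn_diff function) and passes the rows to a hypothetical helper, slots_over_budget; the slot names and the 1 GiB budget are illustrative.

```python
# Query a monitoring job could run on the primary (Postgres 10 or later).
SLOT_RETENTION_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""


def slots_over_budget(rows, max_retained_bytes):
    """Return (slot_name, retained_bytes) pairs that exceed the budget.

    rows: iterable of (slot_name, retained_bytes) tuples shaped like the
    result of SLOT_RETENTION_SQL. retained_bytes is None when the slot has
    no restart_lsn yet, and such slots are skipped.
    """
    return [
        (name, int(retained))
        for name, retained in rows
        if retained is not None and retained > max_retained_bytes
    ]


GIB = 1024 ** 3
sample_rows = [
    ("cdc_orders", 5 * GIB),   # heartbeat writer has stopped; WAL is piling up
    ("cdc_idle", 100),         # healthy slot
    ("cdc_new", None),         # slot with no reserved WAL yet
]
alerts = slots_over_budget(sample_rows, max_retained_bytes=GIB)
```

Alerting on retained WAL catches a silent heartbeat failure regardless of why the writer stopped, at the cost of one more piece of monitoring per primary.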

Debezium's adjacent built-in heartbeat

Debezium's Postgres connector has a configurable heartbeat interval and can issue its own writes to a heartbeat table on the source database, a built-in, formalised instance of this same kludge. The Datadog multi-tenant CDC platform post names heartbeat tables as one of the seven operator-side configurations required for a Debezium Postgres pipeline. See systems/debezium#Prerequisites on the source database.
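As a rough illustration, the heartbeat portion of a connector configuration might look like the following. The property names heartbeat.interval.ms and heartbeat.action.query are Debezium connector properties; the table debezium_heartbeat, its columns, and the 10-second interval are assumptions made for this sketch.

```python
import json

# Heartbeat-related settings for a Debezium Postgres connector.
# The table name, its columns, and the interval are illustrative assumptions.
heartbeat_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    # How often the connector emits a heartbeat, in milliseconds (0 disables).
    "heartbeat.interval.ms": "10000",
    # Statement the connector runs against the source database per heartbeat.
    "heartbeat.action.query": (
        "INSERT INTO debezium_heartbeat (id, ts) VALUES (1, now()) "
        "ON CONFLICT (id) DO UPDATE SET ts = EXCLUDED.ts"
    ),
}

# Serialised form, e.g. for the "config" object of a connector registration.
heartbeat_json = json.dumps(heartbeat_config, indent=2)
print(heartbeat_json)
```

The upsert keeps the hypothetical heartbeat table at a single row, so the schema pollution stays bounded even though the write traffic does not go away.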

When the kludge is still necessary

The pure-solution pgjdbc fix shipped in pgjdbc 42.7.0. The kludge is still necessary for:

  • Pipelines on older pgjdbc (pre-42.7.0).
  • Pipelines using non-JVM clients that haven't adopted equivalent KeepAlive-ack behaviour.
  • Debezium deployments that rely on Debezium's own heartbeat mechanism rather than driver-level LSN advancement; the heartbeat configuration is already present in many Debezium installations.
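For the first case, an inventory script could flag clients whose driver predates the fix. This is a minimal sketch: the helper name pgjdbc_has_fix is hypothetical, and the 42.7.0 cut-off is taken from the text above rather than verified against the pgjdbc changelog.

```python
import re

# Cut-off taken from the text above; treat it as an assumption to verify
# against the pgjdbc changelog for your deployment.
FIXED_IN = (42, 7, 0)


def pgjdbc_has_fix(version: str) -> bool:
    """Return True if a pgjdbc version string is at or above FIXED_IN."""
    # Accept suffixed versions such as '42.2.5.jre7' by reading only the
    # leading major.minor.patch components.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised pgjdbc version: {version!r}")
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


needs_kludge = [v for v in ["42.6.0", "42.7.0", "42.2.5.jre7"]
                if not pgjdbc_has_fix(v)]
```

A driver version check says nothing about the other two cases, which depend on the client implementation and on how Debezium is configured.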

Seen in

  • sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver: canonical wiki introduction. Zalando's 2023 post canonicalises the kludge, names its two industry blog-post endorsements, admits Zalando's own historical use of it, and positions it as the motivating pain point for the pure-solution pgjdbc patch. Direct framing verbatim: "Hacky solutions abound. The most popular by far was creating scheduled jobs writing dummy data to the pink table in order to force it to advance."