CONCEPT Cited by 1 source

HA / CDC coupling

Definition

HA / CDC coupling is an operational coupling between a database cluster's high-availability actions (primary promotion, replica replacement, planned failover) and the behaviour of downstream CDC consumers (connector polling cadence, online/offline state, slot advancement). A design has HA-CDC coupling when HA actions either cannot safely proceed without the CDC consumer having recently acted, or proceed only by deliberately breaking the CDC stream.

This is distinct from the expected coupling that a lagging CDC consumer pins WAL / binlog on the primary — that's a resource coupling, not an operational one. HA-CDC coupling means the operator cannot perform HA actions on their own schedule.

The canonical instance: Postgres logical replication

Postgres's logical replication slots couple HA to CDC through two mechanisms:

  1. The slot is a primary-local catalog object — when the primary fails, the slot is gone unless mirrored to the standby.
  2. With Postgres 17 failover slots, the slot state can be mirrored, but the eligibility gate requires the subscriber to have advanced the slot while the standby was following. A quiet CDC client (batch CDC polling every 6 hours, or an offline connector) leaves no standby eligible.
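
The eligibility gate in mechanism 2 can be illustrated with a toy model. This is a minimal sketch, not Postgres's actual implementation: the `Standby` class, its fields, and `eligible_standbys` are hypothetical names, and the rule "a standby is eligible only if the subscriber advanced the slot after the standby began following" is a simplification of the behaviour described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    """Hypothetical model of a standby; not a Postgres data structure."""
    name: str
    following_since: float  # time (hours) at which the standby began following


def eligible_standbys(standbys: List[Standby],
                      last_slot_advance: Optional[float]) -> List[str]:
    """Return names of standbys that could be promoted without breaking the slot.

    Simplifying assumption: a standby is eligible only if the subscriber
    advanced the slot at some point after the standby started following.
    """
    if last_slot_advance is None:  # subscriber has never advanced the slot
        return []
    return [s.name for s in standbys if last_slot_advance >= s.following_since]


# A batch connector last polled at t=6h; standby "b" was replaced at t=8h.
standbys = [Standby("a", following_since=0.0), Standby("b", following_since=8.0)]
print(eligible_standbys(standbys, last_slot_advance=6.0))   # ['a']
print(eligible_standbys(standbys, last_slot_advance=None))  # []
```

In this model the operator's options depend entirely on `last_slot_advance`, a value only the subscriber can change.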

Sam Lambert states the problem directly:

"Slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control." (Source: sources/2026-04-21-planetscale-postgres-high-availability-with-cdc)

Three operational symptoms:

  • Quiet-period failover: forced promotion breaks the CDC stream; subscribers must re-snapshot.
  • Replica replacement: new standbys remain ineligible until the next subscriber poll.
  • Stalled switchover: the operator waits for the subscriber or accepts slot drop.

The third symptom generalises beyond logical CDC: a dormant physical standby with a physical slot can pin restart_lsn indefinitely and fill the primary's WAL volume, forcing either an emergency failover or a slot drop.
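
How much WAL a dormant slot pins can be estimated from two LSNs. The sketch below assumes the standard textual LSN format (two hexadecimal numbers separated by a slash, giving the high and low 32 bits of a 64-bit byte position). The function names and sample LSN values are illustrative and do not come from the source.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual Postgres LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pinned_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the primary must retain for a slot stuck at restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


# Illustrative values: the primary has written 2 GiB since the slot last moved.
print(pinned_wal_bytes("1/80000000", "1/0"))  # 2147483648
```

On a live server the same figure should be obtainable with `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)` over the `pg_replication_slots` view.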

The canonical uncoupled instance: MySQL binlog + GTID

MySQL has no slot concept. A CDC consumer persists its own GTID position; replicas with log_replica_updates=ON re-emit every applied transaction into their own binlogs, preserving GTID continuity across the whole cluster. Any replica becomes a valid CDC resume point on promotion.

"On reconnect it tells any suitable server, 'resume from this GTID.' If the binlog containing that GTID still exists, streaming continues with no slot object and no eligibility gate. … A lagging consumer can't stall switchover; at worst, if binlogs are purged past the last GTID the connector processed, the connector must resnapshot but HA completes immediately." (Source: sources/2026-04-21-planetscale-postgres-high-availability-with-cdc)

MySQL's CDC availability is bounded by binlog retention, not by subscriber polling freshness. HA actions proceed independently.
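
The resume rule can be sketched as a check between the consumer's position and whatever the chosen server has already purged. This is a simplified model under a stated assumption: each GTID set is reduced to `{source_uuid: highest transaction number}` with numbering contiguous from 1, whereas real GTID sets are lists of intervals. `can_resume` and the sample UUID labels are hypothetical.

```python
from typing import Dict

# Hypothetical simplification: {source_uuid: highest transaction number},
# assuming numbering is contiguous from 1. Real GTID sets hold intervals.
GtidSet = Dict[str, int]


def can_resume(consumer_processed: GtidSet, server_purged: GtidSet) -> bool:
    """True if the server still holds every transaction the consumer lacks.

    Resume works when everything the server has purged from its binlogs
    has already been processed by the consumer.
    """
    return all(consumer_processed.get(uuid, 0) >= purged_upto
               for uuid, purged_upto in server_purged.items())


consumer = {"uuid-a": 500}
print(can_resume(consumer, {"uuid-a": 400}))  # True: binlogs from 401 onward remain
print(can_resume(consumer, {"uuid-a": 650}))  # False: 501-650 are gone, resnapshot
```

The check involves only the consumer and the server it reconnects to. Nothing in it refers to the old primary, which is why promotion does not have to wait for the consumer.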

The structural root of the coupling

The coupling arises when two pieces of state live in different substrates:

  1. The replication stream (WAL / binlog) — propagated uniformly to all replicas.
  2. The replication consumer-progress metadata (slot / GTID-position cursor) — lives either on the primary alone (Postgres pre-17, or post-17 with the eligibility gate) or in the consumer alone (MySQL GTID).

When (2) is primary-local, HA actions must preserve or recover it. When (2) is consumer-local, HA actions don't touch it.
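
The difference can be shown with a toy failover. This is a minimal sketch under simplifying assumptions: `Node`, `resume_position`, and the node names are hypothetical, positions are plain integers, and log retention is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical cluster node; `cursors` stands in for slot-like metadata."""
    name: str
    cursors: Dict[str, int] = field(default_factory=dict)


def resume_position(new_primary: Node, consumer: str,
                    consumer_cursor: Optional[int]) -> Optional[int]:
    """Position the consumer can resume from after promotion, or None.

    If the consumer carries its own cursor, the new primary's metadata is
    irrelevant. Otherwise the cursor must already exist on the new primary.
    """
    if consumer_cursor is not None:           # consumer-local progress
        return consumer_cursor
    return new_primary.cursors.get(consumer)  # primary-local progress


old_primary = Node("pg-1", cursors={"cdc": 1200})
standby = Node("pg-2")  # the cursor was never mirrored here

print(resume_position(standby, "cdc", consumer_cursor=None))  # None
print(resume_position(standby, "cdc", consumer_cursor=1200))  # 1200
```

With a primary-local cursor, promoting `pg-2` loses the position unless it was copied beforehand. With a consumer-local cursor, the promotion has nothing to preserve.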

See patterns/action-log-vs-state-log-replication for the generalised design-space view.

Seen in

  • sources/2026-04-21-planetscale-postgres-high-availability-with-cdc: canonical wiki introduction of the concept. Sam Lambert (PlanetScale CEO, 2025-09-12) names the coupling explicitly: "this is the brittle edge in Postgres high availability with logical consumers: slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control." Walks through three concrete symptoms (quiet-period failover, replica replacement, physical-slot WAL growth) and contrasts them with MySQL's binlog-based CDC, where no analogous coupling exists.