PATTERN Cited by 1 source

Action log vs state log replication

The design-space split

Every database engine that supports downstream CDC + HA must answer two orthogonal design questions:

  1. Is the replication log an action log or a state log?
     • Action log: each entry is a transaction identity + change payload, independent of the physical storage layout (MySQL binary log).
     • State log: each entry is a physical-redo record that describes modifications to on-disk pages (Postgres WAL at wal_level=replica; wal_level=logical enriches it with enough column data for logical decoding, but the substrate is still a physical redo log).

  2. Where does downstream-consumer-progress metadata live?
     • Consumer-local (the consumer persists its own cursor and servers hold no per-consumer state, as with MySQL GTID).
     • Primary-local catalog (the server tracks each consumer's position in local catalog state, exposed in Postgres through the pg_replication_slots view).
     • Replicated to standbys (Postgres 17 failover slots partially move here, preserving an eligibility gate).

The canonical combinations in production:

| Substrate | Action vs state log | Consumer progress | HA-CDC coupling |
|---|---|---|---|
| MySQL binlog | Action log | Consumer-local (GTID) | None |
| Postgres WAL (pre-17) | State log | Primary-local catalog (slot) | Strong: slot is lost on failover |
| Postgres 17 failover slot | State log | Replicated with eligibility gate | Medium: gated on subscriber advance |
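
The two axes and the canonical combinations can be written down as a small data model. This is only an illustrative sketch: the names LogKind, Progress, Substrate, and ha_cdc_coupling are invented for this example, and the coupling labels are copied from the table rather than derived from any engine behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class LogKind(Enum):
    ACTION = "action log"
    STATE = "state log"


class Progress(Enum):
    CONSUMER_LOCAL = "consumer-local"
    PRIMARY_CATALOG = "primary-local catalog"
    REPLICATED_GATED = "replicated with eligibility gate"


@dataclass(frozen=True)
class Substrate:
    name: str
    log_kind: LogKind
    progress: Progress


# Coupling labels copied from the table; combinations not listed there are left out.
COUPLING = {
    (LogKind.ACTION, Progress.CONSUMER_LOCAL): "none",
    (LogKind.STATE, Progress.PRIMARY_CATALOG): "strong",
    (LogKind.STATE, Progress.REPLICATED_GATED): "medium",
}


def ha_cdc_coupling(s: Substrate) -> str:
    """Look up the HA-CDC coupling label for a substrate (hypothetical helper)."""
    return COUPLING.get((s.log_kind, s.progress), "not covered by the table")


SUBSTRATES = [
    Substrate("MySQL binlog", LogKind.ACTION, Progress.CONSUMER_LOCAL),
    Substrate("Postgres WAL (pre-17)", LogKind.STATE, Progress.PRIMARY_CATALOG),
    Substrate("Postgres 17 failover slot", LogKind.STATE, Progress.REPLICATED_GATED),
]

for s in SUBSTRATES:
    print(f"{s.name}: {ha_cdc_coupling(s)}")
```

Keying the lookup on both axes keeps the model honest: a combination the table does not list gets no coupling label at all.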

See concepts/ha-cdc-coupling for the operational consequences.

The two properties that decouple HA from CDC

When the log is an action log and consumer progress is consumer-local (the two design properties), three consequences follow:

  1. Every replica is a valid CDC resume point. A GTID-aware consumer can point at any replica that has matching binlog retention and resume.
  2. HA actions don't touch the CDC contract. Promote a replica, point the consumer at the new primary (or any other replica), resume.
  3. CDC-subscriber behaviour is irrelevant to HA scheduling. The operator's HA actions proceed on the operator's schedule; a lagging or offline consumer can't block them.

The first of these depends on log_replica_updates=ON in MySQL: every replica re-emits the transactions it applies into its own binlog, preserving GTID continuity across the full cluster.
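
A minimal sketch of why any replica with matching binlog retention is a valid resume point. It is a toy model, not MySQL's protocol: GTIDs are simplified to (source, number) pairs held in plain Python sets, and Replica, can_serve, and stream_from are hypothetical names. The only rule modelled is that a replica can serve a consumer if nothing the consumer still needs has been purged from that replica's binlog.

```python
from dataclasses import dataclass, field

Gtid = tuple[str, int]  # (source id, transaction number), a simplification


@dataclass
class Replica:
    name: str
    executed: set[Gtid]  # every transaction applied and re-logged locally
    purged: set[Gtid] = field(default_factory=set)  # rotated out of the binlog

    def can_serve(self, consumer_seen: set[Gtid]) -> bool:
        # Resume works only if the consumer already has everything that was purged.
        return self.purged <= consumer_seen

    def stream_from(self, consumer_seen: set[Gtid]) -> list[Gtid]:
        if not self.can_serve(consumer_seen):
            raise LookupError(f"{self.name}: required binlog already purged")
        # Send exactly the transactions the consumer has not seen yet.
        return sorted(self.executed - consumer_seen)


all_txns = {("src", n) for n in range(1, 11)}
consumer_seen = {("src", n) for n in range(1, 7)}  # cursor kept by the consumer itself

replica_a = Replica("replica-a", all_txns, purged={("src", n) for n in range(1, 4)})
replica_b = Replica("replica-b", all_txns, purged={("src", n) for n in range(1, 9)})

candidates = [r.name for r in (replica_a, replica_b) if r.can_serve(consumer_seen)]
resumed = replica_a.stream_from(consumer_seen)
```

The consumer's cursor is the only progress state in the model. Neither replica stores anything about the consumer, so promoting or replacing a replica changes nothing about where the consumer may resume.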

The two properties that couple HA to CDC

When the log is a state log and consumer progress lives in a primary-local catalog (Postgres logical-replication slot), the inverse holds:

  1. Only the primary can advance the slot. Consumer-progress metadata is primary-attached; on failover it's gone unless mirrored.
  2. Mirroring creates its own gate. Postgres 17 failover slots require the subscriber to have advanced the slot while the standby was following — to preserve exactly-once CDC semantics. The eligibility gate means a quiet subscriber blocks HA.
  3. Operational coupling. "Slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control."
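
The eligibility gate can be illustrated with a toy simulation. This is a sketch under loud assumptions: Primary, Standby, subscriber_confirms, and can_promote_with_cdc are hypothetical names, positions are plain integers, and the single rule modelled is the one stated above (a standby only holds usable slot state once the subscriber has advanced the slot while that standby was following). Real Postgres slot synchronisation involves more machinery than this.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    name: str
    following: bool = True
    synced_slots: dict[str, int] = field(default_factory=dict)


@dataclass
class Primary:
    slots: dict[str, int] = field(default_factory=dict)  # slot name -> confirmed position
    standbys: list[Standby] = field(default_factory=list)

    def subscriber_confirms(self, slot: str, position: int) -> None:
        # Only the primary advances the slot; following standbys observe the advance.
        self.slots[slot] = position
        for sb in self.standbys:
            if sb.following:
                sb.synced_slots[slot] = position


def can_promote_with_cdc(standby: Standby, slot: str) -> bool:
    # Toy eligibility gate: the standby must already hold synced state for the slot.
    return slot in standby.synced_slots


standby = Standby("standby-1")
primary = Primary(slots={"cdc": 100}, standbys=[standby])

eligible_while_quiet = can_promote_with_cdc(standby, "cdc")  # subscriber has not advanced
primary.subscriber_confirms("cdc", 120)
eligible_after_advance = can_promote_with_cdc(standby, "cdc")
```

In this model the standby becomes eligible only after the subscriber acts, which is the coupling described above: a quiet subscriber keeps the standby ineligible no matter how healthy the standby is.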

When to pick which

  • Pick action-log + consumer-local when CDC consumers are independent of operator control (third-party batch CDC, external analytics pipelines, Debezium fleets), when HA action latency must not couple to consumer freshness, and when you can tolerate paying the log_replica_updates=ON disk + CPU overhead on every replica to maintain full-cluster re-emission.
  • Pick state-log + primary-catalog when CDC consumers are first-party + well-behaved (managed CDC platform internally operated), when exactly-once CDC semantics are load-bearing, and when failover is rare enough that the operational coupling is an acceptable trade-off for the simpler primary-side progress tracking.

Beyond databases

The same design-space split appears in log-based distributed-systems substrates:

  • Apache Kafka offsets are logical positions in a partition log, so a consumer that persists its own offsets (action-log + consumer-local) can resume from whichever replica becomes partition leader.
  • Vitess VStream at the VTGate level is explicitly action-log shaped — the VGTID is consumer-local and carried across shards (see concepts/unified-change-stream-across-shards).
  • Systems that store consumer cursors server-side (broker offset commits to ZooKeeper in old Kafka, server-side pointers in some message queues) re-introduce HA coupling.

Seen in

  • sources/2026-04-21-planetscale-postgres-high-availability-with-cdc: canonical wiki statement of the design-space split. Sam Lambert (PlanetScale CEO, 2025-09-12) frames the two databases side by side: MySQL's action-log-with-consumer-local-GTID vs Postgres's state-log-with-primary-local-slot (enriched to replicated-with-eligibility-gate in Postgres 17). Canonical closing framing: "the brittle edge in Postgres high availability with logical consumers: slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control."