
PLANETSCALE 2025-09-12


PlanetScale — Postgres High Availability with CDC

Summary

Sam Lambert (PlanetScale CEO, 2025-09-12, re-fetched 2026-04-21) argues that Postgres's logical replication design makes high-availability (HA) and CDC operationally coupled in a way that MySQL's design does not. The load-bearing mechanism is the logical replication slot: a durable, primary-local catalog object that pins WAL on the primary until the subscriber advances it, and whose post-failover eligibility on any standby depends on whether the subscriber has been observed advancing the slot while that standby was following. Postgres 17's failover slots mirror slot state into WAL so candidates can carry it, but eligibility waits on subscriber behaviour the operator doesn't control. The result: a lagging or offline CDC client blocks primary promotion, or forces an operator to promote anyway and break the CDC stream. MySQL's binlog-based CDC — an action log with GTID continuity propagated by every replica — doesn't have this coupling: any suitable server can resume the stream from the consumer's last-known GTID, and switchover completes as soon as a replica is promoted. The post is a substrate-level design critique, not a product launch.

Key takeaways

  • The logical replication slot is the load-bearing primary-local object that couples HA to CDC. "the logical replication slot is a durable, primary-local object that carries two pieces of state: the oldest WAL the slot requires (restart_lsn) and the most recent position the subscriber has confirmed (confirmed_flush_lsn). The presence of that slot pins WAL on the primary until the CDC client advances." WAL accumulation under a lagging consumer is expected; the brittle part is promoting a new primary when the slot can't move with the promotion. (Source: this post.)
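The WAL-pinning claim is quantifiable: the bytes of WAL the primary must retain for a slot is the distance between the current insert position and the slot's restart_lsn (what `pg_wal_lsn_diff()` computes). A minimal sketch of that arithmetic, assuming only the standard `pg_lsn` text format ("high/low" in hex, where the byte position is high × 2³² + low); the example LSN values are hypothetical:

```python
# Sketch: how much WAL a lagging slot pins on the primary.
# Postgres LSNs print as two hex words, "high/low"; the byte position is
# high * 2**32 + low (the same arithmetic pg_wal_lsn_diff performs).

def lsn_to_bytes(lsn: str) -> int:
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def wal_pinned(current_insert_lsn: str, slot_restart_lsn: str) -> int:
    """Bytes of WAL the primary must retain for this slot."""
    return lsn_to_bytes(current_insert_lsn) - lsn_to_bytes(slot_restart_lsn)

# Hypothetical values: a slot stuck roughly 3 GiB behind the insert position.
pinned = wal_pinned("16/B374D848", "15/F0000000")
print(f"{pinned / 2**30:.2f} GiB pinned")
```

Until the CDC client connects and confirms a newer flush position, this number only grows.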

  • Three conditions gate failover readiness for a logical slot on a standby. "The slot is synchronized on the standby, synced = true. The slot's position in the WAL is consistent with the position of the standby, not too far behind or too far ahead. The slot is persistent and not invalidated, temporary = false AND invalidation_reason IS NULL." The second condition is the hard one: standbys that "have never observed real slot progress" are kept ineligible by design to prevent "promoting a node that … would present an inconsistent stream to the subscriber." (Source: this post.)
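The three quoted conditions can be written as a single predicate over fields that `pg_replication_slots` exposes on a Postgres 17 standby (`synced`, `temporary`, `invalidation_reason` are real columns; the "position consistent" check is simplified here to a boolean, since the post states it only informally). A sketch, not production eligibility logic:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StandbySlot:
    synced: bool                        # pg_replication_slots.synced
    temporary: bool                     # pg_replication_slots.temporary
    invalidation_reason: Optional[str]  # NULL (None) when the slot is healthy
    position_consistent: bool           # simplification of the post's "not too
                                        # far behind or too far ahead" check

def failover_ready(slot: StandbySlot) -> bool:
    """The three gating conditions quoted in the post, as one predicate."""
    return (slot.synced
            and slot.position_consistent
            and not slot.temporary
            and slot.invalidation_reason is None)

# A freshly added standby that has never observed the subscriber advance:
fresh = StandbySlot(synced=True, temporary=True,
                    invalidation_reason=None, position_consistent=False)
print(failover_ready(fresh))  # → False
```

The second condition is the one operators cannot force from the standby side: it flips only when the subscriber advances the slot while the standby is following.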

  • Postgres 17's failover slots help, but the eligibility gate is intentional. "Postgres 17 introduced logical replication failover, so slot state can be synchronized to promotion candidates, but slot eligibility on the replica has caveats; A standby only becomes eligible to carry the slot after the subscriber has actually advanced the slot at least once while that standby is receiving the slot metadata." Serialising slot metadata into WAL solves the mirroring problem; it does not solve the eligibility problem, which is deliberate — "preserves exactly-once CDC semantics at the expense of HA flexibility." (Source: this post.)

  • Three explicit failure scenarios canonicalise the coupling. (1) "During a CDC quiet period, logical slots on standbys may remain in temporary status due to position inconsistencies. If forced failover occurs, the temporary slots are not failover-ready. The CDC stream breaks, requiring connector reinitialization and snapshot reload." (2) "Replacing replicas: You add new replicas (fresh pg_basebackup) and plan to retire the old ones. Each new standby begins synchronizing slot metadata from the primary but, by design, starts at a conservative point (older XID/LSN) and won't consider the slot synchronized until it has seen the subscriber advance. If the CDC client polls every 6 hours, all new replicas remain ineligible for promotion until that polling event occurs." (3) "Not just CDC. Any replication client backed by a slot can create a similar problem. A physical standby connected through a physical slot that stops pulling WAL will pin restart_lsn indefinitely. … it can fill the primary's WAL volume and trip the cluster into write unavailability, emergency failover, or drop the slot entirely if the maximum WAL size has been reached." Canonical framing: "the progress of the slowest slot determines how far the system can move without manual intervention." (Source: this post.)
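Scenario (2) also has a storage cost worth making concrete. A back-of-envelope sketch (the WAL generation rate and the `max_slot_wal_keep_size` value are assumptions, not numbers from the post; only the 6-hour polling interval comes from the source) of how a rare-polling consumer interacts with the GUC that invalidates lagging slots:

```python
# Back-of-envelope sketch: WAL pinned between CDC polls, versus
# max_slot_wal_keep_size -- the GUC past which Postgres invalidates the slot.
# The 6-hour interval is from the post; the rates are hypothetical.

def wal_pinned_between_polls(wal_rate_mib_per_s: float,
                             poll_interval_h: float) -> float:
    """GiB of WAL retained for a slot whose consumer polls this rarely."""
    return wal_rate_mib_per_s * poll_interval_h * 3600 / 1024

pinned_gib = wal_pinned_between_polls(wal_rate_mib_per_s=2.0, poll_interval_h=6)
max_slot_wal_keep_gib = 32  # hypothetical max_slot_wal_keep_size setting

print(f"~{pinned_gib:.1f} GiB pinned per poll cycle")
print("slot would be invalidated" if pinned_gib > max_slot_wal_keep_gib
      else "slot survives the quiet period")
```

At a modest 2 MiB/s of WAL, a 6-hour quiet period pins ~42 GiB, so either the operator provisions WAL headroom for the slowest slot or accepts slot invalidation — the "slowest slot" framing in numbers.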

  • Root cause: WAL is a physical redo log, slot progress lives in a primary-local catalog. "The WAL is a physical redo log for crash recovery and physical standby replication. The fact that a downstream consumer needs certain WAL retained is tracked in a primary-local catalog state inside pg_replication_slots. That state advancement only occurs when the consumer connects and acknowledges data. Historically, this state never rode along in WAL, so standbys had no authoritative copy." The architectural asymmetry: the replication stream and the replication-progress metadata live in different substrates. Postgres 17 bridges them via failover slots, but preserves the eligibility gate to preserve exactly-once semantics. (Source: this post.)

  • MySQL's binlog is an action log with per-transaction GTIDs propagated by every replica. "MySQL's binary log is an action log. Every transaction carries a GTID. Replicas with log_replica_updates=ON re-emit transactions they apply into their own binlogs, preserving GTID continuity. A CDC connector records the last committed GTID set. On reconnect it tells any suitable server, 'resume from this GTID.' If the binlog containing that GTID still exists, streaming continues with no slot object and no eligibility gate." See concepts/binlog-replication, concepts/gtid-position. (Source: this post.)
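The re-emission property is the whole decoupling mechanism, so a toy model may help: with `log_replica_updates=ON`, a replica re-emits every transaction it applies into its own binlog, so the GTID stream exists on every server, not just the primary. This sketch collapses binlog events to bare GTID strings (an assumption for brevity; real GTID sets are interval lists per source UUID):

```python
# Toy model of the quoted property: a replica with log_replica_updates=ON
# re-emits applied transactions into its own binlog, so every server
# carries the same GTID stream.

class Server:
    def __init__(self, name: str):
        self.name = name
        self.binlog: list[str] = []  # ordered GTIDs, standing in for binlog events

    def commit(self, gtid: str):
        self.binlog.append(gtid)

    def replicate_from(self, source: "Server"):
        # apply + re-emit everything not yet in our own binlog
        for gtid in source.binlog:
            if gtid not in self.binlog:
                self.binlog.append(gtid)

primary, replica = Server("m"), Server("mr1")
for n in range(1, 4):
    primary.commit(f"uuid-a:{n}")
replica.replicate_from(primary)

# Any replica can now serve a CDC resume from the last-known GTID:
print(replica.binlog)  # → ['uuid-a:1', 'uuid-a:2', 'uuid-a:3']
```

Because the replica's binlog is a faithful copy of the GTID stream, there is no slot object to synchronise and no per-standby eligibility state to wait on.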

  • MySQL failover: promote + point + resume. "Promote a replica. Point the connector at any replica and it resumes from its GTID position." Success is determined by binlog retention, not consumer-polling freshness. "A lagging consumer can't stall switchover; at worst, if binlogs are purged past the last GTID the connector processed, the connector must resnapshot but HA completes immediately. You can even recover binlogs from other sources and apply those." HA and CDC are decoupled because every replica carries the authoritative stream. (Source: this post.)
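The switchover rule the post describes — resume if the binlog containing the connector's last GTID still exists, otherwise resnapshot while HA completes anyway — reduces to a small decision function. A sketch with GTID sets simplified to a per-source sequence number (an assumption; real comparisons are against `gtid_purged` interval sets):

```python
# Sketch of the post's switchover rule. GTID positions are simplified to a
# single sequence number per source (real GTID sets are interval lists, and
# the purge horizon comes from the replica's gtid_purged).

def connector_action(last_acked_seq: int, replica_purged_through: int) -> str:
    """Resume if the replica still holds binlogs past the connector's
    position; otherwise only the connector resnapshots -- HA is already done."""
    if last_acked_seq >= replica_purged_through:
        return "resume from GTID"
    return "resnapshot (binlogs purged past connector position)"

print(connector_action(last_acked_seq=950, replica_purged_through=400))
# → resume from GTID
print(connector_action(last_acked_seq=120, replica_purged_through=400))
# → resnapshot (binlogs purged past connector position)
```

Either branch leaves promotion unblocked — the decision affects only the connector, which is the structural contrast with the Postgres slot-eligibility gate.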

  • Side-by-side topology framing. Postgres: primary P + two sync standbys R1/R2 + CDC slot S on P; commits require ANY 1 flush by R1 or R2; CDC polls every 6 hours; adding R3 during maintenance → R3 (and recently-restarted R2) are not eligible to carry S until the CDC client advances it; switchover options are "wait for CDC to advance or promote anyway and accept slot drop." MySQL: primary M + two replicas MR1/MR2 with GTID + row-based binlog + log_replica_updates=ON; CDC persists GTID position; adding MR3 → it catches up and emits the same GTIDs; switchover proceeds immediately; connector resumes from any replica. Canonical closing framing: "slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control." (Source: this post.)

Systems / concepts / patterns surfaced

  • New concepts: concepts/postgres-logical-replication-slot — canonical wiki page for the Postgres pg_replication_slots catalog object, restart_lsn + confirmed_flush_lsn state pair, pgoutput decoder, subscriber-advance semantics, and WAL-pinning property. concepts/postgres-failover-slot — canonical wiki page for Postgres 17's failover slot mechanism: slot metadata serialised into WAL, three-condition eligibility gate (synced, position consistent, not temporary / not invalidated), subscriber-must-advance-while-standby-follows discipline. concepts/postgres-wal-level-logical — canonical wiki page for wal_level=logical + how logical decoding rides on the physical WAL substrate + the primary-local-catalog asymmetry. concepts/ha-cdc-coupling — canonical concept page for the operational coupling between HA actions and CDC behaviour that Postgres's slot-based logical replication creates; contrasts with action-log designs where every replica is a valid resume point.
  • New patterns: patterns/action-log-vs-state-log-replication — canonical design-space page contrasting action logs (every transaction propagated with an identity; any follower becomes a valid resume point for downstream consumers — MySQL binlog + GTID model) against state logs with primary-local progress catalogs (physical WAL + primary-local slot state — Postgres model). The shape generalises beyond databases: distributed log systems face the same choice on where "consumer progress" lives.
  • Extended: systems/postgresql (new Seen-in entry canonicalising the HA-CDC coupling + Postgres 17 failover slots + eligibility gate), systems/mysql (new Seen-in entry canonicalising the binlog-as-action-log + GTID + log_replica_updates=ON re-emission property as the CDC-HA decoupling mechanism), concepts/change-data-capture (new Seen-in entry canonicalising the substrate-level design-critique altitude + action-log-vs-state-log framing), concepts/binlog-replication (new Seen-in entry canonicalising the log_replica_updates=ON re-emission + GTID-continuity property as the decoupler of HA and CDC), concepts/gtid-position (new Seen-in entry canonicalising GTIDs as the substrate that makes any replica a valid CDC resume point), concepts/logical-replication (new Seen-in entry canonicalising Postgres-specific slot-eligibility + subscriber-behaviour coupling), companies/planetscale (new Recent-articles entry + systems/planetscale-for-postgres context).

Operational numbers

  • Synchronous standby count: 2 standbys in the sample topology with synchronous_standby_names = 'ANY 1 (r1, r2)' — commits wait for at least one standby flush.
  • CDC client polling interval in the worked scenarios: every few hours (6 hours in the explicit replica-replacement scenario).
  • Postgres version introducing failover slots: Postgres 17.
  • Three slot eligibility conditions on a standby (see key takeaway 2 above).
  • MySQL replica re-emission setting: log_replica_updates=ON (the rename of the pre-8.0.26 log_slave_updates option).

Caveats

  • PlanetScale-CEO voice, unsigned-architecture critique — Sam Lambert publishes under the PlanetScale blog; the framing is favourable to MySQL's model and, by extension, to PlanetScale's MySQL-first product line. The underlying mechanism claims are all independently verifiable against Postgres 17 documentation and the MySQL replication docs, but the "which design is better" framing is a product-positioning argument.
  • Postgres 17 failover slots are young in production — Postgres 17 shipped 2024-09-26; the post (2025-09-12) is written one year into real-world deployment. The post correctly describes the documented eligibility gate but doesn't survey production operators' experience with it.
  • No MySQL failure-mode discussion as counterweight — MySQL's binlog model has its own operational burdens (binlog retention horizon, log_replica_updates=ON disk-cost amplification, binlog-replication-breakage under GTID set corruption) that the post elides. See concepts/binlog-replication + concepts/separate-binlog-disk for the canonicalised MySQL-side operational surface, which is not trivially free either.
  • Physical slot scenario (case 3) is canonicalised but less developed — the WAL-volume-fills-and-cluster-goes-write-unavailable failure mode from a dormant physical standby is named but not walked through with a configuration example or mitigation recipe.
  • "Subscriber behavior outside your control" — the post's closing framing is load-bearing but elides that in practice many CDC subscribers (Debezium connectors, managed CDC platforms like the one in sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform) are operator-controlled infrastructure, not third-party services. The argument is strongest when the subscriber is genuinely external (batch CDC, downstream analytics platform); when the subscriber is first-party, the operator can make it advance on demand before failover.
  • No discussion of Postgres-native alternatives — the post doesn't engage with approaches like publication filtering, cascading replicas, or dedicated CDC primaries as partial mitigations. These are deferred.
  • No production numbers — no case studies, no measured failover-delay distributions, no percent of Postgres HA operators who run into the coupling in practice. The argument is structural, not empirical.
  • Implicit PlanetScale-for-Postgres framing — the post positions the problem to which PlanetScale's Postgres offering is the answer, without explicitly walking how PlanetScale's product mitigates the coupling (Postgres 17 failover slots + operational discipline? a different substrate underneath? not disclosed).
