PATTERN Cited by 1 source
Authoritative slot over authoritative offset¶
Intent¶
When a CDC consumer tracks stream position in two independent locations (a framework offset store plus a source-database-level cursor like a Postgres replication slot), and operator-side invariants make the source-database cursor the durable source of truth, adopt the slot-as-authoritative posture explicitly — advance the offset to match the slot on startup mismatch, rather than the other way around.
The alternative posture — offset-as-authoritative — is the industry default because it matches the typical framework-offset-topic (Kafka Connect) deployment. But for deployments where the primary-side slot is made durable across failover via discipline the framework can't see, the slot-authoritative posture is structurally correct.
Context¶
The pattern applies when:
- A CDC framework tracks position in both a subscriber-side offset store and a source-side cursor (the slot-vs-offset problem).
- The deployment operator keeps the source-side cursor durable through primary failover and other operational events, typically via HA orchestration (Patroni), slot-survives-failover discipline, or Postgres 17 failover slots.
- Either the offset store is ephemeral by design (MemoryOffsetBackingStore), or the operator is willing to tolerate the offset store being behind the slot on startup (because the driver legitimately advances the slot past the offset).
Problem¶
The default CDC-framework startup logic is to trust the offset store — the framework-owned state the framework can reason about. Startup mismatch where the slot is ahead of the offset is interpreted as potential data loss, and the connector refuses to start.
This interpretation is correct for most Debezium deployments (Kafka Connect offset topics + Postgres without slot-failover discipline) — the slot-ahead-of-offset case there genuinely indicates a dropped-and-recreated slot, a recovery action, or a bug.
But for deployments with slot-durability discipline and legitimate driver-level slot advancement (lsn.flush.mode=connector_and_driver), slot-ahead-of-offset is the normal steady-state shape whenever unmonitored WAL activity dominates: vacuum, checkpoint, pg_switch_wal(). Refusing to start on every restart is operationally untenable.
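To make "slot ahead of offset" concrete: Postgres prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit WAL position), so the two positions have to be parsed before they can be compared. The following is a minimal Python sketch of that comparison. The function names and sample LSN values are hypothetical and are not part of Debezium.

```python
def parse_lsn(text: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_is_ahead(slot_lsn: str, offset_lsn: str) -> bool:
    """True when the slot's position is past the stored offset."""
    return parse_lsn(slot_lsn) > parse_lsn(offset_lsn)


# Hypothetical values: the slot moved forward (for example through
# driver-level flushes) while the stored offset stayed where it was.
print(slot_is_ahead("0/3000148", "0/2FFFF00"))
```

Comparing the parsed integers rather than the raw strings matters because the textual form does not sort correctly once the high part grows past one hex digit.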
Shape¶
- Declare the posture explicitly as configuration, not as a framework default. In Debezium 3.4, this is the offset.mismatch.strategy property (values: trust_slot, trust_greater_lsn).
- Require the operator to acknowledge the invariants: documentation must spell out the slot-durability discipline prerequisite so that operators who lack it do not adopt the posture.
- On startup mismatch where the slot is ahead, advance the offset to match — skipping replay of events the slot has marked as confirmed-flushed. The events between the two positions are treated as intentionally discarded rather than as data loss.
- On mismatch where the offset is ahead, either fail (trust_slot) or advance the slot via pg_replication_slot_advance() (trust_greater_lsn). The framework provides both choices because the operator posture can be one-directional (slot is truth; offset ahead is inconsistency) or bidirectional (max-LSN wins; self-heal).
- Pair with driver-level LSN flush opt-in: the slot-authoritative posture is meaningless if the driver doesn't advance the slot independently of connector acks. The two properties must be set together for the WAL-reclamation use case.
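The startup rules in this list can be summarised as a small decision function. The sketch below is a model of the behaviour described here, not Debezium's implementation. The names reconcile, Decision and StartupError are hypothetical, LSNs are assumed to be already parsed into integers, and only the two slot-authoritative strategies are modelled.

```python
from dataclasses import dataclass


class StartupError(Exception):
    """Raised when the chosen strategy treats the mismatch as an inconsistency."""


@dataclass(frozen=True)
class Decision:
    start_lsn: int        # position streaming resumes from
    advance_offset: bool  # move the stored offset up to the slot
    advance_slot: bool    # move the slot up, e.g. via pg_replication_slot_advance()


def reconcile(slot_lsn: int, offset_lsn: int, strategy: str) -> Decision:
    """Model of the slot-authoritative startup rules (hypothetical helper)."""
    if strategy not in ("trust_slot", "trust_greater_lsn"):
        raise ValueError(f"strategy not modelled in this sketch: {strategy}")
    if slot_lsn == offset_lsn:
        return Decision(slot_lsn, False, False)
    if slot_lsn > offset_lsn:
        # Slot ahead: both strategies advance the offset and skip the gap.
        return Decision(slot_lsn, True, False)
    # Offset ahead: the two strategies diverge.
    if strategy == "trust_slot":
        raise StartupError("offset is ahead of the slot; refusing to start")
    return Decision(offset_lsn, False, True)
```

For example, reconcile(200, 100, "trust_slot") resumes from 200 and moves the offset forward, while reconcile(100, 200, "trust_slot") raises StartupError and the same call with "trust_greater_lsn" advances the slot instead.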
Examples on wiki¶
- Debezium 3.4 offset.mismatch.strategy (canonical, 2025-12): Zalando contributes the posture as configurable per-deployment via DBZ-9688 / PR #6948.
- Zalando's Fabric Event Streams platform since 2018: uses MemoryOffsetBackingStore to structurally preclude the offset-authoritative choice; the slot is authoritative by construction.
Consequences¶
Upside:
- Production-safe pairing with driver-level slot advancement (Zalando ran this shape for nearly two years processing billions of events with zero detected data loss).
- Enables operator-initiated recovery via pg_replication_slot_advance() past corrupted WAL without full source-database re-snapshot.
- The slot is already the authoritative source for the primary's WAL-reclamation logic, so making the connector match is aligned with how Postgres actually works.
Downside:
- Hard prerequisite on slot-survives-failover discipline. Without Patroni-class HA or Postgres 17 failover slots, a failover loses the slot, and under this posture recovering from that loss requires a full source-database re-snapshot.
- Discards events in the offset-behind-slot case. This is correct under the load-bearing assumption (the skipped events represent unmonitored WAL activity the connector doesn't care about), but the framework cannot verify the assumption.
- Documentation burden — operators must understand the invariant contract they're opting into.
When not to apply¶
Don't adopt this posture when any of these hold:
- The offset store is a Kafka Connect offset topic treated as durable truth.
- The deployment has no slot-survives-failover discipline (standalone Postgres without Patroni / failover slots).
- Slot recreation is an expected operational event (e.g. disaster recovery drills).
- The deployment has not validated lsn.flush.mode behaviour at scale.
The conservative no_validation (pre-3.4 default) and trust_offset (strict) shapes are correct for these.
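Some of these exclusions, together with the pairing rule from the Shape section, can be checked mechanically before a deployment adopts the posture. The sketch below is one possible pre-deployment check in Python. The preflight function and the slot_survives_failover flag are hypothetical: the flag stands for an operator's own assertion, since the framework cannot verify slot durability itself. The fallback to no_validation assumes the default described in this page.

```python
SLOT_AUTHORITATIVE = {"trust_slot", "trust_greater_lsn"}


def preflight(props: dict[str, str], slot_survives_failover: bool) -> list[str]:
    """Return human-readable problems; an empty list means no objection found."""
    problems = []
    strategy = props.get("offset.mismatch.strategy", "no_validation")
    flush_mode = props.get("lsn.flush.mode")

    if strategy in SLOT_AUTHORITATIVE and not slot_survives_failover:
        # Exclusion: no Patroni-class HA or failover slots behind the slot.
        problems.append(
            "slot-authoritative strategy without slot-survives-failover discipline"
        )
    if flush_mode == "connector_and_driver" and strategy not in SLOT_AUTHORITATIVE:
        # Pairing rule: driver-level flush makes slot-ahead-of-offset routine.
        problems.append("driver-level flush without a slot-authoritative strategy")
    return problems


config = {
    "offset.mismatch.strategy": "trust_slot",
    "lsn.flush.mode": "connector_and_driver",
}
print(preflight(config, slot_survives_failover=True))
```

A check like this only covers what is visible in configuration. Whether slot recreation is an expected operational event, or whether lsn.flush.mode has been validated at scale, remains an operator judgement.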
Seen in¶
- sources/2025-12-18-zalando-contributing-to-debezium-fixing-logical-replication-at-scale: canonical wiki introduction. Zalando canonicalises the slot-authoritative posture as structurally grounded in Patroni-managed slot-survives-failover discipline since the mid-2010s, paired with MemoryOffsetBackingStore since 2018. The 2025-12 contribution makes the posture opt-in for deployments that want it via offset.mismatch.strategy=trust_slot or trust_greater_lsn, while keeping no_validation as the default to match the broader Debezium user population. The post frames the posture choice verbatim: "Users configuring lsn.flush.mode=connector_and_driver can pair it with offset.mismatch.strategy=trust_slot for safe, production-ready operation with durable offset stores."
Related¶
- patterns/opt-in-driver-level-lsn-flush — companion pattern; slot-authoritative posture only makes the WAL-reclamation use case work when paired with driver-level slot advancement.
- patterns/client-driver-fix-over-application-workaround — upstream chain.
- concepts/slot-vs-offset-position-tracking — the structural problem this posture addresses.
- concepts/offset-mismatch-strategy — the property that makes the posture operator-configurable.
- concepts/lsn-flush-mode — the companion property.
- concepts/postgres-logical-replication-slot — the primary-side cursor treated as authoritative.
- concepts/memory-offset-backing-store — the offset store shape that structurally precludes any alternative.
- concepts/external-offset-store — the alternative (Kafka Connect offset topics) where offset-authoritative is the right posture instead.
- systems/debezium — the framework.
- systems/patroni — the HA enabler.
- systems/postgresql — the source database.
- systems/kafka-connect — the ecosystem where offset-authoritative is default.