CONCEPT Cited by 1 source

Slot-vs-offset position tracking¶

Definition¶

Slot-vs-offset position tracking is the structural problem that arises because Postgres logical-replication CDC consumers (notably Debezium) track the same stream position in two independent locations that can legitimately disagree on startup. The right reconciliation strategy is operator-specific rather than universal.

The two locations:

Subscriber-side offset store — Debezium's stored offset. Backing options include Kafka Connect offset topics (external offset store), in-memory (MemoryOffsetBackingStore, ephemeral by design), or file-based.
Primary-side replication slot — Postgres's confirmed_flush_lsn (advances when the client acks) and restart_lsn (oldest WAL still required).
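
To make the two positions concrete, here is a minimal Python sketch that parses Postgres LSNs (printed as two hexadecimal numbers separated by a slash, high 32 bits then low 32 bits) into integers so the two sides can be compared. The `TrackedPositions` class and its field names are hypothetical and exist only for illustration; they are not Debezium or pgjdbc types.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert a Postgres LSN such as '0/16B3748' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn: render an integer in Postgres 'X/Y' hex form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


@dataclass(frozen=True)
class TrackedPositions:
    """Hypothetical container for the two positions seen at connector startup."""

    slot_confirmed_flush_lsn: int  # primary side, from the replication slot
    stored_offset_lsn: int         # subscriber side, from the offset store

    def slot_lead_bytes(self) -> int:
        """Positive when the slot is ahead of the offset, negative when behind."""
        return self.slot_confirmed_flush_lsn - self.stored_offset_lsn


# Example LSN values are made up for illustration.
positions = TrackedPositions(
    slot_confirmed_flush_lsn=parse_lsn("0/16B3748"),
    stored_offset_lsn=parse_lsn("0/16B3700"),
)
print(positions.slot_lead_bytes())  # 72 (0x48 bytes of WAL)
```

On the primary, the slot-side values would typically be read from the `pg_replication_slots` view; where the subscriber-side value comes from depends on which offset backing store is configured.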

Why they can legitimately disagree¶

On a clean synchronous run the two track in lockstep: connector acks event → offset advances → server marks slot advanced. They diverge when someone advances one side without the other:

Cause	Slot LSN	Offset LSN	Legitimate?
pgjdbc keepalive flush	Ahead	Behind	Yes — driver flushed unmonitored WAL activity the connector doesn't care about
`pg_replication_slot_advance()` by operator	Ahead	Behind	Yes — operator recovering past corrupted WAL
Connector crashed after ack batched but before offset persisted	Behind / equal	Ahead	Edge case; depends on offset-store flush timing
Fresh connector + stale slot	Way ahead	Way behind	Real data loss — slot events permanently unseen
Slot dropped + recreated	Reset	Ahead	Real data loss — slot lost its history

Two of the five rows above are legitimate, one is a timing-dependent edge case, and two are real data loss. The startup logic cannot distinguish them from LSN values alone.
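
The following sketch illustrates why LSN values alone are not enough: a comparison of two LSNs can only ever produce one of three outcomes, and several causes from the table collapse into the same outcome. The function and enum names are hypothetical, and the LSN numbers are arbitrary illustrative integers.

```python
from enum import Enum


class Relation(Enum):
    SLOT_AHEAD = "slot ahead of offset"
    IN_SYNC = "slot equals offset"
    OFFSET_AHEAD = "offset ahead of slot"


def compare_positions(slot_lsn: int, offset_lsn: int) -> Relation:
    """Everything a startup check can learn from the two LSNs alone."""
    if slot_lsn > offset_lsn:
        return Relation.SLOT_AHEAD
    if slot_lsn < offset_lsn:
        return Relation.OFFSET_AHEAD
    return Relation.IN_SYNC


# Illustrative LSN values only; the causes are rows from the table above.
scenarios = {
    "pgjdbc keepalive flush": (2_000, 1_000),      # legitimate
    "operator advanced the slot": (2_000, 1_000),  # legitimate
    "fresh connector + stale slot": (9_000, 0),    # real data loss
}

for cause, (slot_lsn, offset_lsn) in scenarios.items():
    print(f"{cause}: {compare_positions(slot_lsn, offset_lsn).name}")
```

All three scenarios print `SLOT_AHEAD`, even though only the first two are safe to resume from. Telling them apart requires knowledge held outside the two LSNs.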

Why no single default is correct¶

Debezium pre-3.4 had two behaviours:

Default — stream from stored offset, fail only when Postgres rejects the LSN as unavailable in WAL (cryptic error).
Strict (internal.slot.seek.to.known.offset.on.start=true) — immediately fail when slot is ahead of offset, treating it as data loss.

Both were wrong for the pgjdbc keepalive flush case:

Default fails cryptically when the legitimately-advanced slot's old WAL is already reclaimed.
Strict fails on every startup after keepalive-flush activity, forcing full database re-snapshots.

The correct answer depends on operator-side invariants Debezium doesn't know:

If the operator runs the connector_and_driver mode and has slot-survives-failover discipline: the slot is authoritative; advance offset to match.
If the operator runs Kafka Connect offset topics as durable ground truth and treats the slot as a primary-local implementation detail: the offset store is authoritative; fail loudly when the slot is ahead.

The resolution: make it explicit¶

offset.mismatch.strategy — Zalando's 2025-12 contribution to Debezium 3.4 — lets operators pick per-deployment which side wins (trust_offset / trust_slot / trust_greater_lsn / no_validation), combined with lsn.flush.mode to control who can advance the slot in the first place.

The two properties together express the operator's position-tracking posture as explicit configuration rather than framework-imposed policy.
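
As a rough illustration of what "picking which side wins" could look like, the sketch below maps the four strategy names to a starting LSN. The semantics are an assumption inferred from the strategy names and from the two operator postures described earlier; this is not Debezium's implementation, and `resolve_start_lsn` and `OffsetMismatchError` are hypothetical names.

```python
class OffsetMismatchError(RuntimeError):
    """Raised when the chosen strategy treats the mismatch as data loss."""


def resolve_start_lsn(strategy: str, slot_lsn: int, offset_lsn: int) -> int:
    """Return the LSN to resume streaming from (assumed semantics)."""
    if strategy == "no_validation":
        # Assumed: no startup check, resume from the stored offset as-is.
        return offset_lsn
    if strategy == "trust_offset":
        # Assumed: the offset store is authoritative, so a slot that has
        # moved past it is reported as a failure instead of being skipped.
        if slot_lsn > offset_lsn:
            raise OffsetMismatchError(
                f"slot LSN {slot_lsn} is ahead of stored offset {offset_lsn}"
            )
        return offset_lsn
    if strategy == "trust_slot":
        # Assumed: the slot is authoritative, so resume from its position.
        return slot_lsn
    if strategy == "trust_greater_lsn":
        # Assumed: whichever side is further along wins.
        return max(slot_lsn, offset_lsn)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Slot ahead of offset, e.g. after a keepalive flush (illustrative numbers).
for name in ("no_validation", "trust_offset", "trust_slot", "trust_greater_lsn"):
    try:
        print(name, "->", resolve_start_lsn(name, slot_lsn=200, offset_lsn=100))
    except OffsetMismatchError as err:
        print(name, "-> fails:", err)
```

The point of the sketch is that the same pair of LSNs yields a different outcome per strategy, so the operator's choice, not the framework, decides whether a slot-ahead mismatch is tolerated.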

Why this generalises¶

The pattern generalises beyond Postgres + Debezium to any two-location durable-state system with legitimate divergence causes:

Kafka consumer group offsets vs external checkpoint — Kafka's auto.offset.reset is the same shape.
S3-backed log retention vs consumer checkpoint — object storage lifecycle can reclaim past a stale consumer.
Binlog retention vs MySQL CDC external offset store — consumer can fall behind retention legitimately (long maintenance window) or illegitimately (broken consumer).

In every case, the question "offset disagreement means what?" is operator-specific. Framework defaults should pick the conservative answer; opt-ins should let operators with different invariants choose a different policy explicitly.
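
The same shape can be sketched for a retention-bounded log. The policy names below mirror Kafka's `auto.offset.reset` values (`earliest`, `latest`, `none`), but the function is a hypothetical model of the idea, not Kafka client code, and the parameter names are assumptions.

```python
from typing import Optional


def reset_position(
    policy: str,
    checkpoint: Optional[int],
    oldest_retained: int,
    newest: int,
) -> int:
    """Pick a start position when the checkpoint may be missing or reclaimed."""
    if checkpoint is not None and oldest_retained <= checkpoint <= newest:
        return checkpoint  # checkpoint is still inside the retained range
    if policy == "earliest":
        return oldest_retained  # accept the gap, replay whatever remains
    if policy == "latest":
        return newest  # accept the gap, skip to the head of the log
    if policy == "none":
        # Conservative choice: surface the disagreement to the operator.
        raise LookupError("checkpoint is missing or outside the retained range")
    raise ValueError(f"unknown policy: {policy!r}")


# Checkpoint 50 has been reclaimed; the log now retains positions 100..900.
print(reset_position("earliest", 50, oldest_retained=100, newest=900))
print(reset_position("latest", 50, oldest_retained=100, newest=900))
```

Here `none` plays the role of the conservative default, while `earliest` and `latest` are explicit opt-ins for operators who have decided the gap is acceptable.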

Contrast: single-location position tracking¶

Systems without this problem:

In-source checkpointing (Redpanda Connect Oracle CDC): stores the consumer position in a source-DB table, bound to the same transaction as the data.
Postgres physical streaming replication: the slot is the only position; no separate subscriber-side offset.

These architectures avoid the mismatch problem by eliminating one of the two positions entirely. Debezium-on-Postgres keeps both because the offset store serves functions the slot cannot (e.g. ACK semantics for Kafka Connect sinks).

Seen in¶

sources/2025-12-18-zalando-contributing-to-debezium-fixing-logical-replication-at-scale — canonical wiki introduction. Zalando frames the slot-vs-offset mismatch as the structural root cause of why Debezium had to hard-disable the pgjdbc keepalive-flush feature Zalando had shipped in 2023 — it legitimately advanced the slot past the stored offset in a way that broke the operator contract for most Debezium users. The 2025 contributions (lsn.flush.mode + offset.mismatch.strategy) canonicalise per-deployment choice over framework default.

concepts/postgres-logical-replication-slot — the primary-side position.
concepts/external-offset-store — Kafka Connect offset topic shape (persistent offset authoritative).
concepts/memory-offset-backing-store — ephemeral offset-store shape (slot authoritative by construction).
concepts/lsn-flush-mode — controls who advances the slot.
concepts/offset-mismatch-strategy — controls who wins on startup mismatch.
concepts/logical-replication — the replication mode.
concepts/keepalive-message-lsn-advancement — the canonical legitimate-divergence cause.
systems/debezium — the framework where the pattern is explicit.
patterns/authoritative-slot-over-authoritative-offset — the posture pattern.