CONCEPT Cited by 1 source
Slot-vs-offset position tracking¶
Definition¶
Slot-vs-offset position tracking is the structural problem that Postgres logical-replication CDC consumers (notably Debezium) track the same stream position in two independent locations that can legitimately disagree on startup, and the right reconciliation strategy is operator-specific rather than universal.
The two locations:
- Subscriber-side offset store: Debezium's stored offset. Backing options include Kafka Connect offset topics (external offset store), in-memory (`MemoryOffsetBackingStore`, ephemeral by design), or file-based.
- Primary-side replication slot: Postgres's `confirmed_flush_lsn` (advances when the client acks) and `restart_lsn` (oldest WAL still required).
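Both positions are Postgres LSNs, which are printed as two hexadecimal numbers separated by a slash (for example `16/B374D848`), so they have to be converted to integers before they can be compared. The following is a minimal sketch of that comparison. `parse_lsn` and `compare_positions` are hypothetical helper names, and the sample values are made up. In practice the slot side would be read from the `pg_replication_slots` view and the offset side from whichever offset store is in use.

```python
def parse_lsn(text: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def compare_positions(slot_confirmed_flush: str, stored_offset: str) -> str:
    """Report which side is ahead; this says nothing about why."""
    slot = parse_lsn(slot_confirmed_flush)
    offset = parse_lsn(stored_offset)
    if slot > offset:
        return "slot ahead"
    if slot < offset:
        return "offset ahead"
    return "in sync"


# Illustrative values only.
print(compare_positions("16/B374D848", "16/B3000000"))
```

Comparing the textual form directly would be wrong: `F/0` sorts after `10/0` as a string even though it is the smaller position.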
Why they can legitimately disagree¶
On a clean synchronous run the two track in lockstep: connector acks event → offset advances → server marks slot advanced. They diverge when someone advances one side without the other:
| Cause | Slot LSN | Offset LSN | Legitimate? |
|---|---|---|---|
| pgjdbc keepalive flush | Ahead | Behind | Yes: driver flushed unmonitored WAL activity the connector doesn't care about |
| `pg_replication_slot_advance()` by operator | Ahead | Behind | Yes: operator recovering past corrupted WAL |
| Connector crashed after ack batched but before offset persisted | Behind / equal | Ahead | Edge case; depends on offset-store flush timing |
| Fresh connector + stale slot | Way ahead | Way behind | Real data loss: slot events permanently unseen |
| Slot dropped + recreated | Reset | Ahead | Real data loss: slot lost its history |
Of the five rows above, two are legitimate, one is a timing edge case, and two are real data loss. The startup logic cannot distinguish them from LSN values alone.
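The ambiguity can be made concrete by reducing rows of the table to the only signal available at startup, namely which side is ahead. This is a small sketch rather than anything Debezium does. It encodes the three "slot ahead" rows as data and folds "way ahead" into "slot ahead", which is a simplifying assumption.

```python
from collections import defaultdict

# (cause, relation observed at startup, verdict), taken from the table above.
# Assumption: "Way ahead" is folded into "slot ahead".
ROWS = [
    ("pgjdbc keepalive flush", "slot ahead", "legitimate"),
    ("pg_replication_slot_advance() by operator", "slot ahead", "legitimate"),
    ("fresh connector + stale slot", "slot ahead", "data loss"),
]

# Group the possible verdicts by what the connector can actually observe.
verdicts_by_observation = defaultdict(set)
for cause, observation, verdict in ROWS:
    verdicts_by_observation[observation].add(verdict)

for observation, verdicts in verdicts_by_observation.items():
    print(observation, "->", sorted(verdicts))
```

The single observation "slot ahead" maps to both verdicts, so any rule keyed on the comparison alone must misjudge at least one of the causes.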
Why no single default is correct¶
Debezium pre-3.4 had two behaviours:
- Default: stream from stored offset, fail only when Postgres rejects the LSN as unavailable in WAL (cryptic error).
- Strict (`internal.slot.seek.to.known.offset.on.start=true`): immediately fail when slot is ahead of offset, treating it as data loss.
Both were wrong for the pgjdbc keepalive flush case:
- Default fails cryptically when the legitimately-advanced slot's old WAL is already reclaimed.
- Strict fails on every startup after keepalive-flush activity, forcing full database re-snapshots.
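The two pre-3.4 behaviours can be modelled as two startup checks run against the same keepalive-flush scenario. This is a simplified model under stated assumptions, not Debezium's code: LSNs are plain integers, `StartupError`, `default_start` and `strict_start` are hypothetical names, and `oldest_available_wal` stands in for whatever WAL the server still retains.

```python
class StartupError(Exception):
    """Hypothetical stand-in for a connector startup failure."""


def default_start(offset_lsn: int, slot_lsn: int, oldest_available_wal: int) -> int:
    """Stream from the stored offset; only the server can reject it."""
    if offset_lsn < oldest_available_wal:
        raise StartupError("requested WAL has already been removed")
    return offset_lsn


def strict_start(offset_lsn: int, slot_lsn: int, oldest_available_wal: int) -> int:
    """Refuse to start whenever the slot is ahead of the stored offset."""
    if slot_lsn > offset_lsn:
        raise StartupError("slot is ahead of stored offset")
    return offset_lsn


# Keepalive-flush scenario: the slot moved ahead and the old WAL was reclaimed.
scenario = dict(offset_lsn=100, slot_lsn=500, oldest_available_wal=400)
for start in (default_start, strict_start):
    try:
        start(**scenario)
    except StartupError as err:
        print(f"{start.__name__}: {err}")
```

In this model both checks reject a divergence that the table classifies as legitimate, which is the gap the later configuration options address.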
The correct answer depends on operator-side invariants Debezium doesn't know:
- If the operator runs the `connector_and_driver` mode and has slot-survives-failover discipline: the slot is authoritative; advance offset to match.
- If the operator runs Kafka Connect offset topics as durable ground truth and treats the slot as a primary-local implementation detail: the offset store is authoritative; fail loudly when the slot is ahead.
The resolution: make it explicit¶
`offset.mismatch.strategy`, Zalando's 2025-12 contribution to Debezium 3.4, lets operators pick per-deployment which side wins (`trust_offset` / `trust_slot` / `trust_greater_lsn` / `no_validation`), combined with `lsn.flush.mode` to control who can advance the slot in the first place.
The two properties together express the operator's position-tracking posture as explicit configuration rather than framework-imposed policy.
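As a rough illustration of what "picking which side wins" could look like, the sketch below maps each strategy name to a resume position. The four strategy names come from the text above. The behaviour attached to each one is an assumption inferred from the names and from the two operator postures described earlier, not a statement of Debezium's implementation. `resolve_start_lsn` is a hypothetical function and LSNs are plain integers.

```python
def resolve_start_lsn(strategy: str, offset_lsn: int, slot_lsn: int) -> int:
    """Pick the LSN to resume from. Semantics are assumed, not Debezium's."""
    if strategy == "no_validation":
        # Assumed: use the stored offset without comparing it to the slot.
        return offset_lsn
    if strategy == "trust_offset":
        # Assumed: the offset store is authoritative; a slot that is ahead
        # is treated as an error rather than silently accepted.
        if slot_lsn > offset_lsn:
            raise RuntimeError("slot is ahead of the authoritative offset")
        return offset_lsn
    if strategy == "trust_slot":
        # Assumed: the slot is authoritative; the offset is moved to match.
        return slot_lsn
    if strategy == "trust_greater_lsn":
        # Assumed: whichever position is further along wins.
        return max(offset_lsn, slot_lsn)
    raise ValueError(f"unknown strategy: {strategy}")


# Keepalive-flush scenario: slot ahead of the stored offset.
print(resolve_start_lsn("trust_slot", offset_lsn=100, slot_lsn=500))
```

The same pair of LSNs produces a different outcome under each strategy, which is why the choice is exposed as configuration rather than inferred from the LSNs.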
Why this generalises¶
The pattern generalises beyond Postgres + Debezium to any two-location durable-state system with legitimate divergence causes:
- Kafka consumer group offsets vs external checkpoint: Kafka's `auto.offset.reset` is the same shape.
- S3-backed log retention vs consumer checkpoint: object storage lifecycle can reclaim past a stale consumer.
- Binlog retention vs MySQL CDC external offset store — consumer can fall behind retention legitimately (long maintenance window) or illegitimately (broken consumer).
In every case, the question "offset disagreement means what?" is operator-specific. Framework defaults should pick the conservative answer; opt-ins should let operators with different invariants choose a different policy explicitly.
Contrast: single-location position tracking¶
Systems without this problem:
- In-source checkpointing (Redpanda Connect Oracle CDC): stores the consumer position in a source-DB table, bound to the same transaction as the data.
- Postgres physical streaming replication: the slot is the only position; no separate subscriber-side offset.
These architectures avoid the mismatch problem by eliminating one of the two positions entirely. Debezium-on-Postgres keeps both because the offset store serves functions the slot cannot (e.g. ACK semantics for Kafka Connect sinks).
Seen in¶
- sources/2025-12-18-zalando-contributing-to-debezium-fixing-logical-replication-at-scale: canonical wiki introduction. Zalando frames the slot-vs-offset mismatch as the structural root cause of why Debezium had to hard-disable the pgjdbc keepalive-flush feature Zalando had shipped in 2023: it legitimately advanced the slot past the stored offset in a way that broke the operator contract for most Debezium users. The 2025 contributions (`lsn.flush.mode` + `offset.mismatch.strategy`) canonicalise per-deployment choice over framework default.
Related¶
- concepts/postgres-logical-replication-slot — the primary-side position.
- concepts/external-offset-store — Kafka Connect offset topic shape (persistent offset authoritative).
- concepts/memory-offset-backing-store — ephemeral offset-store shape (slot authoritative by construction).
- concepts/lsn-flush-mode — controls who advances the slot.
- concepts/offset-mismatch-strategy — controls who wins on startup mismatch.
- concepts/logical-replication — the replication mode.
- concepts/keepalive-message-lsn-advancement — the canonical legitimate-divergence cause.
- systems/debezium — the framework where the pattern is explicit.
- patterns/authoritative-slot-over-authoritative-offset — the posture pattern.