CONCEPT Cited by 1 source

Slot-vs-offset position tracking

Definition

Slot-vs-offset position tracking names a structural problem: Postgres logical-replication CDC consumers (notably Debezium) record the same stream position in two independent locations, and those locations can legitimately disagree on startup. The right reconciliation strategy is operator-specific rather than universal.

The two locations:

  • Subscriber-side offset store — Debezium's stored offset. Backing options include Kafka Connect offset topics (external offset store), in-memory (MemoryOffsetBackingStore, ephemeral by design), or file-based.
  • Primary-side replication slot — Postgres's confirmed_flush_lsn (advances when the client acks) and restart_lsn (oldest WAL still required).
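
Comparing the two positions requires putting them on a common scale. Postgres prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit WAL position). The following is a minimal Python sketch of that conversion; the function names are illustrative and are not part of Debezium, pgjdbc, or Postgres.

```python
def parse_lsn(text: str) -> int:
    """Convert a Postgres LSN string such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn: render an integer position in the X/Y hex form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def slot_lead_bytes(confirmed_flush_lsn: str, stored_offset_lsn: str) -> int:
    """Positive when the slot is ahead of the stored offset, negative when behind."""
    return parse_lsn(confirmed_flush_lsn) - parse_lsn(stored_offset_lsn)
```

A positive result from `slot_lead_bytes` corresponds to the "slot ahead, offset behind" situation discussed in the rest of this page.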

Why they can legitimately disagree

On a clean synchronous run the two track in lockstep: connector acks event → offset advances → server marks slot advanced. They diverge when someone advances one side without the other:

| Cause | Slot LSN | Offset LSN | Legitimate? |
|---|---|---|---|
| pgjdbc keepalive flush | Ahead | Behind | Yes: driver flushed unmonitored WAL activity the connector doesn't care about |
| pg_replication_slot_advance() by operator | Ahead | Behind | Yes: operator recovering past corrupted WAL |
| Connector crashed after ack batched but before offset persisted | Behind / equal | Ahead | Edge case; depends on offset-store flush timing |
| Fresh connector + stale slot | Way ahead | Way behind | Real data loss: slot events permanently unseen |
| Slot dropped + recreated | Reset | Ahead | Real data loss: slot lost its history |

Of the five rows above, two are clearly legitimate, one is a timing-dependent edge case, and two are real data loss. The startup logic cannot distinguish them from LSN values alone.
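
The ambiguity can be made concrete with a small sketch. It assumes made-up LSN values and a hypothetical `StartupObservation` type. Two causes from the table produce exactly the same pair of positions yet call for opposite verdicts, so no function of the two LSNs can classify both correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupObservation:
    """Everything the connector can see at startup: two positions, nothing else."""
    slot_lsn: int
    offset_lsn: int


# (cause, observation, is_data_loss); the LSN values are invented for illustration.
SCENARIOS = [
    ("pgjdbc keepalive flush", StartupObservation(5_000, 4_000), False),
    ("fresh connector + stale slot", StartupObservation(5_000, 4_000), True),
]


def ambiguous_pairs(scenarios):
    """Return pairs of causes that share an observation but differ in verdict."""
    pairs = []
    for i, (cause_a, obs_a, loss_a) in enumerate(scenarios):
        for cause_b, obs_b, loss_b in scenarios[i + 1:]:
            if obs_a == obs_b and loss_a != loss_b:
                pairs.append((cause_a, cause_b))
    return pairs
```

Calling `ambiguous_pairs(SCENARIOS)` returns the one conflicting pair. Telling the two apart takes knowledge that sits outside the LSNs, such as how the operator manages the slot.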

Why no single default is correct

Debezium pre-3.4 had two behaviours:

  • Default — stream from stored offset, fail only when Postgres rejects the LSN as unavailable in WAL (cryptic error).
  • Strict (internal.slot.seek.to.known.offset.on.start=true) — immediately fail when slot is ahead of offset, treating it as data loss.

Both were wrong for the pgjdbc keepalive flush case:

  • Default fails cryptically when the legitimately-advanced slot's old WAL is already reclaimed.
  • Strict fails on every startup after keepalive-flush activity, forcing full database re-snapshots.
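
The two pre-3.4 behaviours can be written down as a toy model. This sketch follows the prose description above only; it is not Debezium's code and does not reproduce the Postgres replication protocol. The `oldest_wal_lsn` parameter, the exception type, and the numeric positions are assumptions made for illustration.

```python
class StartupFailure(Exception):
    """Connector refuses to start (hypothetical exception for this model)."""


def start_default(offset_lsn: int, slot_lsn: int, oldest_wal_lsn: int) -> int:
    """Stream from the stored offset; fail only if that WAL is gone."""
    if offset_lsn < oldest_wal_lsn:
        raise StartupFailure("requested LSN is no longer available in WAL")
    return offset_lsn


def start_strict(offset_lsn: int, slot_lsn: int, oldest_wal_lsn: int) -> int:
    """Fail immediately whenever the slot is ahead of the stored offset."""
    if slot_lsn > offset_lsn:
        raise StartupFailure("slot is ahead of stored offset")
    return offset_lsn


def outcome(start, **positions) -> str:
    """Run one startup mode and summarise what happened."""
    try:
        return f"streams from {start(**positions)}"
    except StartupFailure as exc:
        return f"fails: {exc}"


# Keepalive-flush case: slot legitimately ahead, old WAL already reclaimed.
reclaimed = dict(offset_lsn=4_000, slot_lsn=5_000, oldest_wal_lsn=4_500)
# Same case, but the old WAL happens to still be on disk.
retained = dict(offset_lsn=4_000, slot_lsn=5_000, oldest_wal_lsn=3_000)
```

In this model `outcome(start_default, **reclaimed)` and `outcome(start_strict, **reclaimed)` both fail, and `start_strict` also fails on `retained`, which mirrors the two bullets above.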

The correct answer depends on operator-side invariants Debezium doesn't know:

  • If the operator runs in connector_and_driver mode and guarantees that the slot survives failover, the slot is authoritative: advance the offset to match.
  • If the operator treats the Kafka Connect offset topics as durable ground truth and the slot as a primary-local implementation detail, the offset store is authoritative: fail loudly when the slot is ahead.

The resolution: make it explicit

offset.mismatch.strategy — Zalando's 2025-12 contribution to Debezium 3.4 — lets operators pick per-deployment which side wins (trust_offset / trust_slot / trust_greater_lsn / no_validation), combined with lsn.flush.mode to control who can advance the slot in the first place.

The two properties together express the operator's position-tracking posture as explicit configuration rather than framework-imposed policy.
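
As a rough illustration, the four strategy names could map onto startup decisions as in the sketch below. The strategy values and property names come from the text above. The semantics are one plausible reading derived from the two operator postures described earlier, not Debezium's actual implementation, and the exception name and LSN integers are hypothetical. The config fragment assumes that connector_and_driver is a value of lsn.flush.mode, as the text implies; check the Debezium documentation before relying on it.

```python
class OffsetMismatchError(RuntimeError):
    """Raised when the chosen strategy refuses to start (hypothetical name)."""


def resolve_start_lsn(strategy: str, slot_lsn: int, offset_lsn: int) -> int:
    """Pick the position to resume from, given which side the operator trusts."""
    if strategy == "no_validation":
        # Assumed meaning: skip the comparison and use the stored offset as-is.
        return offset_lsn
    if strategy == "trust_offset":
        # Offset store is authoritative: a slot that moved ahead is treated as loss.
        if slot_lsn > offset_lsn:
            raise OffsetMismatchError(
                f"slot at {slot_lsn} is ahead of stored offset {offset_lsn}"
            )
        return offset_lsn
    if strategy == "trust_slot":
        # Slot is authoritative: move the offset to wherever the slot is.
        return slot_lsn
    if strategy == "trust_greater_lsn":
        return max(slot_lsn, offset_lsn)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Connector configuration fragment expressing a "slot wins" posture.
connector_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "offset.mismatch.strategy": "trust_slot",
    "lsn.flush.mode": "connector_and_driver",  # assumed pairing, see lead-in
}
```

With a slot at 5000 and a stored offset at 4000, `trust_slot` and `trust_greater_lsn` resume from 5000, `no_validation` resumes from 4000, and `trust_offset` raises.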

Why this generalises

The pattern generalises beyond Postgres + Debezium to any two-location durable-state system with legitimate divergence causes:

  • Kafka consumer group offsets vs external checkpoint — Kafka's auto.offset.reset is the same shape.
  • S3-backed log retention vs consumer checkpoint — object storage lifecycle can reclaim past a stale consumer.
  • Binlog retention vs MySQL CDC external offset store — consumer can fall behind retention legitimately (long maintenance window) or illegitimately (broken consumer).

In every case, what an offset disagreement means is operator-specific. Framework defaults should pick the conservative answer, and explicit opt-ins should let operators with different invariants choose a different policy.
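
For the retention-based cases, the "conservative default, explicit opt-in" shape might look like the following sketch. The policy names, exception type, and integer positions are hypothetical and do not correspond to any particular framework's API.

```python
from enum import Enum


class GapPolicy(Enum):
    FAIL = "fail"                              # conservative default
    RESUME_FROM_OLDEST = "resume_from_oldest"  # explicit opt-in, accepts the gap


class RetentionGap(RuntimeError):
    """The checkpoint points at data the source has already discarded."""


def reconcile(checkpoint: int, oldest_retained: int,
              policy: GapPolicy = GapPolicy.FAIL) -> int:
    """Return the position to resume from, or raise if the gap is not accepted."""
    if checkpoint >= oldest_retained:
        return checkpoint  # no gap: the checkpoint is still inside retention
    if policy is GapPolicy.RESUME_FROM_OLDEST:
        return oldest_retained
    raise RetentionGap(
        f"checkpoint {checkpoint} is older than retained data ({oldest_retained})"
    )
```

An operator who knows the gap came from a planned maintenance window can pass `GapPolicy.RESUME_FROM_OLDEST`; everyone else gets a loud failure.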

Contrast: single-location position tracking

Systems without this problem:

  • In-source checkpointing (Redpanda Connect Oracle CDC): stores the consumer position in a source-DB table, bound to the same transaction as the data.
  • Postgres physical streaming replication: the slot is the only position; no separate subscriber-side offset.

These architectures avoid the mismatch problem by eliminating one of the two positions entirely. Debezium-on-Postgres keeps both because the offset store serves functions the slot cannot (e.g. ACK semantics for Kafka Connect sinks).
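
The single-location idea can be illustrated with SQLite standing in for the source database. This is only a sketch of "position and data commit together"; how Redpanda Connect implements its Oracle checkpointing is not described here, and the table and function names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE cdc_checkpoint (consumer TEXT PRIMARY KEY, position INTEGER)")
conn.execute("INSERT INTO cdc_checkpoint VALUES ('demo', 0)")
conn.commit()


def apply_change(conn, order_id, status, position, crash=False):
    """Write a row and the new position in one transaction; both land or neither."""
    try:
        with conn:  # commits on success, rolls back if the body raises
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, status))
            conn.execute(
                "UPDATE cdc_checkpoint SET position = ? WHERE consumer = 'demo'",
                (position,),
            )
            if crash:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the rollback already happened; nothing is left half-applied


def current_state(conn):
    """Return (row count in orders, stored position)."""
    rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    pos = conn.execute("SELECT position FROM cdc_checkpoint").fetchone()[0]
    return rows, pos
```

Because there is only one store and one commit, a crash cannot leave the position ahead of or behind the data, so there is nothing to reconcile at startup.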

Seen in
