
CONCEPT Cited by 2 sources

External offset store (CDC)

Definition

An external offset store is a durable store on the CDC consumer side that persists the consumer's position in the source database's change log (the MySQL binlog, the MongoDB oplog, or a log sequence number, LSN) outside the source database itself. The store is pluggable: Redis, a relational database, or any datastore that offers durability and atomic read-modify-write.
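One way to make the pluggability concrete is a small interface that any backing store can satisfy. The sketch below is hypothetical: the names `OffsetStore`, `load`, and `compare_and_set` are illustrative and are not Redpanda Connect's API. It models atomic read-modify-write as compare-and-set, and the in-memory implementation only illustrates the contract; it is not durable.

```python
import threading
from typing import Dict, Optional, Protocol


class OffsetStore(Protocol):
    """Hypothetical contract for a pluggable CDC offset store."""

    def load(self, key: str) -> Optional[str]:
        """Return the last committed offset for `key`, or None if there is none."""
        ...

    def compare_and_set(self, key: str, expected: Optional[str], new: str) -> bool:
        """Store `new` only if the current value equals `expected`; report success."""
        ...


class InMemoryOffsetStore:
    """Illustrative implementation: a lock-guarded dict. NOT durable."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def load(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, expected: Optional[str], new: str) -> bool:
        with self._lock:  # the lock makes read + compare + write one atomic step
            if self._data.get(key) != expected:
                return False  # someone else advanced the offset; caller must re-read
            self._data[key] = new
            return True


# Usage: advance an offset only from the value we last read.
store: OffsetStore = InMemoryOffsetStore()
current = store.load("orders")  # None on first run
store.compare_and_set("orders", current, "mysql-bin.000001:4")
```

A Redis- or SQL-backed store could satisfy the same two methods (in SQL, for example, with a conditional UPDATE that matches on the expected offset); that mapping is an assumption, not something the source specifies.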

The canonical framing, quoted verbatim from the 2025-03-18 Redpanda post on its mysql_cdc connector:

"MySQL CDC uses binlog positions to track changes, requiring an external cache (Redis, a SQL database, or another datastore) to store binlog offsets."

And on its mongodb_cdc connector:

"Uses external stores for oplog positions, similar to MySQL, giving you control over your checkpointing strategy."
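The MySQL quote refers to binlog positions. A position is conventionally a binlog file name plus a byte offset within that file, and ordering compares the file's sequence number first. The sketch below assumes the string encoding `<basename>.<NNNNNN>:<offset>`; both that encoding and the `BinlogPosition` type are hypothetical, and a real connector may serialize positions differently or track GTID sets instead. MongoDB resume tokens, by contrast, are opaque and would simply be stored as-is.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class BinlogPosition:
    """Hypothetical MySQL binlog coordinate: file sequence number + byte offset in that file."""

    file_seq: int  # numeric suffix of the binlog file name, e.g. 42 for mysql-bin.000042
    offset: int    # byte offset within that file
    basename: str = field(default="mysql-bin", compare=False)  # excluded from ordering

    @classmethod
    def parse(cls, value: str) -> "BinlogPosition":
        """Parse the assumed encoding '<basename>.<NNNNNN>:<offset>'."""
        name, _, offset = value.rpartition(":")
        basename, _, seq = name.rpartition(".")
        return cls(int(seq), int(offset), basename)

    def encode(self) -> str:
        """Serialize for storage in the external offset store."""
        return f"{self.basename}.{self.file_seq:06d}:{self.offset}"


# Ordering compares the file sequence first, then the offset within the file.
a = BinlogPosition.parse("mysql-bin.000009:900")
b = BinlogPosition.parse("mysql-bin.000010:4")
print(a < b, b.encode())  # True mysql-bin.000010:4
```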

Why it matters

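Where offsets are stored is a structural axis of CDC design: who owns the consumer's progress state? Across CDC-capable engines there are three canonical shapes: server-owned (PostgreSQL), consumer-owned in an external store (MySQL, MongoDB), and consumer-owned inside the source database (Spanner, and Oracle via Redpanda Connect).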

| Engine | Offset ownership | Coupling |
|---|---|---|
| PostgreSQL | Server (replication slot + confirmed_flush_lsn) | Tight: primary retains WAL until the subscriber acks |
| MySQL | Consumer (external store) | Loose: source purges binlog on its own schedule |
| MongoDB | Consumer (external store, resume token) | Loose: the oplog is a capped collection |
| Spanner | Consumer (transactional row in source) | Tight: progress and data share the same transaction |
| Oracle (Redpanda Connect) | Consumer (in-source checkpoint table; see concepts/in-source-cdc-checkpointing) | Medium: progress lives inside the source DB but is not transactional with the data |

The external-offset-store shape covers the MySQL and MongoDB rows of the table. In this shape the consumer is on the hook for three properties, which the store must provide:

  • Durability: the offset must survive consumer restarts and process crashes.
  • Atomic read-modify-write: the offset may advance only after the corresponding downstream writes are acknowledged (this ordering is what preserves at-least-once delivery), and concurrent or stale writers must not move it backwards.
  • Availability: the consumer can't make progress while the store is down, so the store's availability budget is part of the CDC pipeline's availability budget.
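The at-least-once rule comes down to ordering: write downstream, wait for the acknowledgement, then persist the offset. The minimal simulation below assumes a single consumer, integer log positions, and a plain dict standing in for the external store; all names are hypothetical. A crash between the downstream write and the checkpoint replays the last batch instead of losing it.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (log position, payload); ints stand in for binlog coordinates


def run_consumer(
    log: List[Event],
    offsets: Dict[str, int],  # stand-in for the external offset store
    sink: List[str],          # stand-in for the downstream system
    key: str = "orders-cdc",  # hypothetical checkpoint key
    batch_size: int = 2,
    crash_after_write: Optional[int] = None,
) -> None:
    """Consume from the last checkpoint, persisting the offset only after the sink write.

    crash_after_write=N simulates a crash after batch N reached the sink but before
    its offset was stored.
    """
    last_acked = offsets.get(key, 0)  # positions start at 1; 0 means "nothing consumed"
    pending = [event for event in log if event[0] > last_acked]
    for n, i in enumerate(range(0, len(pending), batch_size), start=1):
        batch = pending[i : i + batch_size]
        sink.extend(payload for _, payload in batch)  # assume the write is acked on return
        if crash_after_write == n:
            raise RuntimeError("simulated crash between sink write and checkpoint")
        offsets[key] = batch[-1][0]  # checkpoint AFTER the ack -> at-least-once


log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
offsets: Dict[str, int] = {}
sink: List[str] = []
try:
    run_consumer(log, offsets, sink, crash_after_write=2)
except RuntimeError:
    pass  # the process "restarts"
run_consumer(log, offsets, sink)
print(sink)  # ['a', 'b', 'c', 'd', 'c', 'd']
```

Every event arrives, and the second batch arrives twice. Those duplicates are the reason exactly-once delivery needs idempotent writes or 2PC downstream.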

Trade-offs vs server-owned offsets (Postgres slots)

External offset store advantages:

  • No primary-side state pinning. The source's change log is purged on a schedule independent of consumer health; a hung consumer does not fill the source's disk (the failure mode that makes Postgres slots operationally heavy).
  • Heterogeneous consumer topologies. Different consumers can use different stores with different durability / performance profiles.
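  • No server-side slot to provision. There is no Postgres-style replication slot or wal_level=logical setting to manage, although reading the MySQL binlog still typically requires replication privileges (REPLICATION SLAVE / REPLICATION CLIENT) and row-based binary logging.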

External offset store disadvantages:

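  • Retention coupling. The consumer must read each change-log position before the source purges it. The external store does not pin the change log, so retention (for example MySQL's binlog_expire_logs_seconds or the configured oplog size) is a source-operator decision, and consumer lag must stay inside that window; a saved offset that has already been purged cannot be resumed from.
  • Exactly-once is harder. The offset lives in a different system from the data it tracks, so advancing both atomically requires either two-phase commit (2PC) or idempotent writes downstream.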
  • Operational surface. Another system to run, monitor, and recover.
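Retention coupling surfaces at restart time: the saved offset is only useful if the source still retains it. The sketch below decides between resuming and re-snapshotting. It is hypothetical: integer positions stand in for binlog or oplog coordinates, `resume_from` is the position of the next event to read, and the source does not say whether a given connector re-snapshots automatically or fails and waits for an operator.

```python
from enum import Enum
from typing import Optional


class ResumeAction(Enum):
    SNAPSHOT = "snapshot"      # no checkpoint yet: initial snapshot, then stream
    RESUME = "resume"          # checkpoint still retained by the source: stream from it
    RESNAPSHOT = "resnapshot"  # checkpoint already purged: changes were missed


def plan_resume(resume_from: Optional[int], oldest_retained: int) -> ResumeAction:
    """Decide how a restarted consumer proceeds (hypothetical policy).

    resume_from: next position to read, loaded from the external offset store.
    oldest_retained: earliest position the source still holds (oldest binlog,
    first oplog entry). The external store cannot stop the source from purging.
    """
    if resume_from is None:
        return ResumeAction.SNAPSHOT
    if resume_from < oldest_retained:
        return ResumeAction.RESNAPSHOT  # the gap [resume_from, oldest_retained) is gone
    return ResumeAction.RESUME


# A consumer that was down long enough for the source to purge past its checkpoint:
print(plan_resume(resume_from=1_000, oldest_retained=5_000).value)  # resnapshot
```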

Example: Redis as external offset store

Redpanda Connect's MySQL CDC connector can use Redis as its external offset store. The consumer checkpoints periodically (every N rows or every batch) by writing the latest applied binlog position to a Redis key; on restart, the connector reads that key and resumes from the stored binlog position. Redis's own durability settings therefore feed directly into pipeline reliability: an operator who wants hard durability would set appendonly yes together with appendfsync always.
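The Redis checkpointing loop can be sketched against the two redis-py calls it needs, Redis.get and Redis.set. This is an illustrative sketch, not the connector's implementation: the class name, the key `mysql_binlog_position`, and the every-N-rows policy are assumptions. It also assumes a single writer per key; with multiple writers the read-modify-write would need a Redis transaction or a Lua compare-and-set. `DictClient` is a local stand-in so the sketch runs without a Redis server.

```python
from typing import Dict, Optional, Protocol, Union


class KV(Protocol):
    """The subset of redis-py's Redis client used here."""

    def get(self, name: str) -> Optional[Union[bytes, str]]: ...
    def set(self, name: str, value: str) -> object: ...


class BinlogCheckpointer:
    """Writes the latest applied binlog position to one key every `every_n` rows.

    Assumes a single writer per key, so a plain SET is a safe read-modify-write.
    """

    def __init__(self, client: KV, key: str = "mysql_binlog_position", every_n: int = 1000) -> None:
        self.client = client
        self.key = key  # hypothetical key name
        self.every_n = every_n
        self._latest: Optional[str] = None
        self._unflushed = 0

    def load(self) -> Optional[str]:
        """Read the checkpoint on restart; None means no checkpoint exists yet."""
        raw = self.client.get(self.key)
        return raw.decode() if isinstance(raw, bytes) else raw

    def on_row_applied(self, position: str) -> None:
        """Call once the row's downstream write has been acknowledged."""
        self._latest = position
        self._unflushed += 1
        if self._unflushed >= self.every_n:
            self.flush()

    def flush(self) -> None:
        """Persist the latest acknowledged position (also call on clean shutdown)."""
        if self._latest is not None:
            self.client.set(self.key, self._latest)
            self._unflushed = 0


class DictClient:
    """In-process stand-in for redis.Redis (returns bytes, like redis-py's default)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)

    def set(self, name: str, value: str) -> bool:
        self._data[name] = value.encode()
        return True


cp = BinlogCheckpointer(DictClient(), every_n=2)
cp.on_row_applied("mysql-bin.000001:120")
cp.on_row_applied("mysql-bin.000001:240")  # second row triggers a SET
print(cp.load())  # mysql-bin.000001:240
```

In a real deployment the client would be a redis.Redis instance; redis-py returns bytes by default and str with decode_responses=True, and `load` accepts either. Checkpointing per batch instead of per N rows amounts to calling `flush()` once each acknowledged batch is written.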
