CONCEPT Cited by 2 sources
External offset store (CDC)¶
Definition¶
An external offset store is a CDC-consumer-side durable store that persists the position of the consumer in the source database's change log — binlog / oplog / LSN — outside the source database itself. The store is pluggable: Redis, a relational database, or any datastore with durability + atomic read-modify-write.
Canonical framing, verbatim from the 2025-03-18 Redpanda post on its mysql_cdc connector:
"MySQL CDC uses binlog positions to track changes, requiring an external cache (Redis, a SQL database, or another datastore) to store binlog offsets."
And on its mongodb_cdc connector:
"Uses external stores for oplog positions, similar to MySQL, giving you control over your checkpointing strategy."
Why it matters¶
Offset-durability storage is a structural CDC axis: who owns the consumer's progress state? Three canonical shapes (server-owned, consumer-owned in an external store, consumer-owned inside the source) appear across the CDC-capable engines:
| Engine | Offset ownership | Coupling |
|---|---|---|
| PostgreSQL | Server (replication slot + confirmed_flush_lsn) | Tight — primary retains WAL until subscriber acks |
| MySQL | Consumer (external store) | Loose — source purges binlog on schedule |
| MongoDB | Consumer (external store, resume token) | Loose — oplog is capped |
| Spanner | Consumer (transactional row in source) | Tight — progress + data share same transaction |
| Oracle (Redpanda Connect) | Consumer (in-source checkpoint table — concepts/in-source-cdc-checkpointing) | Medium — progress inside source DB but non-transactional with data |
The external-offset-store rows are the MySQL / MongoDB shape. In that shape the consumer is on the hook for three properties the store must provide:
- Durability — offset must survive consumer restart / process crash.
- Atomic read-modify-write — offset advance must be coordinated with downstream acknowledgement to preserve at-least-once delivery semantics.
- Availability — consumer can't make progress if the store is down, so the store's availability budget is part of the CDC pipeline's availability budget.
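The ordering behind the atomic read-modify-write requirement can be sketched as follows. This is a minimal illustration, not any connector's implementation: `Position`, `InMemoryOffsetStore`, and `run_batch` are hypothetical names, and the in-memory store stands in for a durable one. The point is the order of operations: deliver first, advance the offset second, so a crash in between replays events rather than losing them.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True, order=True)
class Position:
    """Hypothetical log position, e.g. a (binlog file index, byte offset) pair."""
    file_index: int
    offset: int


class InMemoryOffsetStore:
    """Stand-in for a durable store; only the compare-and-set contract matters."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pos: Optional[Position] = None

    def load(self) -> Optional[Position]:
        with self._lock:
            return self._pos

    def compare_and_set(self, expected: Optional[Position], new: Position) -> bool:
        # Atomic read-modify-write: advance only if nobody else moved the offset.
        with self._lock:
            if self._pos != expected:
                return False
            self._pos = new
            return True


def run_batch(
    store: InMemoryOffsetStore,
    events: Iterable[Tuple[Position, str]],
    deliver: Callable[[str], object],
) -> Optional[Position]:
    """Deliver events, then checkpoint. A crash between the two replays the batch."""
    start = store.load()
    last = start
    for pos, payload in events:
        if start is not None and pos <= start:
            continue  # already covered by the stored checkpoint
        deliver(payload)  # assumed to return only after downstream acknowledges
        last = pos
    if last != start and not store.compare_and_set(start, last):
        raise RuntimeError("offset moved concurrently; another consumer is active")
    return last
```

Checkpointing after delivery gives at-least-once semantics: the worst case is a replayed batch, never a skipped one. Reversing the order would give at-most-once instead.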
Trade-offs vs server-owned offset (Postgres slots)¶
External offset store advantages:
- No primary-side state pinning. The source's change log is purged on a schedule independent of consumer health; a hung consumer does not fill the source's disk (the failure mode that makes Postgres slots operationally heavy).
- Heterogeneous consumer topologies. Different consumers can use different stores with different durability / performance profiles.
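- No slot-provisioning handshake. No wal_level=logical setting or replication-slot creation on the source; the consumer still needs whatever privileges the source requires to read its change log (MySQL, for example, requires replication privileges to read the binlog).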
External offset store disadvantages:
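- Retention coupling. The consumer must read each log position before the source purges it. The external store does not pin the change log, so retention is a source-operator decision, and consumer lag must never exceed the retention window; if it does, the stored offset points at a purged position and the consumer has to re-snapshot.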
- Exactly-once harder. Offset is now in a different system from the data it tracks; two-system atomic advance requires either 2PC or idempotent writes downstream.
- Operational surface. Another system to run, monitor, and recover.
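The idempotent-writes option mentioned above can be sketched like this. It is an illustration under assumptions, not a prescribed design: `IdempotentSink` is a hypothetical downstream table modeled as a dictionary, and positions are assumed to be totally ordered `(log file index, byte offset)` tuples. Each row remembers the position of its last applied change, so a replayed event after a crash becomes a no-op.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # hypothetical (log file index, byte offset)


class IdempotentSink:
    """Hypothetical downstream table that records each row's last applied position."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[Position, dict]] = {}

    def apply(self, key: str, pos: Position, value: dict) -> bool:
        """Apply a change unless an equal or newer one was already applied."""
        current = self.rows.get(key)
        if current is not None and current[0] >= pos:
            return False  # replayed event: safe to drop
        self.rows[key] = (pos, value)
        return True
```

In a real database the compare and the write would need to happen in one statement or transaction; the dictionary here only shows the comparison logic. With this in place, at-least-once delivery from the offset store yields effectively-once results downstream without 2PC.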
Example: Redis as external offset store¶
Redpanda Connect's MySQL CDC connector can use Redis as its external offset store. The consumer checkpoints every N rows or every batch by writing the latest applied binlog position to a Redis key. On consumer restart, the connector reads the key and resumes from that binlog position. Redis's durability is therefore a pipeline-reliability input: the operator sets appendonly yes and appendfsync always if hard durability is required.
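The write-then-resume cycle described above might look like the following from the consumer's side. This is a hand-rolled sketch, not the format Redpanda Connect actually uses: the key name `OFFSET_KEY` and the `file:position` encoding are assumptions made for illustration. The functions only rely on a client exposing `get` and `set`, which redis-py's `redis.Redis` provides.

```python
from typing import Optional, Tuple

OFFSET_KEY = "cdc:mysql:orders:binlog_offset"  # hypothetical key name


def save_offset(client, binlog_file: str, binlog_pos: int) -> None:
    """Write the latest applied position, e.g. ("mysql-bin.000042", 1337)."""
    client.set(OFFSET_KEY, f"{binlog_file}:{binlog_pos}")


def load_offset(client) -> Optional[Tuple[str, int]]:
    """Return the stored position, or None on first start (no checkpoint yet)."""
    raw = client.get(OFFSET_KEY)
    if raw is None:
        return None
    # redis-py returns bytes unless the client was built with decode_responses=True.
    text = raw.decode() if isinstance(raw, bytes) else raw
    binlog_file, _, pos = text.rpartition(":")
    return binlog_file, int(pos)
```

With redis-py installed, `client = redis.Redis(host="localhost", port=6379)` would be passed in. A `None` result from `load_offset` is the signal to start from a snapshot or from the current end of the binlog, depending on the pipeline's policy.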
Seen in¶
- sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture — canonical wiki introduction. MySQL CDC + MongoDB CDC connectors in Redpanda Connect both require an external offset store; Redis, SQL databases, and any datastore with the right durability/atomicity profile are canonical backends.
Related¶
- concepts/change-data-capture
- concepts/binlog-replication — MySQL substrate.
- concepts/mongodb-change-streams — MongoDB substrate.
- concepts/postgres-logical-replication-slot — the server-owned-offset counter-pattern.
- systems/redpanda-connect — the canonical consumer.
- systems/debezium — ecosystem alternative; Debezium stores Kafka-Connect offsets in Kafka topics, a special case of the external-store shape.
- systems/redis — canonical backing store named in the Redpanda docs.
- patterns/snapshot-plus-catchup-replication — the lifecycle within which offset durability matters.