
MongoDB change streams

Definition

MongoDB change streams are MongoDB's native CDC mechanism: a cursor over the replica set's operations log (oplog), an append-only collection of every data-mutating operation applied by the primary and replicated to the secondaries. Clients open a change stream on a single collection, a database, or the whole deployment, and receive change events (insert / update / delete, with a configurable document payload) in commit order.

Change streams were introduced in MongoDB 3.6 and are the official consumer-facing API on top of the oplog; consumers should not tail the oplog directly.
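A minimal PyMongo sketch of this consumption pattern. The connection string and the `shop.orders` namespace are illustrative assumptions; only the pipeline construction runs without a server, so the driver import is deferred into the generator.

```python
# Narrow the stream to the event types we care about; watch() accepts
# an aggregation pipeline of $match / $project-style stages.
PIPELINE = [{"$match": {"operationType": {"$in": ["insert", "update", "delete"]}}}]

def tail_changes(uri="mongodb://localhost:27017/?replicaSet=rs0"):
    # Deferred import keeps the sketch loadable without a MongoDB driver.
    from pymongo import MongoClient

    client = MongoClient(uri)
    coll = client["shop"]["orders"]  # hypothetical namespace
    # watch() can equally be called on a database or on the client
    # itself, widening the scope to all collections in that scope.
    with coll.watch(PIPELINE) as stream:
        for event in stream:
            yield event["operationType"], event["documentKey"]
```

The same `watch()` call on `client` instead of `coll` yields a deployment-wide stream, which is the scope CDC connectors typically use.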

Why it matters for CDC

Change streams are the substrate under every MongoDB CDC connector in the wiki: Debezium's MongoDB connector and Redpanda Connect's mongodb_cdc both consume this API. The 2025-03 Redpanda post canonicalises the operational shape verbatim:

"Captures updates directly from MongoDB's operations log, providing an efficient, near-real-time data stream."

Plus two structurally distinctive properties:

  • Parallelised initial snapshots. "The connector employs parallel reads during snapshots, significantly boosting performance for large-scale data migrations by splitting collections into manageable chunks." Canonicalised as parallel snapshot.
  • Flexible document modes. "Customizable document handling for updates and deletes, supporting full-document lookups and pre/post image capture." MongoDB change streams can emit just the change delta, the full post-update document, or pre+post images — the consumer chooses.
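The three payload modes above map onto `watch()` keyword arguments. A hedged sketch, assuming PyMongo's snake_case option names; the mode labels `delta` / `post` / `pre_post` are our own, not MongoDB's, and pre/post images require MongoDB 6.0+ with `changeStreamPreAndPostImages` enabled on the collection.

```python
def watch_kwargs(mode: str) -> dict:
    """Translate a payload-mode label into watch() options (sketch)."""
    if mode == "delta":      # updateDescription only -- the default
        return {}
    if mode == "post":       # full post-update document via lookup
        return {"full_document": "updateLookup"}
    if mode == "pre_post":   # pre- and post-images of the document
        return {
            "full_document": "whenAvailable",
            "full_document_before_change": "whenAvailable",
        }
    raise ValueError(f"unknown mode: {mode}")
```

A connector would splat these into the stream it opens, e.g. `coll.watch(**watch_kwargs("post"))`.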

Offset-durability shape

MongoDB change streams identify events by resume tokens (opaque values keyed to an oplog timestamp + event identity). Unlike Postgres's server-owned replication slot, MongoDB does not persist consumer progress server-side — the consumer must persist its own resume token. Redpanda Connect's MongoDB CDC connector therefore requires an external offset store: "Uses external stores for oplog positions, similar to MySQL, giving you control over your checkpointing strategy."
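The consumer-owned checkpointing loop can be sketched as follows. The file-backed store is an illustrative stand-in for a real offset store (Kafka topic, database table); each event's `_id` field is its resume token, and passing it back as `resume_after` restarts the stream just after that event.

```python
import json
import os

class FileOffsetStore:
    """Toy external offset store: one resume token in a JSON file."""

    def __init__(self, path="offsets.json"):
        self.path = path

    def load(self):
        # No file yet means no checkpoint: start from "now" (or snapshot).
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f)

    def save(self, token):
        with open(self.path, "w") as f:
            json.dump(token, f)

def consume(coll, store, process):
    # Resume from the last durably stored token, if any.
    with coll.watch(resume_after=store.load()) as stream:
        for event in stream:
            process(event)            # apply the change downstream first...
            store.save(event["_id"])  # ...then checkpoint the resume token
```

Checkpointing after processing gives at-least-once delivery: a crash between the two steps replays the last event on restart rather than losing it.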

Oplog retention coupling

The oplog is a capped collection — finite on disk, with a retention window driven by write rate. Any change-stream consumer whose resume token falls behind the oplog's retention horizon can no longer catch up from the stream and must restart from a fresh snapshot. This is the same failure mode as MySQL's finite binlog retention, and the structural reason snapshot-plus-catchup pipelines interleave row-copy with change-log catch-up rather than running them sequentially.
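The resume-or-resnapshot decision reduces to a timestamp comparison. A sketch under simplifying assumptions: MongoDB compares cluster timestamps, modeled here as plain epoch seconds; the oldest retained timestamp would in practice come from the first entry of `local.oplog.rs`.

```python
def can_resume(checkpoint_ts: int, oldest_oplog_ts: int) -> bool:
    """True if the saved position is still inside the oplog window.

    If the oldest retained oplog entry is newer than our checkpoint,
    the intervening entries were truncated from the capped collection:
    the stream cannot be resumed and a fresh snapshot is required.
    """
    return checkpoint_ts >= oldest_oplog_ts
```

This is why interleaved snapshot-plus-catchup matters: the longer the row copy runs without consuming the stream, the more likely `can_resume` flips to `False` before catch-up begins.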
