Skip to content

CONCEPT Cited by 1 source

Cassandra CDC commit log

Definition

The Cassandra CDC (Change Data Capture) commit log is the per-node on-disk log of mutations to CDC-enabled tables. External CDC consumers tail these files to extract row-level changes and publish them onto a downstream stream (Kafka, Pulsar, etc.).

Cassandra 3.x vs 4.x semantics

A load-bearing behavioural change was introduced in Cassandra 4:

Version When CDC commit log is written
3.11 and earlier On flush (deferred; batched with memtable flush)
4.x and later As soon as mutations happen on a CDC-enabled table (CASSANDRA-12148)

This is not a superset change — the timing contract a consumer was relying on under 3.x is broken under 4.x. Any CDC consumer that assumed "the commit log lags roughly a memtable flush interval" has to be rewritten to cope with near-real-time arrival under 4.x.

Impact on CDC consumers

For any CDC consumer written against 3.x:

  • Event-timing assumptions may no longer hold.
  • Record-ordering invariants that relied on the flush boundary may need re-derivation.
  • Throughput expectations change — the consumer sees a higher-rate / smaller-batch stream under 4.x.
  • Forward compatibility is not automatic.

Seen in

  • sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — canonical wiki Seen-in. Yelp's in-house Cassandra Source Connector was not forward-compatible with the 4.x CDC commit-log semantics change and required Yelp to:
  • Port the DataPipeline Materializer to be backward- compatible with both 3.11 and 4.1 and deploy it fleet-wide before any Cassandra node upgrade.
  • Upgrade the CDC Publisher in lockstep with each Cassandra node (it runs as a sidecar container in the same pod).
  • Switch from the Cassandra driver's Schema Change Listener to actively detecting schema changes as commit logs are processed"simplified the CDC Publisher" (Source verbatim).
Last updated · 476 distilled / 1,218 read