CONCEPT Cited by 1 source
Cassandra CDC commit log
Definition
The Cassandra CDC (Change Data Capture) commit log is the per-node on-disk log of mutations to CDC-enabled tables. External CDC consumers tail these files to extract row-level changes and publish them onto a downstream stream (Kafka, Pulsar, etc.).
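You can picture tailing as a loop that remembers how far it has read and forwards only the complete records written since then. The sketch below is illustrative only. The real commit log is a binary format with its own framing and checksums, so the newline-delimited records, the `tail_once` helper, and the `publish` callback (a stand-in for a Kafka or Pulsar producer) are all hypothetical.

```python
from typing import Callable


def tail_once(path: str, offset: int, publish: Callable[[bytes], None]) -> int:
    """Publish every complete record written after `offset`; return the new offset.

    Hypothetical framing: one record per newline-terminated line. A trailing
    partial record (no newline yet) is left in place for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return offset  # nothing complete has been appended yet
    for record in data[: end + 1].splitlines():
        publish(record)
    return offset + end + 1
```

Persisting the returned offset between calls lets the consumer resume after a restart without re-reading the file from the start.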
Cassandra 3.x vs 4.x semantics
Cassandra 4.0 introduced a load-bearing behavioural change:
| Version | When CDC commit-log data becomes available to consumers |
|---|---|
| 3.11 and earlier | Only after a memtable flush (deferred; data arrives in flush-sized batches) |
| 4.x and later | As soon as mutations happen on a CDC-enabled table (CASSANDRA-12148) |
This is not a superset change. The timing contract that consumers relied on under 3.x is broken under 4.x. Any CDC consumer that assumed "the commit log lags by roughly one memtable flush interval" has to be reworked to handle near-real-time arrival under 4.x.
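The Apache Cassandra 4.x CDC documentation describes the mechanism behind the new timing. A hard link to each commit-log segment appears in the `cdc_raw` directory when the segment is created. A `<segment>_cdc.idx` file records how many bytes of that segment are already fsynced, and it gains a second line reading `COMPLETED` once Cassandra has finished with the segment. Under 3.x, a segment reached `cdc_raw` only after a memtable flush, so anything already present there was complete. The sketch below assumes that index layout. `parse_cdc_idx` and `next_readable_range` are hypothetical helper names, not Cassandra APIs.

```python
from typing import Tuple


def parse_cdc_idx(text: str) -> Tuple[int, bool]:
    """Parse an assumed 4.x `_cdc.idx` body: durable byte offset, then optional COMPLETED."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    offset = int(lines[0]) if lines else 0
    completed = len(lines) > 1 and lines[1] == "COMPLETED"
    return offset, completed


def next_readable_range(idx_text: str, consumed: int) -> Tuple[int, int, bool]:
    """Return (start, end, done) for what a 4.x consumer may read right now.

    Bytes in [start, end) are durable but not yet consumed. `done` becomes True
    only when the segment is marked COMPLETED and every durable byte was read.
    """
    offset, completed = parse_cdc_idx(idx_text)
    end = max(consumed, offset)  # never move the read window backwards
    return consumed, end, completed and consumed >= offset
```

A 4.x consumer would call this on every poll and decode only the new byte range. Once `done` is true, it can remove its hard link, because the documentation leaves cleanup of `cdc_raw` to the consumer. A 3.x consumer needed no such index and could treat each file as a finished batch.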
Impact on CDC consumers
For any CDC consumer written against 3.x:
- Event-timing assumptions (for example, that a change surfaces roughly one flush interval after the write) may no longer hold.
- Record-ordering invariants that relied on the flush boundary may need to be re-derived.
- Throughput expectations change: the consumer sees a higher-rate, smaller-batch stream under 4.x.
- Forward compatibility is not automatic; a 3.x consumer has to be ported explicitly.
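One way to remove the dependence on the flush boundary is to make the consumer's output independent of arrival order and batch size. The sketch below illustrates a general technique. It is not Yelp's or Cassandra's code. It applies each change with last-write-wins on the mutation's write timestamp, which mirrors the timestamp-based reconciliation Cassandra applies to cells. The `Change` record and its fields are hypothetical, ties are broken by comparing values (a simplification), and deletes are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Change:
    key: str       # hypothetical rendering of the row's primary key
    value: str     # hypothetical serialized row image
    write_ts: int  # mutation write timestamp (microseconds)


class LastWriteWinsView:
    """Materialized state that does not depend on arrival order or batching."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, str]] = {}

    def apply(self, changes: Iterable[Change]) -> None:
        for c in changes:
            current = self._rows.get(c.key)
            candidate = (c.write_ts, c.value)
            # Keep the newer write; the value only matters on a timestamp tie.
            if current is None or candidate > current:
                self._rows[c.key] = candidate

    def snapshot(self) -> Dict[str, str]:
        return {k: v for k, (_, v) in self._rows.items()}
```

The same changes produce the same snapshot in any order, whether they are fed one at a time (the 4.x shape) or as one large batch (closer to the 3.x shape). As a result, the consumer no longer encodes an assumption about when records arrive.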
Seen in
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — canonical wiki Seen-in. Yelp's in-house Cassandra Source Connector was not forward-compatible with the 4.x CDC commit-log semantics change and required Yelp to:
  - Port the DataPipeline Materializer so that it works with both 3.11 and 4.1, and deploy it fleet-wide before upgrading any Cassandra node.
  - Upgrade the CDC Publisher in lockstep with each Cassandra node (it runs as a sidecar container in the same pod).
  - Switch from the Cassandra driver's Schema Change Listener to actively detecting schema changes as commit logs are processed, a change that, in the source's words, "simplified the CDC Publisher".
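Detecting schema changes in-band, as Yelp describes, can be sketched as a per-table cache that refreshes only when a processed record carries a schema version the cache has not seen. This is a hypothetical illustration, not Yelp's CDC Publisher. Two pieces are assumptions: the `schema_version` carried by each record, and the `fetch_schema` callback, which stands in for reading the cluster's current schema.

```python
from typing import Callable, Dict, Tuple

Schema = Dict[str, str]  # hypothetical: column name -> CQL type


class InBandSchemaTracker:
    """Refresh a table's schema when a processed record reveals a new version."""

    def __init__(self, fetch_schema: Callable[[str], Tuple[str, Schema]]) -> None:
        self._fetch = fetch_schema  # returns (schema_version, schema) for a table
        self._cache: Dict[str, Tuple[str, Schema]] = {}

    def schema_for(self, table: str, record_schema_version: str) -> Schema:
        cached = self._cache.get(table)
        if cached is None or cached[0] != record_schema_version:
            # Change detected while processing the log: re-read the schema
            # here instead of waiting for a driver-side listener callback.
            self._cache[table] = self._fetch(table)
        return self._cache[table][1]
```

With this design, schema refreshes happen in the same code path that decodes records. That removes the separate listener and its ordering races with the log, which is consistent with the simplification the source reports.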
Related
- systems/apache-cassandra — the source system.
- systems/cassandra-source-connector — Yelp's consumer.
- concepts/change-data-capture — general CDC concept.
- concepts/schema-evolution — the schema dimension of cross-version CDC handling.