Cassandra Source Connector (Yelp)
The Cassandra Source Connector (CSource) is a Yelp in-house CDC system that streams row-level changes from Cassandra into Yelp's data pipeline, an abstraction layered on top of Kafka. The architecture is documented in Yelp's 2019 blog series (engineeringblog.yelp.com/2019/12/cassandra-source-connector-part-1.html).
Components
The connector has two distinct components:
- CDC Publisher — reads the Cassandra CDC commit logs on the Cassandra node (runs as a sidecar container in the same pod as the Cassandra node on Yelp's Kubernetes deployment) and publishes row-level mutations.
- DataPipeline Materializer — the downstream subcomponent that materialises events into the Yelp data-pipeline abstraction.
Cassandra 4.x forward-compatibility problem
Two load-bearing Cassandra-4 changes broke the connector as it stood on 3.11:
- CDC commit-log write point moved from flush-time to mutation-time (CASSANDRA-12148). Under 3.11, CDC commit-log data only became available to consumers at flush; under 4.x it is available as mutations are written, which changes the timing contract the connector relied on.
- Cassandra 4.1 codebase refactor. Significant restructuring of Cassandra internals that the connector depended on.
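The timing change matters to a log reader because, under 4.x, a segment may still be growing while it is being read. The following is a minimal sketch, not Yelp's implementation. It assumes the Cassandra 4.x CDC layout in which each segment has a companion `_cdc.idx` file holding the durable byte offset on its first line and the word `COMPLETED` on a second line once Cassandra has finished with the segment. The names `CdcIndex`, `parse_cdc_index`, and `readable_range` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdcIndex:
    durable_offset: int  # bytes of the segment known to be persisted to disk
    completed: bool      # True once Cassandra has finished with the segment


def parse_cdc_index(text: str) -> CdcIndex:
    """Parse the contents of a <segment>_cdc.idx file (assumed 4.x format)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty index file")
    offset = int(lines[0])
    completed = len(lines) > 1 and lines[1] == "COMPLETED"
    return CdcIndex(offset, completed)


def readable_range(index: CdcIndex, already_read: int) -> range:
    """Bytes that are safe to read now: from our position up to the durable offset."""
    start = min(already_read, index.durable_offset)
    return range(start, index.durable_offset)
```

A reader built this way polls the index, consumes only up to `durable_offset`, and treats the segment as finished only after `completed` is true, rather than waiting for a flush as a 3.11-era reader could.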
Yelp also made one optional change during the port: instead of relying on the Cassandra driver's Schema Change Listener, the CDC Publisher now actively detects schema changes as it processes commit logs, which "simplified the CDC Publisher" (Source: sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade).
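As a rough illustration of inline schema-change detection, the sketch below checks each mutation against the last schema seen for its table before publishing it. This is an assumption-heavy sketch, not the Publisher's actual design: the `Mutation` shape, the `schema_id` fingerprint, and the `SchemaTracker` class are all hypothetical, and the source does not describe how the Publisher identifies a schema change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Mutation:
    table: str
    schema_id: str  # hypothetical fingerprint of the table schema at write time
    payload: bytes


class SchemaTracker:
    """Remembers the last schema seen per table and reports changes."""

    def __init__(self, on_change: Callable[[str, str], None]) -> None:
        self._seen: Dict[str, str] = {}
        self._on_change = on_change

    def observe(self, m: Mutation) -> None:
        previous = self._seen.get(m.table)
        if previous != m.schema_id:
            self._seen[m.table] = m.schema_id
            # The first sighting of a table is a load, not a change.
            if previous is not None:
                self._on_change(m.table, m.schema_id)


def publish_all(mutations: Iterable[Mutation],
                tracker: SchemaTracker,
                sink: List[Mutation]) -> None:
    for m in mutations:
        tracker.observe(m)  # schema check happens inline, before publishing
        sink.append(m)
```

The appeal of this shape is that schema handling follows the same ordered stream as the data, so there is no separate listener callback to reconcile with the position in the commit log.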
Upgrade rollout shape
Yelp split the connector's rollout into two parts relative to the Cassandra upgrade:
- DataPipeline Materializer was made backward-compatible with both 3.11 and 4.1 and shipped fleet-wide before any Cassandra upgrade started — a dependency Yelp resolved ahead of the critical path.
- CDC Publisher was upgraded in lockstep with each Cassandra node — it runs in the same pod as the node, so replacing the node image also replaced the publisher container.
This separation is the core trick that kept the CDC stream healthy through a fleet-wide Cassandra major-version bump: ship the static-side component ahead of time, and move the node-local component with the node.
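The ordering constraint can be sketched as a precondition plus a per-node loop. This is a simplified model under stated assumptions, not Yelp's tooling: `Node`, `upgrade_fleet`, and the version strings are hypothetical, and a real rollout would also wait for each node to become healthy before moving on.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Node:
    name: str
    cassandra: str  # Cassandra version running on the node
    publisher: str  # CDC Publisher version in the same pod


def upgrade_fleet(nodes: List[Node],
                  materializer_supports: Set[str],
                  target: str) -> List[str]:
    """Upgrade nodes one at a time, moving the publisher with each node."""
    # Precondition: the already-deployed materializer must understand every
    # version that will coexist during the rollout.
    in_flight = {n.cassandra for n in nodes} | {target}
    if not in_flight <= materializer_supports:
        missing = sorted(in_flight - materializer_supports)
        raise RuntimeError(f"materializer does not support: {missing}")

    log = []
    for node in nodes:
        # Same pod, so the node image and the publisher container change together.
        node.cassandra = target
        node.publisher = target
        log.append(f"{node.name} -> {target}")
    return log
```

For example, calling `upgrade_fleet` with a materializer that supports only `"3.11"` raises before any node is touched, which mirrors shipping the backward-compatible materializer fleet-wide first.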
Seen in
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — canonical wiki Seen-in. Architecture summary + forward-compat breakage specifics + split-rollout shape.
Related
- systems/apache-cassandra — the source system.
- systems/kafka — the transport underlying Yelp's data pipeline abstraction.
- concepts/cassandra-cdc-commit-log — the Cassandra-internal primitive the Publisher reads.
- concepts/change-data-capture — general CDC concept page.
- patterns/version-specific-images-per-git-branch — same Yelp upgrade-rollout discipline applied to Cassandra itself.