

Cassandra Source Connector (Yelp)

The Cassandra Source Connector (CSource) is a Yelp in-house CDC system that streams row-level changes from Cassandra into Yelp's data pipeline, an abstraction layered on top of Kafka. The architecture is documented in Yelp's 2019 blog series (engineeringblog.yelp.com/2019/12/cassandra-source-connector-part-1.html).

Components

The connector has two distinct components:

  1. CDC Publisher: runs as a sidecar container in the same pod as its Cassandra node on Yelp's Kubernetes deployment, reads that node's CDC commit logs, and publishes row-level mutations.
  2. DataPipeline Materializer: the downstream subcomponent that materializes those events into Yelp's data-pipeline abstraction.

Cassandra 4.x forward-compatibility problem

Two load-bearing Cassandra 4.x changes broke the connector as it was built for 3.11:

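  1. CDC data availability moved from flush time to write time (CASSANDRA-12148). Under 3.11, commit-log segments holding CDC data only appeared in the cdc_raw directory once the corresponding memtable was flushed; under 4.x, each segment is hard-linked into cdc_raw when it is created, and a companion _cdc.idx file records how much of it is durably persisted, so mutations become readable as they happen. This changed the timing contract the connector relied on.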
  2. Cassandra 4.1 codebase refactor. Significant restructuring of the Cassandra internals that the connector depended on.
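To make the new timing contract concrete: per the Cassandra 4.x CDC documentation, each segment in cdc_raw has a <segment>_cdc.idx file whose first line is the byte offset persisted so far, and which gains a second line, COMPLETED, once the segment is finished. The minimal sketch below assumes that layout and shows how a consumer could bound its reads by it. CdcIndex, parse_cdc_idx, readable_range, and can_release_segment are hypothetical names, and this is not Yelp's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CdcIndex:
    """Parsed contents of a Cassandra 4.x <segment>_cdc.idx file."""
    persisted_offset: int  # bytes of the segment known to be durable on disk
    completed: bool        # True once Cassandra has marked the segment COMPLETED


def parse_cdc_idx(text: str) -> CdcIndex:
    """Parse _cdc.idx text: an integer offset, optionally followed by COMPLETED."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty _cdc.idx file")
    completed = len(lines) > 1 and lines[1] == "COMPLETED"
    return CdcIndex(persisted_offset=int(lines[0]), completed=completed)


def readable_range(already_read: int, idx: CdcIndex) -> Tuple[int, int]:
    """Return the [start, end) byte range that is safe to read next.

    Hypothetical helper: bytes past persisted_offset may not be durable yet,
    so the reader never advances beyond it, even in the middle of a segment.
    """
    return already_read, max(already_read, idx.persisted_offset)


def can_release_segment(already_read: int, idx: CdcIndex) -> bool:
    """Assumed policy: drop a segment from cdc_raw only when COMPLETED and fully read."""
    return idx.completed and already_read >= idx.persisted_offset


if __name__ == "__main__":
    idx = parse_cdc_idx("4096\n")
    print(readable_range(1024, idx), can_release_segment(4096, idx))  # (1024, 4096) False
```

Under the 3.11 model there was no such offset to poll: a segment was either absent from cdc_raw or already complete, which is why the switch changes when the publisher sees data rather than what data it sees.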

Additionally, although not strictly required, Yelp used the port to switch from the Cassandra driver's Schema Change Listener to detecting schema changes actively as commit logs are processed, a change that "simplified the CDC Publisher" (Source: sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade).
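One way to picture inline schema detection: the publisher keeps its own cache of table schemas and refreshes an entry only when a record arrives carrying a schema version it has not seen for that table. The sketch assumes each decoded record exposes such a version. Mutation, InlineSchemaTracker, and fetch_schema are hypothetical names, not Yelp's or Cassandra's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

TableKey = Tuple[str, str]  # (keyspace, table)


@dataclass(frozen=True)
class Mutation:
    """Hypothetical decoded commit-log record; field names are illustrative."""
    keyspace: str
    table: str
    schema_version: str
    row: dict


@dataclass
class InlineSchemaTracker:
    """Detects schema changes on the processing path instead of via a driver listener."""
    fetch_schema: Callable[[str, str], dict]  # hypothetical: loads the current table schema
    _versions: Dict[TableKey, str] = field(default_factory=dict)
    _schemas: Dict[TableKey, dict] = field(default_factory=dict)
    refreshes: List[TableKey] = field(default_factory=list)

    def schema_for(self, m: Mutation) -> dict:
        key = (m.keyspace, m.table)
        if self._versions.get(key) != m.schema_version:
            # First sighting or a version change: refresh before decoding this record.
            self._schemas[key] = self.fetch_schema(m.keyspace, m.table)
            self._versions[key] = m.schema_version
            self.refreshes.append(key)
        return self._schemas[key]


if __name__ == "__main__":
    tracker = InlineSchemaTracker(fetch_schema=lambda ks, t: {"table": f"{ks}.{t}"})
    for version in ["v1", "v1", "v2"]:
        tracker.schema_for(Mutation("biz", "reviews", version, {"id": 1}))
    print(tracker.refreshes)  # [('biz', 'reviews'), ('biz', 'reviews')]
```

In this sketch the refresh happens synchronously, just before the first record written under a new schema is decoded, so there is no separate callback whose timing has to be kept in step with the stream.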

Upgrade rollout shape

Yelp decoupled the connector's rollout from the Cassandra upgrade itself:

  • DataPipeline Materializer was made compatible with both 3.11 and 4.1 and shipped fleet-wide before any Cassandra node was upgraded, taking that dependency off the critical path.
  • CDC Publisher was upgraded in lockstep with each Cassandra node — it runs in the same pod as the node, so replacing the node image also replaced the publisher container.

This separation, shipping the node-independent component ahead of time and moving the node-local component together with its node, is the core technique that kept the CDC stream healthy through a fleet-wide Cassandra major-version upgrade.
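The rollout ordering can be read as two invariants that must hold at every intermediate step: the Materializer must understand every Cassandra version still running, and each publisher must match the node it sits beside. The sketch below models this with hypothetical types and functions (Node, FleetState, violations, upgrade_node); it illustrates the ordering argument, not Yelp's tooling.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Node:
    """One Cassandra pod: the node and its CDC Publisher sidecar share an image."""
    name: str
    cassandra_version: str
    publisher_version: str  # hypothetical: the Cassandra version this publisher build targets


@dataclass(frozen=True)
class FleetState:
    materializer_supports: FrozenSet[str]  # Cassandra versions the Materializer handles
    nodes: Sequence[Node]


def violations(state: FleetState) -> List[str]:
    """Return invariant violations for one rollout snapshot (empty list means healthy)."""
    problems = []
    for node in state.nodes:
        if node.publisher_version != node.cassandra_version:
            problems.append(f"{node.name}: publisher {node.publisher_version} "
                            f"!= cassandra {node.cassandra_version}")
        if node.cassandra_version not in state.materializer_supports:
            problems.append(f"{node.name}: materializer cannot handle {node.cassandra_version}")
    return problems


def upgrade_node(node: Node, version: str) -> Node:
    """Replacing the pod image moves the node and its sidecar publisher together."""
    return Node(node.name, cassandra_version=version, publisher_version=version)


if __name__ == "__main__":
    both = frozenset({"3.11", "4.1"})
    nodes = [Node(f"cass-{i}", "3.11", "3.11") for i in range(3)]
    for i in range(len(nodes)):
        nodes[i] = upgrade_node(nodes[i], "4.1")  # one pod at a time
        assert not violations(FleetState(both, nodes))
    # Had the Materializer only understood 3.11, an upgraded node would be flagged:
    print(violations(FleetState(frozenset({"3.11"}), nodes[:1])))
```

Shipping the dual-version Materializer first makes the first check pass for every mixed-version state, while bundling the publisher into the node image makes the second check pass by construction.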
