SYSTEM Cited by 1 source

Netflix Cassandra Analytics Wrapper

The Cassandra Analytics Wrapper is Netflix's internal layer built on top of the open-source Apache Cassandra Analytics library, extended with Netflix's internal backup representation and an S3 client (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix).

Role in the stack

The wrapper is the foundation layer of Netflix's new data-movement architecture, where it replaces Casspactor. It handles:

  1. Raw data retrieval from backups: reads SSTable files directly from S3, where regularly scheduled, sidecar-driven backups deposit them.
  2. Translation into standard Spark DataFrames — downstream connectors consume a universal DataFrame interface rather than raw SSTable bytes.
  3. Executor-level mutation compaction: moves compaction onto Spark executors, so wide or skewed partitions are handled without excessive shuffling or out-of-memory (OOM) failures.
  4. Auto-sizing — dynamically adjusts resources based on source table characteristics.
  5. Time travel — processes schema, cluster topology, and data as a cohesive unit at a point in time.
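To make item 3 concrete, here is a minimal sketch of what compacting mutations for a single partition could look like. It assumes Cassandra's standard last-write-wins rule (the cell with the highest write timestamp wins, and a tombstone removes the cell). The `Mutation` type and `compact_partition` function are hypothetical names for illustration, not part of the wrapper or of Apache Cassandra Analytics, and plain Python objects stand in for SSTable cells.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mutation:
    """Hypothetical stand-in for one cell write read from an SSTable."""
    clustering_key: str
    column: str
    value: Optional[Any]  # None stands in for a tombstone (a delete)
    write_ts: int         # write timestamp; higher means newer


def compact_partition(mutations: Iterable[Mutation]) -> Dict[str, Dict[str, Any]]:
    """Reconcile all mutations of ONE partition with last-write-wins.

    Everything happens in local memory for a single partition, which is
    the property that lets this kind of work run on one executor without
    shuffling rows between executors.
    """
    latest: Dict[Tuple[str, str], Mutation] = {}
    for m in mutations:
        cell = (m.clustering_key, m.column)
        current = latest.get(cell)
        # Keep only the newest write seen so far for each cell.
        if current is None or m.write_ts > current.write_ts:
            latest[cell] = m

    rows: Dict[str, Dict[str, Any]] = {}
    for (clustering_key, column), m in latest.items():
        if m.value is not None:  # drop cells whose newest write is a tombstone
            rows.setdefault(clustering_key, {})[column] = m.value
    return rows
```

In a Spark job, a function of this shape would typically be applied per partition of the input; how the wrapper actually schedules it is not described in the source.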

Connector Factory model

Atop the wrapper sits a Connector Factory (patterns/connector-factory-from-dataframe) — individual data abstractions (Key Value, Time Series, etc.) build model-aware connectors that transform the generic DataFrames, eliminating the costly post-processing of the Casspactor era.
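The factory idea can be sketched as a registry that maps a data abstraction to a function transforming the generic frame. This is only an illustration under stated assumptions: a list of dicts stands in for a Spark DataFrame, and every name here (`register_connector`, `build_connector`, the column names, the two connectors) is hypothetical rather than Netflix's actual API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Frame = List[Row]                      # stand-in for a Spark DataFrame
Connector = Callable[[Frame], Frame]

_CONNECTORS: Dict[str, Connector] = {}


def register_connector(abstraction: str) -> Callable[[Connector], Connector]:
    """Decorator that registers a model-aware connector under a name."""
    def decorator(fn: Connector) -> Connector:
        _CONNECTORS[abstraction] = fn
        return fn
    return decorator


@register_connector("key_value")
def key_value_connector(frame: Frame) -> Frame:
    # Reshape generic rows into an (id, key, value) layout.
    return [
        {"id": r["partition_key"], "key": r["clustering_key"], "value": r["value"]}
        for r in frame
    ]


@register_connector("time_series")
def time_series_connector(frame: Frame) -> Frame:
    # Reshape generic rows into events and order them by event time.
    events = [
        {"series": r["partition_key"], "event_time": r["clustering_key"], "payload": r["value"]}
        for r in frame
    ]
    return sorted(events, key=lambda e: e["event_time"])


def build_connector(abstraction: str) -> Connector:
    """Look up the connector for a data abstraction, failing loudly if unknown."""
    try:
        return _CONNECTORS[abstraction]
    except KeyError:
        raise ValueError(f"no connector registered for {abstraction!r}") from None
```

Because each connector consumes the same generic frame, adding a new abstraction means registering one more function rather than adding another post-processing stage.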

Key improvements over Casspactor

| Dimension | Casspactor | Cassandra Analytics Wrapper |
|---|---|---|
| Metadata source | Multiple services (fragile) | S3 directly (single source of truth) |
| Wide partitions | OOM crashes | Executor-level compaction, no excessive shuffle |
| Intermediate tables | Multiple per pipeline | Zero (direct DataFrame → Iceberg) |
| Time travel | Impossible | Schema + topology + data at point-in-time |
| Resource tuning | Manual | Auto-sizing |
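The time-travel row is easier to picture if each backup is treated as one unit that carries its own schema, topology, and data files. The sketch below picks the newest such unit taken at or before a requested time. The `Snapshot` fields and the `snapshot_as_of` function are assumptions for illustration only; the source does not describe the real backup representation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical backup unit: schema, topology, and data captured together."""
    taken_at: int                  # epoch seconds when the backup was taken
    schema_version: str
    topology: Tuple[str, ...]      # e.g. node identifiers at backup time
    sstable_keys: Tuple[str, ...]  # object keys of the backed-up SSTables


def snapshot_as_of(snapshots: List[Snapshot], as_of: int) -> Optional[Snapshot]:
    """Return the newest snapshot taken at or before `as_of`, or None."""
    ordered = sorted(snapshots, key=lambda s: s.taken_at)
    # bisect_right gives the count of snapshots with taken_at <= as_of.
    idx = bisect_right([s.taken_at for s in ordered], as_of)
    return ordered[idx - 1] if idx else None
```

Reading schema, topology, and data from the same selected unit is what keeps them mutually consistent, instead of combining data from one moment with metadata fetched from live services at another.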

Seen in
