Netflix Cassandra Analytics Wrapper
The Cassandra Analytics Wrapper is Netflix's internal layer built on top of the open-source Apache Cassandra Analytics library, extended with Netflix's internal backup representation and an S3 client (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix).
Role in the stack
The wrapper is the foundation layer of Netflix's new data-movement architecture, where it replaces Casspactor. It handles:
- Raw data retrieval from backups: reads SSTable files directly from S3, where regularly scheduled, sidecar-driven backups deposit them.
- Translation into standard Spark DataFrames: downstream connectors consume a universal DataFrame interface rather than raw SSTable bytes.
- Executor-level mutation compaction: moves compaction to the Spark executors, so wide or skewed partitions are handled without excessive shuffling or out-of-memory (OOM) failures.
- Auto-sizing: dynamically adjusts resources based on source table characteristics.
- Time travel: processes schema, cluster topology, and data as a cohesive unit at a point in time.
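To make the compaction step concrete, the following is a minimal sketch of cell-level reconciliation, assuming the last-write-wins rule that Cassandra applies to write timestamps. The `Mutation` type and `compact` function are hypothetical names for illustration, not part of the wrapper or of Apache Cassandra Analytics. Plain Python stands in for the logic that would run locally on each Spark executor, and timestamp ties are not handled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mutation:
    """Hypothetical stand-in for one cell write read from an SSTable."""
    partition_key: str
    column: str
    value: Optional[str]  # None marks a tombstone (a delete)
    write_ts: int         # write timestamp, e.g. microseconds


def compact(mutations: Iterable[Mutation]) -> Dict[Tuple[str, str], str]:
    """Reconcile mutations cell by cell, keeping only the newest write."""
    latest: Dict[Tuple[str, str], Mutation] = {}
    for m in mutations:
        cell = (m.partition_key, m.column)
        current = latest.get(cell)
        # Simplification: a strictly newer timestamp wins; ties keep the first seen.
        if current is None or m.write_ts > current.write_ts:
            latest[cell] = m
    # Cells whose newest mutation is a tombstone are dropped from the output.
    return {cell: m.value for cell, m in latest.items() if m.value is not None}


mutations = [
    Mutation("user-1", "email", "old@example.com", 100),
    Mutation("user-1", "email", "new@example.com", 200),
    Mutation("user-1", "phone", "555-0100", 150),
    Mutation("user-1", "phone", None, 300),  # tombstone shadows the older write
]
compacted = compact(mutations)
```

Because the merge only needs the mutations for the cells it is reconciling, it can run where the data already sits, which is the property that avoids a cluster-wide shuffle for wide partitions.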
Connector Factory model
A Connector Factory (patterns/connector-factory-from-dataframe) sits atop the wrapper. Individual data abstractions (Key Value, Time Series, etc.) build model-aware connectors that transform the generic DataFrames, which eliminates the costly post-processing of the Casspactor era.
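One way to picture the factory is as a registry that maps each data abstraction to a transform over the generic rows. The sketch below is an illustration under assumptions: lists of dictionaries stand in for Spark DataFrames, and `register`, `build_connector`, and the column names `pk`, `ts`, and `payload` are hypothetical, not Netflix's actual interface.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Connector = Callable[[List[Row]], List[Row]]

# Registry of model-aware connectors, keyed by abstraction name (hypothetical).
_CONNECTORS: Dict[str, Connector] = {}


def register(name: str) -> Callable[[Connector], Connector]:
    """Decorator that adds a connector to the registry under `name`."""
    def decorator(fn: Connector) -> Connector:
        _CONNECTORS[name] = fn
        return fn
    return decorator


def build_connector(name: str) -> Connector:
    """Look up the connector for a data abstraction, failing loudly if absent."""
    try:
        return _CONNECTORS[name]
    except KeyError:
        raise ValueError(f"no connector registered for {name!r}") from None


@register("key_value")
def key_value_connector(rows: List[Row]) -> List[Row]:
    # Key Value view: keep only the key and its payload.
    return [{"key": r["pk"], "value": r["payload"]} for r in rows]


@register("time_series")
def time_series_connector(rows: List[Row]) -> List[Row]:
    # Time Series view: order events by time within the generic rows.
    ordered = sorted(rows, key=lambda r: r["ts"])
    return [
        {"series": r["pk"], "event_time": r["ts"], "event": r["payload"]}
        for r in ordered
    ]


generic_rows = [
    {"pk": "sensor-1", "ts": 20, "payload": "b"},
    {"pk": "sensor-1", "ts": 10, "payload": "a"},
]
kv_rows = build_connector("key_value")(generic_rows)
ts_rows = build_connector("time_series")(generic_rows)
```

Both connectors consume the same generic input, so adding a new abstraction means registering one more transform rather than adding a post-processing stage.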
Key improvements over Casspactor
| Dimension | Casspactor | Cassandra Analytics Wrapper |
|---|---|---|
| Metadata source | Multiple services (fragile) | S3 directly (single source of truth) |
| Wide partitions | OOM crashes | Executor-level compaction, no excessive shuffle |
| Intermediate tables | Multiple per pipeline | Zero — direct DataFrame → Iceberg |
| Time travel | Impossible | Schema + topology + data at point-in-time |
| Resource tuning | Manual | Auto-sizing |
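The time-travel row can be illustrated with a small sketch that treats each backup as one record bundling schema, topology, and data location, then picks the newest record at or before a requested time. This assumes a backup listing of that shape; the `Snapshot` fields and `snapshot_as_of` are hypothetical names, and the source does not describe the actual lookup.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical backup record: schema, topology, and data captured together."""
    taken_at: int               # backup time, e.g. epoch seconds
    schema_version: str
    topology: Tuple[str, ...]   # node identifiers at backup time
    sstable_prefix: str         # where this backup's SSTables live


def snapshot_as_of(snapshots: List[Snapshot], as_of: int) -> Snapshot:
    """Return the newest snapshot taken at or before `as_of`."""
    ordered = sorted(snapshots, key=lambda s: s.taken_at)
    # bisect_right gives the count of snapshots with taken_at <= as_of.
    idx = bisect_right([s.taken_at for s in ordered], as_of)
    if idx == 0:
        raise LookupError(f"no snapshot at or before {as_of}")
    return ordered[idx - 1]


snapshots = [
    Snapshot(300, "v3", ("n1", "n2", "n3"), "backups/300/"),
    Snapshot(100, "v1", ("n1", "n2"), "backups/100/"),
    Snapshot(200, "v2", ("n1", "n2", "n3"), "backups/200/"),
]
chosen = snapshot_as_of(snapshots, as_of=250)
```

Returning a single record means the schema and topology used to interpret the data always come from the same backup as the data itself, rather than from whatever the live services report today.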
Seen in
- sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix — full architecture description, comparison to Casspactor, migration details.