Netflix Casspactor
Casspactor was Netflix's legacy Cassandra-to-Iceberg data-movement engine, processing ~1,200 data movements per day and transferring approximately 3 PB of data from Apache Cassandra into Apache Iceberg tables (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix).
Architecture
Casspactor assembled a composite view of backups from multiple independent systems, each with its own failure modes, update cadences, and accuracy guarantees. It also required every node in a region to take its snapshot at the same clock second, so a single node replacement could break data movement for the entire region.
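The same-second requirement can be illustrated with a minimal sketch. This is not Casspactor's code: `NodeSnapshot`, `region_snapshot_is_usable`, and the node identifiers are hypothetical names chosen for illustration, and the check assumes a region is usable only when every expected node has a snapshot and all snapshots share one clock second.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NodeSnapshot:
    """Hypothetical record of one node's backup snapshot."""
    node_id: str
    taken_at: datetime


def region_snapshot_is_usable(expected_nodes, snapshots):
    """Return True only if every expected node has a snapshot and all of
    those snapshots were taken within the same clock second."""
    by_node = {s.node_id: s for s in snapshots}
    # A replaced node appears under a new id, so the old id goes missing.
    if set(expected_nodes) - set(by_node):
        return False
    # Truncate to whole seconds, then require a single distinct value.
    seconds = {by_node[n].taken_at.replace(microsecond=0) for n in expected_nodes}
    return len(seconds) == 1


if __name__ == "__main__":
    t = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    healthy = [NodeSnapshot(n, t) for n in ("node-a", "node-b", "node-c")]
    print(region_snapshot_is_usable(["node-a", "node-b", "node-c"], healthy))

    # node-c was replaced; its successor snapshotted under a different id.
    after_replacement = healthy[:2] + [NodeSnapshot("node-c2", t)]
    print(region_snapshot_is_usable(["node-a", "node-b", "node-c"], after_replacement))
```

Under these assumptions one missing or late node invalidates the whole region, which is the fragility described above.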
Limitations that drove replacement
- Fragile metadata dependencies: metadata fell out of sync with the actual backups, causing silent stale or incorrect reads.
- Skewed partition failures: Casspactor could not handle tables with large partitions (common in KV and Time Series workloads) and crashed with OOM errors.
- No data-model awareness: it moved raw Cassandra tables as-is, so higher-level abstractions had to be bolted on in post-processing.
- Intermediate table bloat: it wrote to an intermediate Iceberg table, and higher-level connectors added further intermediates, compounding storage cost.
- No time travel: it could not restore prior backups after topology or schema changes.
- Monolithic design: it was built as a single connector, not as an engine for a family of connectors.
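To make the first limitation concrete, the sketch below shows one way metadata drift could be surfaced loudly instead of producing silent stale reads. It is an illustration only, not Casspactor's or its replacement's implementation: the function names and file paths are hypothetical, and it assumes both the metadata and the backup store can be reduced to plain lists of file names.

```python
def diff_metadata_against_backup(metadata_files, backup_files):
    """Compare the files the metadata claims exist with the files actually
    present in the backup, and report the differences in both directions."""
    recorded = set(metadata_files)
    actual = set(backup_files)
    return {
        "missing_from_backup": sorted(recorded - actual),
        "untracked_in_backup": sorted(actual - recorded),
    }


def require_in_sync(metadata_files, backup_files):
    """Raise instead of reading when metadata and backup disagree."""
    drift = diff_metadata_against_backup(metadata_files, backup_files)
    if drift["missing_from_backup"] or drift["untracked_in_backup"]:
        raise ValueError(f"metadata out of sync with backup: {drift}")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    metadata = ["sstable-1.db", "sstable-2.db"]
    backup = ["sstable-2.db", "sstable-3.db"]
    print(diff_metadata_against_backup(metadata, backup))
```

Failing fast on any difference trades availability for correctness; whether that trade-off suits a given pipeline is a design decision this sketch does not settle.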
Replacement
Casspactor was replaced by the Cassandra Analytics Wrapper + Move Data connector architecture via the Decider Pattern implemented in Maestro. See sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix for the full migration story.
Seen in
- sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix — full history, architecture, limitations, and replacement.