SYSTEM Cited by 1 source

Netflix Casspactor

Casspactor was Netflix's legacy Cassandra-to-Iceberg data-movement engine, processing ~1,200 data movements per day and transferring approximately 3 PB of data from Apache Cassandra into Apache Iceberg tables (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix).

Architecture

Casspactor assembled a composite view of backups from multiple independent systems — each with its own failure modes, update cadences, and accuracy guarantees. It required all nodes in a region to snapshot at the same clock second; a single node replacement could break data movement for an entire region.
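
As a rough illustration of why the same-second requirement is brittle, the check below treats a region's backup set as usable only if every node in the current topology has a snapshot and all snapshots fall within the same clock second. This is a minimal sketch, not Casspactor's actual code: `NodeSnapshot`, `region_snapshot_is_usable`, and the node IDs are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class NodeSnapshot:
    """Hypothetical record of one node's backup snapshot."""
    node_id: str
    taken_at: datetime


def region_snapshot_is_usable(expected_nodes, snapshots):
    """Return True only if every expected node has a snapshot and all
    snapshots were taken within the same clock second (assumed rule)."""
    by_node = {s.node_id: s for s in snapshots}
    # Every node in the current topology must be represented, and no others.
    if set(by_node) != set(expected_nodes):
        return False
    # Truncate to whole seconds; a usable set collapses to one value.
    seconds = {s.taken_at.replace(microsecond=0) for s in by_node.values()}
    return len(seconds) == 1


t = datetime(2026, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# All three nodes snapshot within the same second: usable.
aligned = [
    NodeSnapshot("n1", t),
    NodeSnapshot("n2", t.replace(microsecond=500_000)),
    NodeSnapshot("n3", t),
]
print(region_snapshot_is_usable(["n1", "n2", "n3"], aligned))  # True

# n3 was replaced by n4, whose first snapshot landed 90 seconds later.
after_replacement = [
    NodeSnapshot("n1", t),
    NodeSnapshot("n2", t),
    NodeSnapshot("n4", t + timedelta(seconds=90)),
]
print(region_snapshot_is_usable(["n1", "n2", "n4"], after_replacement))  # False
```

Under this all-or-nothing rule, one out-of-step node invalidates the whole region's snapshot set, which matches the failure mode described above.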

Limitations that drove replacement

  1. Fragile metadata dependencies: metadata fell out of sync with the actual backups, causing silent stale or incorrect reads.
  2. Skewed partition failures: it could not handle tables with large partitions (common in KV and Time Series workloads) and crashed with OOM errors.
  3. No data-model awareness: it moved raw Cassandra tables as-is, and abstractions were bolted on in post-processing.
  4. Intermediate table bloat: it wrote to an intermediate Iceberg table, and higher-level connectors added more intermediates, compounding storage cost.
  5. No time travel: it was unable to restore prior backups after topology or schema changes.
  6. Monolithic design: it was built as a single connector, not as an engine for a family of connectors.

Replacement

Casspactor was replaced by the Cassandra Analytics Wrapper + Move Data connector architecture, via the Decider Pattern implemented in Maestro. See sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix for the full migration story.

Seen in
