PATTERN Cited by 1 source
Connector Factory from DataFrame
Definition
A Connector Factory architecture has a shared foundation layer that reads raw storage and produces a standard interchange format (e.g., Spark DataFrames). Each domain-specific connector then builds its own model-aware transformation on top of that output, without duplicating retrieval logic or accumulating intermediate artifacts.
Canonical example
Netflix's new Cassandra data-movement stack (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix):
- Foundation: Cassandra Analytics Wrapper reads SSTables from S3 → produces Spark DataFrames.
- Connectors: Key Value, Time Series, and other abstractions implement data-model-aware transformations over those DataFrames via Java UDFs and transforms.
- Benefit: Improvements to the core engine benefit all connectors; connectors are focused solely on domain transformation.
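The layering in this example can be sketched in plain Python. This is an illustration under stated assumptions, not Netflix's code: a list of dicts stands in for a Spark DataFrame, and every name (`read_foundation`, `Connector`, `key_value_transform`, `time_series_transform`, `run`) and the column names `key`, `ts`, `value` are hypothetical. The "latest value per key" and "ordered events per series" semantics are assumed for the sake of the example; the source does not describe what each connector actually computes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Frame = List[Row]  # stand-in for a Spark DataFrame (assumption)


def read_foundation(raw_cells: Iterable[Row]) -> Frame:
    """Foundation layer: turn raw storage records into the interchange format."""
    return [dict(cell) for cell in raw_cells]


@dataclass(frozen=True)
class Connector:
    """A domain connector is just a name plus a Frame -> Frame transformation."""
    name: str
    transform: Callable[[Frame], Frame]


def key_value_transform(frame: Frame) -> Frame:
    # Hypothetical Key Value semantics: keep the row with the highest ts per key.
    latest: Dict[str, Row] = {}
    for row in frame:
        key = row["key"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row
    return [{"key": k, "value": r["value"]} for k, r in sorted(latest.items())]


def time_series_transform(frame: Frame) -> Frame:
    # Hypothetical Time Series semantics: every event, ordered by series then ts.
    events = [{"series": r["key"], "ts": r["ts"], "value": r["value"]} for r in frame]
    return sorted(events, key=lambda e: (e["series"], e["ts"]))


def run(connectors: List[Connector], raw_cells: Iterable[Row]) -> Dict[str, Frame]:
    frame = read_foundation(raw_cells)  # retrieval happens once
    return {c.name: c.transform(frame) for c in connectors}


raw = [
    {"key": "a", "ts": 1, "value": "x"},
    {"key": "a", "ts": 2, "value": "y"},
    {"key": "b", "ts": 1, "value": "z"},
]
results = run(
    [Connector("key_value", key_value_transform),
     Connector("time_series", time_series_transform)],
    raw,
)
```

Both connectors receive the same frame, so adding a third abstraction means writing one more transformation function rather than another read path.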
Contrast with monolithic design
Casspactor, the system this stack replaced, was built as a single connector with no shared engine, so every abstraction inherited its constraints and added its own post-processing intermediates.
Key properties
- Single retrieval path → eliminates redundant I/O and intermediate storage.
- Model-awareness → each connector understands its data model semantics, enabling domain-specific optimisations.
- Shared improvements → engine upgrades (auto-sizing, partition handling) cascade to all connectors automatically.
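The first and third properties can be checked in a small, self-contained sketch. It assumes a list of dicts in place of a DataFrame; `Foundation`, `engine_steps`, `skip_null_rows`, and `run_all` are hypothetical names. `skip_null_rows` is only a placeholder for an engine-level improvement and is not how auto-sizing or partition handling works in the real system.

```python
from typing import Any, Callable, Dict, Iterable, List

Frame = List[Dict[str, Any]]  # stand-in for a Spark DataFrame (assumption)


class Foundation:
    """Shared engine: the only component that touches raw storage."""

    def __init__(self, storage: Frame,
                 engine_steps: Iterable[Callable[[Frame], Frame]] = ()):
        self._storage = storage
        self._engine_steps = list(engine_steps)
        self.reads = 0  # counts how many times storage was read

    def load(self) -> Frame:
        self.reads += 1
        frame = [dict(row) for row in self._storage]
        for step in self._engine_steps:  # engine-level improvements live here
            frame = step(frame)
        return frame


def skip_null_rows(frame: Frame) -> Frame:
    # Placeholder engine upgrade (hypothetical): drop rows with no value.
    return [row for row in frame if row.get("value") is not None]


def run_all(foundation: Foundation,
            connectors: Dict[str, Callable[[Frame], Any]]) -> Dict[str, Any]:
    frame = foundation.load()  # one read, shared by every connector
    return {name: fn(frame) for name, fn in connectors.items()}


storage = [{"key": "a", "value": 1}, {"key": "b", "value": None}]
connectors = {
    "row_count": len,
    "keys": lambda frame: sorted(row["key"] for row in frame),
}

before = Foundation(storage)
out_before = run_all(before, connectors)

after = Foundation(storage, engine_steps=[skip_null_rows])
out_after = run_all(after, connectors)
```

The connector definitions are identical in both runs; only the foundation changed, yet both connectors see the upgraded output, and `reads` stays at one per run regardless of how many connectors are registered.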
Seen in
- sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix — Netflix's Connector Factory for Cassandra-to-Iceberg movement.