
PATTERN Cited by 1 source

Connector Factory from DataFrame

Definition

A Connector Factory architecture provides a shared foundation layer that reads raw storage into a standard interchange format (e.g., Spark DataFrames). Each domain-specific connector then builds its own model-aware transformation on top of that format, without duplicating retrieval logic or accumulating intermediate artifacts.
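
The overall shape of the pattern can be sketched in Python. This is a minimal illustration under stated assumptions, not Netflix's implementation: a list of dicts (`Frame`) stands in for a Spark DataFrame, and `Foundation`, `Connector`, `ConnectorFactory` and `RowCount` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type

Row = Dict[str, object]
Frame = List[Row]  # hypothetical stand-in for a Spark DataFrame


class Foundation:
    """Shared layer: the one place that turns raw storage records into a Frame."""

    def __init__(self, read_raw: Callable[[], List[tuple]], columns: List[str]):
        self._read_raw = read_raw  # hypothetical reader, e.g. over decoded storage rows
        self._columns = columns

    def to_frame(self) -> Frame:
        return [dict(zip(self._columns, record)) for record in self._read_raw()]


class Connector(ABC):
    """Domain connector: transforms the shared Frame, never reads storage itself."""

    @abstractmethod
    def transform(self, frame: Frame) -> object:
        ...


class ConnectorFactory:
    """Registers connector classes by name and runs them on the shared foundation."""

    def __init__(self, foundation: Foundation):
        self._foundation = foundation
        self._registry: Dict[str, Type[Connector]] = {}

    def register(self, name: str, cls: Type[Connector]) -> None:
        self._registry[name] = cls

    def run(self, name: str) -> object:
        connector = self._registry[name]()  # KeyError if name was never registered
        return connector.transform(self._foundation.to_frame())


class RowCount(Connector):
    """Trivial example connector: counts rows in the shared Frame."""

    def transform(self, frame: Frame) -> object:
        return len(frame)


factory = ConnectorFactory(Foundation(lambda: [("a", 1), ("b", 2)], ["k", "v"]))
factory.register("row_count", RowCount)
print(factory.run("row_count"))  # 2
```

Because storage access lives only in `Foundation`, a new connector class just implements `transform`; changing the reader changes nothing downstream.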

Canonical example

Netflix's new Cassandra data-movement stack (Source: sources/2026-06-19-netflix-the-evolution-of-cassandra-data-movement-at-netflix):

  1. Foundation: Cassandra Analytics Wrapper reads SSTables from S3 → produces Spark DataFrames.
  2. Connectors: Key Value, Time Series, and other abstractions implement data-model-aware transformations over those DataFrames via Java UDFs and transforms.
  3. Benefit: Improvements to the core engine benefit all connectors; connectors are focused solely on domain transformation.
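
The connector layer can be illustrated with two model-aware transforms over the same rows. The sketch assumes a hypothetical flattened schema (`id`, `key`, `value`, `ts`) and last-write-wins semantics for the Key Value view; the real connectors are Java UDFs and transforms over Spark DataFrames, and their schemas are not described in the source.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]
Frame = List[Row]  # stand-in for the shared DataFrame

# Hypothetical flattened rows, as a foundation layer might emit them.
FRAME: Frame = [
    {"id": "user1", "key": "email", "value": "a@x.com", "ts": 1},
    {"id": "user1", "key": "email", "value": "b@x.com", "ts": 5},
    {"id": "user1", "key": "name", "value": "Ann", "ts": 2},
    {"id": "user2", "key": "name", "value": "Bob", "ts": 3},
]


class KeyValueConnector:
    """Key Value view: id -> {key: latest value}, last write wins by ts (assumed)."""

    def transform(self, frame: Frame) -> Dict[str, Dict[str, object]]:
        latest: Dict[tuple, Row] = {}
        for row in frame:
            k = (row["id"], row["key"])
            if k not in latest or row["ts"] > latest[k]["ts"]:
                latest[k] = row
        out: Dict[str, Dict[str, object]] = defaultdict(dict)
        for (ident, key), row in latest.items():
            out[ident][key] = row["value"]
        return dict(out)


class TimeSeriesConnector:
    """Time Series view: id -> [(ts, key, value)] ordered by ts."""

    def transform(self, frame: Frame) -> Dict[str, List[tuple]]:
        series: Dict[str, List[tuple]] = defaultdict(list)
        for row in frame:
            series[row["id"]].append((row["ts"], row["key"], row["value"]))
        return {ident: sorted(events) for ident, events in series.items()}


print(KeyValueConnector().transform(FRAME))
# {'user1': {'email': 'b@x.com', 'name': 'Ann'}, 'user2': {'name': 'Bob'}}
print(TimeSeriesConnector().transform(FRAME)["user1"])
# [(1, 'email', 'a@x.com'), (2, 'name', 'Ann'), (5, 'email', 'b@x.com')]
```

Both connectors consume the identical input; only the model-specific logic differs, which is the division of labour the pattern aims for.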

Contrast with monolithic design

Casspactor, the system this stack replaced, was built as a single connector with no shared engine, so every abstraction inherited its constraints and added its own post-processing intermediates.

Key properties

  • Single retrieval path → eliminates redundant I/O and intermediate storage.
  • Model-awareness → each connector understands its data model semantics, enabling domain-specific optimisations.
  • Shared improvements → engine upgrades (auto-sizing, partition handling) cascade to all connectors automatically.
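
These properties can be seen in a small sketch where a hypothetical `SharedEngine` is the only component that reads storage, and connectors consume its output directly without writing intermediates. Partition filtering stands in for an engine-level improvement: it is added once, and both connectors pick it up unchanged. A production engine would push such a filter into the read so unneeded data is never fetched; this sketch filters in memory for brevity.

```python
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]
Frame = List[Row]
Connector = Callable[[Frame], object]  # hypothetical: a connector is a frame transform


class SharedEngine:
    """Hypothetical foundation layer: the only component that touches raw storage."""

    def __init__(self, read_raw: Callable[[], Frame]):
        self._read_raw = read_raw

    def frame(self, partitions: Optional[Set[int]] = None) -> Frame:
        rows = self._read_raw()
        # Engine-level feature added once: partition filtering.
        # Every connector gets it without changing its own code.
        if partitions is not None:
            rows = [r for r in rows if r["partition"] in partitions]
        return rows


def run(engine: SharedEngine, connector: Connector,
        partitions: Optional[Set[int]] = None) -> object:
    # The connector consumes the frame directly; nothing goes to intermediate storage.
    return connector(engine.frame(partitions))


def count_rows(frame: Frame) -> int:
    return len(frame)


def distinct_ids(frame: Frame) -> List[str]:
    return sorted({r["id"] for r in frame})


engine = SharedEngine(lambda: [
    {"partition": 0, "id": "a"},
    {"partition": 1, "id": "b"},
    {"partition": 1, "id": "c"},
])
print(run(engine, count_rows), run(engine, distinct_ids))            # 3 ['a', 'b', 'c']
print(run(engine, count_rows, {1}), run(engine, distinct_ids, {1}))  # 2 ['b', 'c']
```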

