
PATTERN Cited by 1 source

Connector library as protocol abstraction

Connector library as protocol abstraction is the pattern in which a complex protocol (open table format, RPC framework, streaming wire protocol, governance handshake) is encapsulated inside a single, open-source, multi-language library that all consumers integrate against — instead of every consumer re-implementing the protocol from scratch.

The architectural lever: protocol-correctness work happens once, in one place, owned by the upstream project, with consumers focusing on their engine-specific integration (how their engine expresses a write, sink, scan, etc.) rather than on protocol internals.

Canonical instance: Delta Kernel (2026-05-14)

"Delta Kernel — the open source Java and Rust library for reading, writing, and committing to Delta tables — abstracts the low-level protocol details so connector developers can focus on UC integration, not Delta implementation." (sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis)

Three named adopters: Apache Spark, Delta Flink, and DuckDB.

The 2026-05-14 framing of the ecosystem-growth thesis around this pattern: "Apache Spark, Delta Flink, and DuckDB have all leveraged Delta Kernel to support external writes to UC managed tables and integrate with catalog-managed commits, and the ecosystem continues to grow. By handling the low-level protocol complexity, Delta Kernel makes it straightforward for any engine to integrate with Unity Catalog which contributes to a growing ecosystem of connectors."

The two structural failure modes this pattern eliminates

Without a shared connector library, the per-engine implementation shape produces:

  1. Drift across implementations. Each engine's parser drifts in subtle ways from the canonical spec — different default behaviours for schema evolution, different snapshot-resolution edge cases, different transaction-conflict handling. Tables that "just work" in one engine produce silent data corruption in another. The cost of detecting and reconciling drift across N engines is O(N²).

  2. Per-engine connector cost. Every new engine wanting to integrate has to re-implement the protocol from scratch — a multi-engineer-quarter project before any engine-specific integration work begins. This is the structural reason "open table format" historically meant "few engines integrate well."
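A back-of-envelope sketch of why drift reconciliation grows quadratically. With per-engine implementations, every pair of engines is a place where behaviour can silently diverge, which gives N(N−1)/2 pairs. With a shared library, each engine only has to be checked against the library. Counting "one check per pair" is a simplifying assumption for illustration, not a measured cost.

```python
from math import comb


def drift_pairs(num_engines: int) -> int:
    """Engine pairs whose behaviour must be cross-checked when every engine
    carries its own protocol implementation: N choose 2 = N(N-1)/2."""
    return comb(num_engines, 2)


def library_conformance_checks(num_engines: int) -> int:
    """With one shared library, each engine's integration is checked against
    the library once, so the count grows linearly in N."""
    return num_engines


for n in (3, 5, 10, 20):
    print(f"{n:>2} engines: {drift_pairs(n):>3} pairwise checks "
          f"vs {library_conformance_checks(n):>2} library checks")
```

At 20 engines that is 190 pairs to reconcile versus 20 library checks.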

The connector-library shape collapses both: protocol-correctness is written and tested once in the library; new-engine integration cost shrinks to "wire the engine's API to the library's API" rather than "reimplement the protocol."
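A minimal sketch of "wire the engine's API to the library's API", assuming a hypothetical `ProtocolLibrary` that owns schema validation and commit numbering, and two hypothetical engine adapters with different native batch shapes. The adapters only translate their own data layout; neither contains protocol logic. None of these names correspond to a real library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProtocolLibrary:
    """Hypothetical shared library: owns schema validation and the commit log."""
    schema: dict[str, type]
    log: list[list[dict[str, Any]]] = field(default_factory=list)

    def commit(self, rows: list[dict[str, Any]]) -> int:
        # Protocol-correctness checks are written and tested once, here.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} must be {typ.__name__}")
        self.log.append(rows)
        return len(self.log) - 1  # version number of the new commit


class ColumnarEngineAdapter:
    """Hypothetical engine whose native batch is a dict of column lists."""

    def __init__(self, lib: ProtocolLibrary) -> None:
        self.lib = lib

    def write(self, batch: dict[str, list[Any]]) -> int:
        # Engine-specific work only: pivot columns into rows, then delegate.
        names = list(batch)
        rows = [dict(zip(names, values)) for values in zip(*batch.values())]
        return self.lib.commit(rows)


class TupleEngineAdapter:
    """Hypothetical engine whose native batch is a list of tuples."""

    def __init__(self, lib: ProtocolLibrary, columns: list[str]) -> None:
        self.lib = lib
        self.columns = columns

    def write(self, batch: list[tuple]) -> int:
        return self.lib.commit([dict(zip(self.columns, t)) for t in batch])


lib = ProtocolLibrary(schema={"id": int, "name": str})
v0 = ColumnarEngineAdapter(lib).write({"id": [1, 2], "name": ["a", "b"]})
v1 = TupleEngineAdapter(lib, ["id", "name"]).write([(3, "c")])
print(v0, v1, len(lib.log))  # 0 1 2
```

A new engine would add one more small adapter class; the validation and commit numbering it relies on are already covered by the library's own tests.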

Layered architecture

┌──────────────────────────────────────────────────────┐
│ Engine-specific integration                          │
│ (Spark / Flink / DuckDB / new engine X connector)    │
├──────────────────────────────────────────────────────┤
│ Connector library API (e.g., Delta Kernel)           │
│ ─ Snapshot resolution                                │
│ ─ Schema evolution                                   │
│ ─ Commit-log parsing                                 │
│ ─ Transactional write coordination                   │
│ ─ Catalog handshake (e.g., UC catalog commits)       │
├──────────────────────────────────────────────────────┤
│ Wire protocol / on-disk format                       │
│ (Delta log + Parquet data files + catalog commits)   │
└──────────────────────────────────────────────────────┘

The leverage: a clean separation between engine-specific surface (top layer) and protocol-correct execution (middle layer) means the protocol can evolve without requiring every engine to be re-engineered.
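Snapshot resolution and commit-log parsing in the middle layer are the kind of logic that is easy to get subtly different in each engine. The sketch below is loosely modeled on Delta's log of numbered JSON commit files, where each line is an action such as `add` or `remove`. The library replays commits in version order to compute the live data files at a given version. The function name and file names are illustrative, and the replay deliberately omits most of the real protocol.

```python
import json
from typing import Optional


def resolve_snapshot(commits: dict[int, str], version: Optional[int] = None) -> set[str]:
    """Replay newline-delimited JSON commits 0..version and return live files.

    Simplified: ignores checkpoints, metadata/protocol actions, deletion
    vectors and the many extra fields real add/remove actions carry.
    """
    if version is None:
        version = max(commits)
    live: set[str] = set()
    for v in range(version + 1):
        if v not in commits:
            raise ValueError(f"missing commit {v}: log is not contiguous")
        for line in commits[v].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


commits = {
    0: '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    1: '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
}
print(sorted(resolve_snapshot(commits, version=0)))  # ['part-000.parquet', 'part-001.parquet']
print(sorted(resolve_snapshot(commits)))             # ['part-001.parquet', 'part-002.parquet']
```

Every engine adapter would call the same function. A change to the replay rules (for example, starting from a checkpoint) would therefore be made once in the library rather than once per engine.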

When this pattern applies

  • Open protocol with multiple consumer engines — exactly the case the ecosystem-growth thesis targets.
  • Protocol complexity that's high enough to make per-implementation parity expensive — Delta's snapshot resolution, schema evolution, multi-version concurrency, and catalog-handshake all qualify.
  • Multi-language consumer reality: JVM ecosystem (Spark, Flink) plus native ecosystem (DuckDB, Rust tools). The library being available in Java + Rust specifically (Delta Kernel's case) is the structural answer to this.
  • Strong upstream-project governance willing to take ownership of protocol correctness — the library only works as a single source of truth if the upstream owns the truth.
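Multi-version concurrency is a good example of protocol logic that is expensive to keep identical across engines. The following is a minimal sketch of optimistic commit, loosely in the style of Delta's approach: writers race to create the next version with put-if-absent, and the loser re-checks and retries. The `LogStore`, the `commit` function, and the "winner removed a file I read" conflict rule are hypothetical simplifications, far narrower than real conflict detection.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when a concurrent commit invalidates what this transaction read."""


@dataclass
class LogStore:
    """Hypothetical in-memory stand-in for storage with put-if-absent."""
    versions: dict[int, dict[str, list[str]]] = field(default_factory=dict)

    def put_if_absent(self, version: int, actions: dict[str, list[str]]) -> bool:
        if version in self.versions:
            return False  # another writer already created this version
        self.versions[version] = actions
        return True


def commit(store: LogStore, read_version: int, read_files: set[str],
           actions: dict[str, list[str]], max_attempts: int = 5) -> int:
    """Optimistically claim the next version; on a race, check and retry."""
    attempt = read_version + 1
    for _ in range(max_attempts):
        if store.put_if_absent(attempt, actions):
            return attempt
        winner = store.versions[attempt]
        # Simplified conflict rule: losing the race is fatal only if the
        # winner removed a file this transaction's read depended on.
        if read_files & set(winner.get("remove", [])):
            raise CommitConflict(f"version {attempt} removed files this transaction read")
        attempt += 1
    raise CommitConflict("gave up after too many concurrent commits")


store = LogStore()
commit(store, -1, set(), {"add": ["a.parquet"]})                 # claims version 0
w1 = commit(store, 0, {"a.parquet"}, {"add": ["b.parquet"]})     # claims version 1
w2 = commit(store, 0, {"a.parquet"}, {"add": ["c.parquet"]})     # loses 1, retries, claims 2
commit(store, 2, {"a.parquet"}, {"remove": ["a.parquet"], "add": ["ab.parquet"]})  # version 3
try:
    commit(store, 2, {"a.parquet"}, {"add": ["d.parquet"]})      # races version 3 and conflicts
except CommitConflict as exc:
    print("conflict:", exc)
```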

When this pattern doesn't fit

  • Closed protocol with a single canonical implementation — no ecosystem to abstract over.
  • Performance-critical paths where per-engine custom implementations are warranted — the library's general-purpose shape may not match a specific engine's optimisation needs. The mitigation is typically that the library provides a low-level API for engines that want to specialise.
  • Trivial protocols where re-implementation is cheap — the library's coordination overhead exceeds the avoided implementation cost.
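The mitigation in the performance bullet, a low-level API for engines that want to specialise, can be sketched as a pluggable hook. The library keeps the protocol decisions (which files belong to the snapshot), and an engine can swap in its own file reader. All class names here are hypothetical; the sketch shows the shape of the idea, not any particular library's interface.

```python
from typing import Iterable, Optional, Protocol


class FileReader(Protocol):
    """Low-level hook the (hypothetical) library lets engines replace."""

    def read(self, paths: list[str]) -> Iterable[dict]: ...


class DefaultReader:
    """General-purpose reader shipped with the library."""

    def read(self, paths: list[str]) -> Iterable[dict]:
        for p in paths:  # one file at a time: simple and portable
            yield {"path": p, "reader": "default"}


class EngineNativeReader:
    """Hypothetical engine-specific reader tuned for that engine's execution model."""

    def read(self, paths: list[str]) -> Iterable[dict]:
        # Stand-in for handing the whole batch to an engine-native scanner.
        return [{"path": p, "reader": "engine-native"} for p in paths]


class TableScan:
    def __init__(self, live_files: list[str], reader: Optional[FileReader] = None) -> None:
        self.live_files = live_files  # decided by protocol logic (snapshot resolution)
        self.reader = reader or DefaultReader()

    def execute(self) -> list[dict]:
        # The library keeps control of WHICH files are read;
        # the pluggable reader only controls HOW they are read.
        return list(self.reader.read(self.live_files))


files = ["part-001.parquet", "part-002.parquet"]
print({r["reader"] for r in TableScan(files).execute()})                        # {'default'}
print({r["reader"] for r in TableScan(files, EngineNativeReader()).execute()})  # {'engine-native'}
```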

Composition with the catalog and credential vending

Connector-library-as-protocol-abstraction is the engine-side substrate that the broader external-engine-write story rests on. The composition with the catalog-side primitives:

Engine-side (library) | Catalog-side
--------------------- | ------------
Build write payload (data + delta) via connector library | Catalog accepts commit handoff
Hand commit to catalog (not direct log write) | Catalog serializes + audits + persists (patterns/catalog-managed-commits-for-external-write-safety)
Detect approaching credential expiry | Catalog auto-mints new short-lived scoped credential (patterns/credential-vending-for-external-engine-access)

The library is where the engine-side refresh-loop logic lives, where the catalog-handshake protocol is implemented, and where new engine integrations plug in.
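The engine-side column of the composition table can be sketched as a session object with two duties. It refreshes a short-lived credential shortly before it expires, and it hands every commit to the catalog instead of writing the log directly. The catalog class, method names, 60-second margin, 900-second lifetime, and table name are all hypothetical placeholders, not Unity Catalog's API. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Credential:
    token: str
    expires_at: float  # seconds on the caller's clock


@dataclass
class FakeCatalog:
    """Hypothetical catalog: vends short-lived credentials and owns commits."""
    ttl_s: float = 900.0
    minted: int = 0
    commits: list = field(default_factory=list)

    def vend_credential(self, table: str, now: float) -> Credential:
        self.minted += 1
        return Credential(f"{table}-token-{self.minted}", now + self.ttl_s)

    def accept_commit(self, table: str, actions: dict) -> int:
        # The catalog, not the engine, serializes and persists the commit.
        self.commits.append((table, actions))
        return len(self.commits) - 1


class EngineSession:
    """Engine-side logic a connector library would own (hypothetical)."""

    def __init__(self, catalog: FakeCatalog, table: str, refresh_margin_s: float = 60.0) -> None:
        self.catalog = catalog
        self.table = table
        self.margin = refresh_margin_s
        self.cred: Optional[Credential] = None

    def credential(self, now: float) -> Credential:
        # Refresh proactively when within the margin of expiry.
        if self.cred is None or self.cred.expires_at - now <= self.margin:
            self.cred = self.catalog.vend_credential(self.table, now)
        return self.cred

    def write(self, files: list[str], now: float) -> int:
        self.credential(now)  # data files would be uploaded with this credential
        return self.catalog.accept_commit(self.table, {"add": files})  # handoff, no direct log write


catalog = FakeCatalog(ttl_s=900)
session = EngineSession(catalog, "main.sales.orders")
session.write(["p0.parquet"], now=0)    # mints token 1, valid until 900
session.write(["p1.parquet"], now=500)  # 400 s left > 60 s margin: reuses token 1
session.write(["p2.parquet"], now=850)  # 50 s left <= 60 s margin: mints token 2
print(catalog.minted, len(catalog.commits))  # 2 3
```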

Trade-offs vs alternative shapes

Shape | Pro | Con
----- | --- | ---
Per-engine reimplementation | Each engine optimises for its own performance characteristics. | O(N) implementation cost; O(N²) drift detection cost; high integration barrier for new engines.
Vendor-proprietary protocol + sidecar process | Vendor controls the protocol entirely; integration via process boundary is uniform. | Process-boundary overhead; performance penalty; engine has to ship the sidecar.
Connector library (this pattern) | Single source of truth + low new-engine integration cost + ecosystem composability. | Library shape may not match every engine's optimisation needs; requires upstream-project governance.

Other instances of this pattern in the wiki corpus

The Delta Kernel disclosure is the canonical 2026-05-14 instance, but the pattern shows up elsewhere:

  • grpc-go / language-specific gRPC libraries wrapping the HTTP/2 + Protobuf wire protocol so service implementations don't hand-roll RPC.
  • Iceberg's pyiceberg / iceberg-java / iceberg-rust — Iceberg's canonical-table-format libraries play the same role for Iceberg as Delta Kernel does for Delta.
  • Various OAuth client libraries abstracting the OAuth dance so that consumer applications don't each reimplement client_credentials flows.
  • Kafka client libraries (librdkafka, language-specific clients) abstracting the Kafka wire protocol.

The shared shape: a complex protocol with a strong upstream owner and many language ecosystems wanting to consume it.

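Seen in

  • sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis: canonical instance (Delta Kernel as the engine-side library for external writes to UC managed tables).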
