
PATTERN Cited by 1 source

CDC driver ecosystem

Pattern

Publish one stable, documented vendor-side change-stream API and let every ETL / data-warehouse / data-lake ecosystem write its own driver on top of it, rather than building a connector-per-target-system in-house. The vendor owns a single change-stream primitive (Vitess's VStream gRPC API is the canonical wiki instance); the downstream ecosystems own the integration drivers that make the vendor a first-class source for their particular stack (Debezium for Kafka; Airbyte for Airbyte Cloud; Fivetran for Fivetran Cloud; etc). The vendor may also ship a managed first-party consumer (e.g. PlanetScale Connect) but the API is the primary artefact.
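A minimal sketch of this shape, under stated assumptions: `ChangeEvent`, `change_stream`, `topic_driver`, and `warehouse_driver` are hypothetical names invented for illustration, and the in-memory list stands in for a real change log. This is not the VStream API or any real driver. The point is only that two independently written drivers depend on the same single primitive and nothing else.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, as the hypothetical vendor API emits it."""
    position: int  # monotonically increasing stream position
    table: str
    op: str        # "insert", "update", or "delete"
    row: dict


def change_stream(log: Iterable[ChangeEvent], after: int = 0) -> Iterator[ChangeEvent]:
    """The single vendor-owned primitive: replay changes past a position."""
    for event in log:
        if event.position > after:
            yield event


# Each ecosystem owns its driver; both depend only on change_stream().
def topic_driver(events: Iterable[ChangeEvent]) -> List[tuple]:
    """Message-bus-style driver: map each change to (topic, key, value)."""
    return [
        (f"cdc.{e.table}", e.row.get("id"), {"op": e.op, "after": e.row})
        for e in events
    ]


def warehouse_driver(events: Iterable[ChangeEvent]) -> dict:
    """Warehouse-style driver: fold changes into a current-state table."""
    state: dict = {}
    for e in events:
        key = (e.table, e.row.get("id"))
        if e.op == "delete":
            state.pop(key, None)
        else:
            state[key] = e.row
    return state


# Illustrative data only.
LOG = [
    ChangeEvent(1, "users", "insert", {"id": 1, "name": "ada"}),
    ChangeEvent(2, "users", "update", {"id": 1, "name": "ada l."}),
    ChangeEvent(3, "users", "insert", {"id": 2, "name": "bob"}),
    ChangeEvent(4, "users", "delete", {"id": 2}),
]

records = topic_driver(change_stream(LOG))            # full replay
table = warehouse_driver(change_stream(LOG, after=1))  # resume past position 1
```

Neither driver knows the other exists, and the vendor-side function knows about neither. Adding a third ecosystem means writing a third consumer of `change_stream`, with no change on the vendor side.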

Trade: API-stability cost (every driver depends on it; breaking changes cascade) in exchange for integration reach the vendor couldn't achieve alone. Works when the API is small, stable, and well-documented; breaks when the vendor is tempted to close the API or fork it per-customer.
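The cascade can be illustrated with a toy compatibility check. Everything here is an assumption made for illustration: the field names, the driver names, and the idea of modelling an API version as a set of event fields. An additive change leaves every driver working; a rename breaks all of them at once.

```python
from typing import Dict, Set

# Fields of a hypothetical published event schema, per API version.
API_V1: Set[str] = {"position", "table", "op", "row"}
API_V2_ADDITIVE: Set[str] = API_V1 | {"shard"}              # adds a field
API_V2_BREAKING: Set[str] = (API_V1 - {"row"}) | {"after"}  # renames a field

# Fields each external driver reads (hypothetical drivers).
DRIVER_NEEDS: Dict[str, Set[str]] = {
    "kafka_driver": {"position", "table", "op", "row"},
    "elt_driver_a": {"position", "table", "row"},
    "elt_driver_b": {"table", "op", "row"},
}


def broken_drivers(schema: Set[str]) -> Dict[str, Set[str]]:
    """Return each driver that no longer works, with the fields it is missing."""
    return {
        name: needs - schema
        for name, needs in DRIVER_NEEDS.items()
        if needs - schema
    }


additive_breakage = broken_drivers(API_V2_ADDITIVE)
rename_breakage = broken_drivers(API_V2_BREAKING)
```

Here `additive_breakage` is empty, while `rename_breakage` lists all three drivers as missing `row`. A small, stable surface keeps the set of fields drivers can depend on short, which is what keeps this cost bounded.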

Why this shape

  • Driver authorship is a specialisation. Building a high-quality Kafka-Connect source connector requires deep Kafka-Connect expertise (offset tracking, single-message transforms, rebalancing, exactly-once semantics); building an Airbyte source connector requires deep Airbyte expertise; building a Fivetran source likewise. The vendor can't staff for all of these.
  • Ecosystems want breadth, not depth. Airbyte's value proposition is "connect any source to any destination", so they'll happily write a source connector for any vendor that exposes a usable API. The ROI for Airbyte writing a Vitess source is their platform gaining one more source; the ROI for PlanetScale is gaining every Airbyte destination the driver reaches.
  • The vendor's internal consumer validates the API. Vitess dogfoods VStream through its own VReplication workflows (MoveTables, Reshard, Materialize) before external drivers ever touch it. This is the opposite of the "API exists but no one uses it internally" trap.

Canonical instance

Vitess VStream is the canonical wiki instance. One gRPC primitive; four consumers riding on it: three external driver ecosystems (Debezium, Airbyte, Fivetran) plus the first-party PlanetScale Connect.

The driver-ecosystem pattern is what makes the 2024-07-29 post's closing advice load-bearing: "in setting these kinds of systems up you would use a Vitess variant of the connector/driver rather than the MySQL one." Each ecosystem has a Vitess-native driver because the ecosystem pattern incentivised writing one.

Compare with

  • **[Debezium + Kafka-Connect CDC pipeline](<./debezium-kafka-connect-cdc-pipeline.md>)**: the specific Kafka-ecosystem instance of this pattern. Debezium itself is the driver-ecosystem for every database it supports (Postgres, MySQL, MongoDB, Cassandra, Vitess), each via a per-database connector.
  • Kafka Connect: the host framework under which Debezium source connectors and sink connectors compose. The driver-ecosystem pattern is what fills its connector catalogue.

Anti-pattern

The alternative posture is "vendor ships their own connector to every target system." This is what you get when an API is undocumented / unstable / closed; every integration becomes vendor-authored. Three failure modes:

  1. Connector set scales with vendor headcount, not ecosystem size. Small vendors ship a handful of connectors; large ecosystems have hundreds.
  2. Quality varies because each connector is a side project. Ecosystem-owned connectors benefit from ecosystem-wide QA + test harness + release cadence.
  3. Lock-in is a symptom: the vendor becomes the critical path to every integration, slowing customer adoption of adjacent tools.

Publishing a stable API and letting the ecosystems build their own drivers sidesteps all three.

Seen in

  • sources/2026-04-21-planetscale-building-data-pipelines-with-vitess: canonical wiki instance. Matt Lord names all three external driver ecosystems (Debezium, Airbyte, Fivetran) plus PlanetScale Connect as consumers of the VStream API: "This low-level VStream primitive is then used by popular CDC tools like Debezium to capture changes in Vitess and propagate them to other systems. PlanetScale also uses the VStream API to build the Connect feature, using additional open source drivers for popular CDC/ETL services such as Airbyte and Fivetran." Canonical wiki framing of single-vendor-API + N-ecosystem-driver as the shape that lets a sharding-layer-native CDC primitive reach the full ETL / data-warehouse / data-lake integration surface without the vendor writing per-target connectors in-house.