SYSTEM Cited by 1 source

Kafka Connect

Kafka Connect is Apache Kafka's native distributed framework for streaming data into and out of Kafka through connectors: plugins that wrap source- or sink-specific I/O logic behind a uniform surface for lifecycle management, configuration, rebalancing, and offset tracking. A source connector reads from an external system and publishes to Kafka topics; a sink connector reads from Kafka topics and writes to an external system. Debezium is implemented as a family of source connectors on Kafka Connect.

Stub page: to be expanded as future sources on Kafka Connect internals are added. The canonical wiki use case is Datadog's managed multi-tenant CDC replication platform, where Kafka Connect is "the backbone for scalable, fault-tolerant data movement between systems".

Role in the Datadog CDC platform

"Kafka Connect serves as the backbone for scalable, fault-tolerant data movement between systems." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

Concretely at Datadog:

  • Source side: Debezium source connectors on Kafka Connect tail Postgres / Cassandra replication logs and publish Avro-serialised CDC records to Kafka topics.
  • Sink side: Kafka Connect sink connectors read from those topics and push records into Elasticsearch (the original pipeline), another Postgres, Iceberg tables, Cassandra, or another Kafka cluster (cross-region).
  • Custom forks: Datadog maintains in-house forks of the upstream connectors "to introduce Datadog-specific logic and optimisations" where out-of-the-box customisations fell short.
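
As an illustration of the source side, a connector is typically registered by POSTing a JSON body to a Kafka Connect worker's REST endpoint (`POST /connectors`). The sketch below only builds such a body for a Debezium Postgres source with Avro-serialised values. The connector name, host, database, user, topic prefix, and schema registry URL are hypothetical placeholders, not Datadog's actual settings, and `topic.prefix` assumes a Debezium 2.x connector.

```python
import json


def debezium_postgres_source(name, host, dbname, topic_prefix, registry_url):
    """Build the JSON body for POST /connectors on a Kafka Connect worker."""
    return {
        "name": name,
        "config": {
            # Debezium's Postgres source connector class.
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # The Debezium Postgres connector runs as a single task.
            "tasks.max": "1",
            # Logical decoding plugin built into Postgres 10+.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "replicator",      # hypothetical user
            "database.password": "<redacted>",  # placeholder, not a real secret
            "database.dbname": dbname,
            # Prefix for the Kafka topics the connector publishes to
            # (assumes Debezium 2.x naming).
            "topic.prefix": topic_prefix,
            # Serialise record values as Avro via a schema registry.
            "value.converter": "io.confluent.connect.avro.AvroConverter",
            "value.converter.schema.registry.url": registry_url,
        },
    }


# All argument values below are made up for the example.
payload = debezium_postgres_source(
    name="orders-cdc",
    host="pg.example.internal",
    dbname="orders",
    topic_prefix="cdc.orders",
    registry_url="http://schema-registry.example.internal:8081",
)
print(json.dumps(payload, indent=2))
```

A sink connector is registered the same way, with a sink `connector.class` and a `topics` setting naming the topics to read from.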

Single-Message Transforms (SMTs)

Kafka Connect lets each connector configure a chain of single-message transforms (SMTs) that modify records one at a time as they flow through the connector, without requiring any change at the source system. Datadog's retrospective names the SMT kinds they used most:

Transform class                  | Effect
Dynamic topic rename             | Rewrite destination topic per record
Column type conversion           | Change field types inline
Composite primary key generation | Concatenate multiple fields into a single key
Add column                       | Inject derived or static columns at record level
Drop column                      | Remove unwanted columns before the downstream sink
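
To make the five transform kinds concrete, here is a minimal Python model of an SMT chain. It is not the Kafka Connect API (real SMTs are Java classes implementing the `Transformation` interface) and not Datadog's code. Records are modelled as plain dicts, and every field name (`tenant`, `order_id`, `amount`, `internal_note`, `region`) is a hypothetical example.

```python
def rename_topic(record):
    # Dynamic topic rename: derive the destination topic from a record field.
    return {**record, "topic": f"{record['topic']}.{record['value']['tenant']}"}


def cast_field(field, to_type):
    # Column type conversion: change one field's type inline.
    def transform(record):
        value = dict(record["value"])
        value[field] = to_type(value[field])
        return {**record, "value": value}
    return transform


def composite_key(fields, sep="|"):
    # Composite primary key: concatenate several fields into a single key.
    def transform(record):
        key = sep.join(str(record["value"][f]) for f in fields)
        return {**record, "key": key}
    return transform


def add_field(field, static_value):
    # Add column: inject a static value into every record.
    def transform(record):
        return {**record, "value": {**record["value"], field: static_value}}
    return transform


def drop_field(field):
    # Drop column: remove a field before the record reaches the sink.
    def transform(record):
        value = {k: v for k, v in record["value"].items() if k != field}
        return {**record, "value": value}
    return transform


def apply_chain(record, chain):
    # Transforms run in order; each receives the previous one's output.
    for transform in chain:
        record = transform(record)
    return record


chain = [
    rename_topic,
    cast_field("amount", float),
    composite_key(["tenant", "order_id"]),
    add_field("region", "us1"),
    drop_field("internal_note"),
]

record = {
    "topic": "orders",
    "key": None,
    "value": {"tenant": "acme", "order_id": 7, "amount": "12.50", "internal_note": "x"},
}
result = apply_chain(record, chain)
print(result)
```

Each transform returns a new record rather than mutating its input, so the source-side record is untouched; this mirrors the point that SMTs reshape data in transit without changes at the source.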

SMTs are the per-tenant customisation surface that lets Datadog avoid forking the pipeline for every variant requirement:

"This flexibility made it possible for teams to shape their data according to their specific needs without requiring changes at the source." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
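
Because an SMT chain is plain connector configuration, per-tenant variation can be expressed as config rather than code. The sketch below flattens a list of steps into the `transforms.*` keys Kafka Connect expects, using transforms that ship with Apache Kafka. The source does not say which transform classes Datadog used, so the selection, aliases, and field names here are assumptions. Note also that the built-in `ValueToKey` copies fields into a structured key rather than concatenating them, and `RegexRouter` rewrites topics by pattern rather than by record content.

```python
def smt_chain_config(steps):
    """Flatten (alias, class_name, params) steps into Kafka Connect config keys."""
    # "transforms" lists the aliases in the order they are applied.
    config = {"transforms": ",".join(alias for alias, _, _ in steps)}
    for alias, class_name, params in steps:
        config[f"transforms.{alias}.type"] = class_name
        for key, value in params.items():
            config[f"transforms.{alias}.{key}"] = value
    return config


PREFIX = "org.apache.kafka.connect.transforms."

# Hypothetical chain for one tenant; aliases and field names are illustrative.
tenant_steps = [
    ("route", PREFIX + "RegexRouter", {"regex": "(.*)", "replacement": "tenant_a.$1"}),
    ("cast", PREFIX + "Cast$Value", {"spec": "amount:float64"}),
    ("key", PREFIX + "ValueToKey", {"fields": "tenant,order_id"}),
    ("addRegion", PREFIX + "InsertField$Value",
     {"static.field": "region", "static.value": "us1"}),
    # "exclude" is the newer option name; older Kafka releases call it "blacklist".
    ("dropNote", PREFIX + "ReplaceField$Value", {"exclude": "internal_note"}),
]

smt_config = smt_chain_config(tenant_steps)
for key in sorted(smt_config):
    print(f"{key}={smt_config[key]}")
```

These keys would be merged into a connector's `config` map, so two tenants can share the same connector class while differing only in their transform chain.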

See patterns/connector-transformations-plus-enrichment-api for the two-axis customisation shape (SMTs at the transport layer + a centralised enrichment API at the storage layer).

Seen in

  • sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — backbone of Datadog's managed multi-tenant CDC replication platform; source side hosts Debezium connectors; sink side pushes into Elasticsearch / Postgres / Iceberg / Cassandra / cross-region Kafka. Single-message transforms are the per-tenant customisation surface; Datadog maintains custom forks for advanced use cases.