# Kafka Connect
Kafka Connect is Apache Kafka's native distributed framework for streaming data into and out of Kafka via connectors: plugins that wrap system-specific I/O logic behind a uniform surface for lifecycle, configuration, rebalancing, and offset tracking. A source connector reads from an external system and publishes to Kafka topics; a sink connector reads from Kafka topics and writes to an external system. Debezium is implemented as a family of source connectors on Kafka Connect.
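Concretely, a connector is registered by POSTing a JSON config to the Connect REST API (`POST /connectors`). A minimal sketch of a Debezium Postgres source connector payload follows; the connector class name is real, but the connector name, hostnames, and database values are illustrative placeholders:

```python
import json

# Hedged sketch: a source-connector registration payload as it would be
# POSTed to Kafka Connect's REST API. All values except the connector
# class are placeholders.
source_connector = {
    "name": "inventory-pg-source",  # placeholder connector name
    "config": {
        # Debezium's Postgres source connector class
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "pg.example.internal",  # placeholder host
        "database.port": "5432",
        "database.user": "replicator",
        "database.dbname": "inventory",
        # topics are derived from this prefix, e.g. inventory.public.orders
        "topic.prefix": "inventory",
    },
}

print(json.dumps(source_connector, indent=2))
```

Connect persists this config, spawns the connector's tasks across the worker cluster, and tracks their offsets, which is what makes the framework's fault tolerance uniform across connector types.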
Stub page; expand as future Kafka-Connect-internals sources are added. The canonical wiki use case is Datadog's managed multi-tenant CDC replication platform, where Kafka Connect is "the backbone for scalable, fault-tolerant data movement between systems".
## Role in the Datadog CDC platform
"Kafka Connect serves as the backbone for scalable, fault-tolerant data movement between systems." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Concretely at Datadog:
- Source side: Debezium source connectors on Kafka Connect tail Postgres / Cassandra replication logs and publish Avro-serialised CDC records to Kafka topics.
- Sink side: Kafka Connect sink connectors read from those topics and push records into Elasticsearch (the original pipeline), another Postgres, Iceberg tables, Cassandra, or another Kafka cluster (cross-region).
- Custom forks: Datadog maintains in-house forks of the upstream connectors "to introduce Datadog-specific logic and optimisations" where the connectors' out-of-the-box customisation options fell short.
## Single-Message Transforms (SMTs)
Kafka Connect exposes a connector-level chain of single-message transforms that operate on each record flowing through a connector, without requiring any change at the source system. Datadog's retrospective names the SMT kinds they used most:
| Transform class | Effect |
|---|---|
| Dynamic topic rename | Rewrite destination topic per record |
| Column type conversion | Change field types inline |
| Composite primary key generation | Concatenate multiple fields into a single key |
| Add column | Inject derived or static columns at record level |
| Drop column | Remove unwanted columns before the downstream sink |
SMTs are the per-tenant customisation surface that lets Datadog avoid forking the pipeline for every variant requirement:
"This flexibility made it possible for teams to shape their data according to their specific needs without requiring changes at the source." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
See patterns/connector-transformations-plus-enrichment-api for the two-axis customisation shape (SMTs at the transport layer + a centralised enrichment API at the storage layer).
## Seen in
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — backbone of Datadog's managed multi-tenant CDC replication platform; source side hosts Debezium connectors; sink side pushes into Elasticsearch / Postgres / Iceberg / Cassandra / cross-region Kafka. Single-message transforms are the per-tenant customisation surface; Datadog maintains custom forks for advanced use cases.
## Related
- systems/kafka — the underlying broker.
- systems/debezium — Debezium's source connectors are Kafka Connect plugins.
- systems/kafka-schema-registry — schemas on records flowing through sinks are validated at the registry under backward-compat rules.
- concepts/change-data-capture — the class of pipeline Datadog uses Kafka Connect for.
- patterns/debezium-kafka-connect-cdc-pipeline — full-stack CDC pattern.
- patterns/connector-transformations-plus-enrichment-api — two-axis per-tenant customisation shape.