Debezium¶
Debezium is an open-source Change Data Capture (CDC) platform built on top of Apache Kafka Connect. It exposes per-database source connectors that tail a database's replication log (Postgres logical replication / MySQL binlog / MongoDB oplog / Cassandra commit log / …) and emit the change stream as keyed Kafka records, typically serialised as Avro with an associated schema published to a Kafka Schema Registry.
Stub page: expand as future sources cover Debezium internals. The canonical wiki use case is Datadog's managed multi-tenant CDC replication platform, where Debezium is the source-side ingestion component of the Postgres/Cassandra → Kafka → sink-connector → Elasticsearch/Iceberg/Postgres pipeline.
Role in a CDC pipeline¶
Source DB ──(replication log: Postgres WAL / MySQL binlog / ...)──▶
Debezium source connector ──(Avro-serialised keyed records)──▶
Kafka topic + Kafka Schema Registry ──▶
Sink connector (Kafka Connect) ──▶ downstream system
Debezium serialises record values and their schemas together; schemas are pushed to the Schema Registry and compared against the stored schema under whatever compatibility mode is configured (Datadog runs backward compatibility — see patterns/schema-registry-backward-compat).
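As an illustrative sketch only (hostnames, credentials, topic prefix, and table names below are placeholders, not Datadog's actual configuration), a Debezium Postgres source connector registered with Kafka Connect and Confluent's Avro converter might look like:

```json
{
  "name": "inventory-pg-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "pg-primary.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "inventory",
    "topic.prefix": "inventory",
    "table.include.list": "public.orders,public.customers",
    "slot.name": "debezium_inventory",
    "publication.name": "dbz_inventory",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
```

Such a config is POSTed to the Kafka Connect REST API (`POST /connectors`); the Avro converters register each record's schema with the Schema Registry on first use. (`topic.prefix` is the Debezium 2.x name; earlier releases used `database.server.name`.)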
Prerequisites on the source database¶
For Postgres specifically, a Debezium source pipeline requires several operator-side configurations that Datadog explicitly enumerated in their 2025-11-04 retrospective:
- Enable logical replication by setting wal_level=logical.
- Create + configure Postgres users with the correct replication permissions.
- Establish replication objects — publishers (logical publications) and replication slots.
- Deploy Debezium instances (one or more, typically one per source database or shard) to capture changes.
- Create Kafka topics with appropriate partitioning and ensure each Debezium instance maps correctly to its topics.
- Set up heartbeat tables — a known Debezium requirement to advance the replication slot's LSN during quiet periods so the Postgres WAL can be recycled (unacked slots pin WAL indefinitely).
- Configure sink connectors (the downstream half) to move data from Kafka into the target system.
(Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
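The Postgres-side steps above can be sketched in SQL (all object names are illustrative; the heartbeat write itself is driven by the connector via its heartbeat settings, e.g. heartbeat.interval.ms and heartbeat.action.query):

```sql
-- Step 1: enable logical replication (requires a server restart to take effect)
ALTER SYSTEM SET wal_level = 'logical';

-- Step 2: a user with replication privileges for Debezium
CREATE ROLE debezium WITH LOGIN REPLICATION PASSWORD '********';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;

-- Step 3: replication objects — a publication and a logical replication slot
CREATE PUBLICATION dbz_inventory FOR TABLE public.orders, public.customers;
SELECT pg_create_logical_replication_slot('debezium_inventory', 'pgoutput');

-- Step 6: heartbeat table — periodic writes advance the slot's confirmed LSN
-- during quiet periods so Postgres can recycle WAL (an unacked slot pins WAL)
CREATE TABLE debezium_heartbeat (id int PRIMARY KEY, last_beat timestamptz);
GRANT INSERT, UPDATE ON debezium_heartbeat TO debezium;
```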
That 7-step manual runbook is the class of operational-complexity problem that motivated Datadog to build a Temporal-orchestrated automation layer over Debezium provisioning.
Schema evolution¶
Debezium emits schema updates alongside data when a source-DB schema change is captured; in Datadog's platform these are serialised to Avro and the schema is compared against the Schema Registry's stored schema under backward-compatibility rules. This limits schema changes to safe operations (adding optional fields, removing existing optional fields) without breaking downstream consumers.
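For illustration (a hypothetical value schema, not a real Debezium change-event envelope), adding an optional field with a default is the canonical backward-compatible change — a consumer on the new schema can still decode records written under the old one:

```json
// v1 — stored in the Schema Registry
{"type": "record", "name": "OrderValue", "fields": [
  {"name": "id", "type": "long"}
]}

// v2 — backward compatible: the new field is optional and defaulted,
// so a v2 reader decodes v1 records by filling in the default
{"type": "record", "name": "OrderValue", "fields": [
  {"name": "id", "type": "long"},
  {"name": "email", "type": ["null", "string"], "default": null}
]}
```

Making an existing field required, or adding a field without a default, would fail the registry's backward-compatibility check and be rejected at produce time.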
The companion offline layer is Datadog's internal automated schema management validation system, which analyses migration SQL before it is applied to the database to catch pipeline-breaking changes (e.g. ALTER TABLE ... ALTER COLUMN ... SET NOT NULL on a column that in-flight Debezium messages might not populate). See patterns/schema-validation-before-deploy.
Seen in¶
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — core ingestion component of Datadog's managed multi-tenant CDC replication platform. Postgres-to-Elasticsearch was the seed pipeline; the platform generalised to Postgres→Postgres, Postgres→Iceberg, Cassandra→X, and cross-region Kafka replication. Datadog maintains custom forks of Debezium/Kafka-Connect where OSS transforms fell short, to introduce Datadog-specific logic and optimisations.
Related¶
- systems/kafka-connect — Debezium is implemented as a set of Kafka Connect source connectors.
- systems/kafka-schema-registry — where Debezium-emitted schemas are validated under backward-compat rules.
- systems/kafka — destination for CDC records.
- systems/postgresql — primary source database in Datadog's platform; wal_level=logical is the feature gate.
- concepts/change-data-capture — the upstream concept.
- concepts/wal-write-ahead-logging — Postgres logical replication ≡ WAL streaming under wal_level=logical.
- concepts/logical-replication — row-level replication derived from the WAL; Debezium's Postgres feed.
- patterns/debezium-kafka-connect-cdc-pipeline — the full-stack pattern.