PATTERN
Schema Registry backward-compat¶
Summary¶
Integrate a Kafka Schema Registry into both source (producer) and sink (consumer) connectors of a CDC pipeline, and configure it in backward compatibility mode. Every schema update proposed by the producer (typically Debezium when the source DB's DDL changes) is compared against the stored schema; updates that would break older consumers are rejected at the registry, not at the downstream sink.
Problem¶
Producers and consumers in an async CDC pipeline redeploy on independent clocks. A producer upgrade that changes the record shape can silently break consumers still running the prior version. Even with pre-deploy migration validation as the offline gate, some schema changes slip through — an Avro-level incompatibility not visible in the SQL, a custom connector's serialisation quirk, a human bypass of the CI pipeline. A runtime gate catches the residual class.
Solution¶
- Enforce that every record on the CDC topic is serialised through the Schema Registry — no producer publishes without registering its schema first. Avro is the canonical format.
- Configure the registry in backward compatibility mode:
"New schemas must still allow older consumers to read data without errors. In practice, this limits schema changes to safe operations — like adding optional fields or removing existing ones." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
- When Debezium captures a source-DB schema change:
    - Serialise the new records to Avro.
    - Push both the records and the schema update to the Kafka topic + Schema Registry.
    - The registry compares the proposed schema against the stored one and accepts or rejects it.
- Consumers (including custom external consumers) fetch the schema by ID from the registry when deserialising; both internal-platform and external consumers share the contract.
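Concretely, the enforcement point is connector configuration: both source and sink connectors serialise through the registry by using the Avro converter. A sketch of a Debezium source-connector config — the connector name, database details and registry URL are illustrative, but `io.confluent.connect.avro.AvroConverter` and the `*.converter.schema.registry.url` keys are the standard Kafka Connect settings for this:

```json
{
  "name": "inventory-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.example.internal",
    "database.dbname": "inventory",
    "topic.prefix": "cdc",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
```

With the Avro converter in place, every publish implicitly registers (or looks up) the record's schema, so the registry's compatibility check sits on the hot path by construction rather than by convention.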
Why backward (not forward or full)¶
In CDC pipelines the consumer fleet is usually the slower-to-redeploy side — a producer (Debezium) upgrade must not break in-flight records already destined for older-version consumers. Backward compat ensures new-producer records remain readable by old consumers.
Forward compat (old records readable by new consumers) is the right default in the opposite shape — e.g. log aggregation, where producers are many dumb clients and consumers upgrade centrally. Full compat is the intersection of both; None is unsafe.
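Whichever direction a deployment privileges, both modes reduce to Avro's schema-resolution question: can a given reader schema decode a record written with a different writer schema? A toy sketch of that rule — illustrative only; real Avro resolution also handles type promotion, unions and aliases, and the registry implements the full spec:

```python
# Toy schema model: field name -> (type, default). NO_DEFAULT marks a
# required field, i.e. one a reader cannot synthesise when it is absent.
NO_DEFAULT = object()

def can_read(reader: dict, writer: dict) -> bool:
    """Avro-style resolution: every reader field must either exist in the
    writer schema with a matching type, or carry a default to fall back on."""
    for name, (rtype, rdefault) in reader.items():
        if name in writer:
            if writer[name][0] != rtype:  # toy rule: no type promotion
                return False
        elif rdefault is NO_DEFAULT:
            return False
    return True

old          = {"id": ("long", NO_DEFAULT), "email": ("string", NO_DEFAULT)}
add_optional = {**old, "phone": ("string", None)}        # new field with default
add_required = {**old, "phone": ("string", NO_DEFAULT)}  # new field, no default
drop_email   = {"id": ("long", NO_DEFAULT)}              # field removed

# Adding an optional field is safe in both directions.
assert can_read(add_optional, old) and can_read(old, add_optional)
# A new required field breaks readers that must decode old records lacking it.
assert not can_read(add_required, old)
# Removing a field: a reader on the new schema still decodes old records,
assert can_read(drop_email, old)
# but a reader still on the old schema cannot decode new records
# missing "email", since it has no default to fall back on.
assert not can_read(old, drop_email)
```

This is why the safe-operations subset looks the way it does: "optional" is not a style preference but the condition under which a field can be absent on one side of the contract without breaking the other.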
Multi-tenancy¶
Datadog runs the registry as a multi-tenant service, integrated across teams' pipelines. External custom consumers outside the platform also use the same registry — important because:
"Since users can also build custom Kafka consumers to directly read the topics, maintaining schema compatibility is especially important — we want to ensure that all consumers, whether internal or external, continue to work without disruption." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Composition with pre-deploy validation¶
This pattern is the runtime half of a two-layer schema-evolution-safety answer:
| Layer | Phase | Mechanism | Catches |
|---|---|---|---|
| Pre-deploy (offline) | Before DDL applied | Migration-SQL validator | Structural breakers like SET NOT NULL on potentially-null columns |
| Runtime (online) | At publish | Schema Registry backward-compat | Avro-level incompatibilities, bypass cases, connector-specific quirks |
See patterns/schema-validation-before-deploy for the offline half.
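Operationally, the runtime layer's mode is set per subject on the registry itself. In Confluent Schema Registry this is the `PUT /config/{subject}` endpoint (the subject name below is illustrative):

```
PUT /config/cdc.inventory.customers-value
Content-Type: application/vnd.schemaregistry.v1+json

{"compatibility": "BACKWARD"}
```

A global default can be set the same way via `PUT /config`, with per-subject overrides where a topic's consumer fleet has different constraints.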
Benefits¶
- Hard gate at the runtime boundary — incompatible schemas can't enter the topic, so consumers can't see them.
- Decoupling — producer and consumer redeploys stay independent, because the registry promises their contract.
- Ecosystem fit — every serious Avro + Kafka + Debezium deployment already has a Schema Registry; this is adopting the ecosystem default, not inventing new infra.
Caveats¶
- Backward compat constrains schema evolution to a safe subset — added fields must be optional (carry defaults); removed fields must be tolerated when old records still carry them; renames are disallowed (a rename is a removal plus an addition, so at least one side breaks). A genuinely breaking change (type narrowing, adding a required field) requires a coordinated multi-version migration.
- Registry is an operational dependency — its availability must match the producer's availability budget, or producers can't publish.
- Schema-drift observability must be built separately — the registry records which schemas exist, but "which topic has which producer at which schema version right now" is not a native query.
- Multi-tenancy requires auth / namespace design; Datadog's post doesn't disclose the specifics.
Seen in¶
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog's multi-tenant Kafka Schema Registry in backward-compat mode, integrated with source + sink connectors across their managed CDC replication platform; Avro-serialised data + schema updates pushed together by Debezium; external custom consumers share the contract with in-platform sinks.
Related¶
- systems/kafka-schema-registry — the system this pattern operates on.
- systems/debezium — the CDC producer emitting schema updates.
- systems/kafka — the substrate.
- systems/kafka-connect — the connector framework whose source + sink connectors integrate with the registry.
- concepts/schema-evolution — the concept this pattern enforces.
- concepts/backward-compatibility — the chosen mode.
- patterns/schema-validation-before-deploy — the offline companion layer.
- patterns/managed-replication-platform — the full platform shape both schema-safety patterns fit into.
- patterns/debezium-kafka-connect-cdc-pipeline — the CDC transport backbone.