CONCEPT Cited by 1 source
Schema evolution¶
Definition¶
Schema evolution is the problem of changing the structure of data (records, tables, messages) over time while old and new versions coexist in the system — in flight on a queue, at rest in a log, being read by older consumers that have not yet redeployed. It is the hard problem in any long-lived data pipeline: the schema is not a static contract but a moving one with backward- and forward-compatibility obligations.
Why it's hard in async CDC¶
In an async CDC pipeline, there are at minimum three clocks that can advance independently:
- The source database's DDL clock (schema migrations applied).
- The in-flight records' serialised schema version (records already written to a Kafka topic under schema v1 cannot be retroactively rewritten when the producer upgrades to v2).
- Each consumer's deploy clock (a sink connector or a custom downstream app may be pinned to v1 when producers switch to v2).
A schema change that looks "trivial" at the DDL layer (e.g. `ALTER TABLE ... ALTER COLUMN foo SET NOT NULL`) can silently break every in-flight record where `foo` happened to be null. That null-valued record was valid under v1, was already serialised and published, and will surface at a consumer that now expects a non-null value.
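This failure mode can be made concrete with a small sketch; the field name `foo` comes from the example above, while `consume_v2` and the record shape are hypothetical:

```python
# Hypothetical sketch: `consume_v2` models a consumer redeployed after the
# `SET NOT NULL` migration, so it now treats `foo` as required.
def consume_v2(record: dict) -> str:
    if record.get("foo") is None:
        raise ValueError("foo is null: valid under schema v1, broken under v2")
    return record["foo"].upper()

# A record serialised and published while the column was still nullable
# cannot be retroactively rewritten; it surfaces at the upgraded consumer
# and raises, even though no producer ever violated its own schema.
in_flight_v1 = {"id": 7, "foo": None}
```

The point of the sketch is that neither side misbehaved: the producer honoured v1, the consumer honoured v2, and the break lives entirely in the skew between the two.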
Datadog's framing and two-layer answer¶
Datadog's 2025-11-04 retrospective names schema evolution as "one of the key challenges with asynchronous replication":
"Even with schema changes in the source datastore, our platform needs to ensure that change events can be reliably replicated downstream." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Their answer is a two-layer solution:
Layer 1 — validate before apply (offline)¶
An internal automated schema-management validation system analyses migration SQL before it is applied to the database, catching pipeline-breaking changes. Datadog's canonical example:
"We would want to block a schema change like ALTER TABLE ... ALTER COLUMN ... SET NOT NULL because not all messages in the pipeline are guaranteed to populate that column. If a consumer gets a message where the field was null, the replication could break. Our validation checks allow us to approve most changes without manual intervention. For breaking changes, we work directly with the team to coordinate a safe rollout." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
This layer is captured as patterns/schema-validation-before-deploy.
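A minimal sketch of what such a pre-apply check could look like, assuming a regex-based classifier over migration SQL; Datadog's actual system is internal and certainly more sophisticated, so the pattern list here is purely illustrative:

```python
import re

# Hypothetical pre-apply validator: scan migration SQL for statement
# shapes that can break in-flight records, and flag them for manual
# coordination instead of auto-approval.
BREAKING = [
    # Tightening nullability invalidates null values already published.
    re.compile(r"ALTER\s+COLUMN\s+\w+\s+SET\s+NOT\s+NULL", re.IGNORECASE),
    # Dropping a column strands consumers that still read it.
    re.compile(r"DROP\s+COLUMN\s+\w+", re.IGNORECASE),
]

def validate_migration(sql: str) -> list[str]:
    """Return the breaking patterns matched; an empty list means auto-approve."""
    return [p.pattern for p in BREAKING if p.search(sql)]
```

Additive changes (e.g. `ADD COLUMN bar TEXT`) match nothing and sail through, which is what lets "most changes" be approved without a human in the loop.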
Layer 2 — registry-enforced backward compat (runtime)¶
A multi-tenant Kafka Schema Registry integrated with the source and sink connectors, configured for backward compatibility — a consumer on the new schema must still be able to read records written under older schemas. In practice this limits schema changes to adding optional fields (fields with defaults) or removing existing fields. When Debezium captures an updated schema, it serialises change events to Avro and pushes the data to the Kafka topic and the schema to the Schema Registry; the registry compares the candidate against the stored schema and accepts or rejects it. This layer is captured as patterns/schema-registry-backward-compat.
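The accept/reject step can be sketched with a toy in-memory registry. This is illustrative only — a real Schema Registry performs this check server-side against full Avro schema-resolution rules; here schemas are simplified dicts and only field presence and defaults are checked:

```python
# Toy registry enforcing backward compatibility on register().
class ToyRegistry:
    def __init__(self):
        self.versions: list[dict] = []

    def register(self, candidate: dict) -> int:
        if self.versions:
            stored = {f["name"] for f in self.versions[-1]["fields"]}
            # Backward mode: a consumer on `candidate` must still read
            # records written under the latest stored schema, so any
            # brand-new field needs a default to fall back on.
            for f in candidate["fields"]:
                if f["name"] not in stored and "default" not in f:
                    raise ValueError(f"incompatible: new required field {f['name']!r}")
        self.versions.append(candidate)
        return len(self.versions)  # new version number

reg = ToyRegistry()
reg.register({"fields": [{"name": "id"}, {"name": "foo"}]})                   # v1
reg.register({"fields": [{"name": "id"}, {"name": "bar", "default": None}]})  # v2: drops foo, adds optional bar
```

Dropping `foo` and adding optional `bar` both pass, matching the "add optional fields or remove existing fields" envelope; adding a defaultless field would be rejected at registration time, before any incompatible record reaches the topic.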
Composition¶
Offline validation catches the pipeline-breaking class before it reaches production. The runtime registry catches the residual class that slips through (or that pre-deploy review missed). Combined, they let the source team ship routine schema migrations without manual platform-team coordination, which is reserved for the breaking-change class.
Compatibility modes (canonical vocabulary)¶
| Mode | Rule | Allows |
|---|---|---|
| Backward | Consumers on the new schema can read data written with the old one | Remove fields; add optional fields (with defaults) |
| Forward | Consumers on the old schema can read data written with the new one | Add fields; remove optional fields |
| Full | Both directions | Add or remove optional fields only |
| None | No checking | Anything (unsafe) |
Backward is the registry default and the usual choice for CDC pipelines because the topic retains records serialised under every older schema version: a consumer that picks up the newest schema must still be able to decode that in-flight and at-rest backlog.
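Under simplifying assumptions (dict schemas, presence/default checks only, no Avro type promotion), the four modes reduce to a single "reader can decode writer" check applied in one or both directions:

```python
def readable(reader: dict, writer: dict) -> bool:
    """A reader schema decodes a writer's records iff every reader
    field is supplied by the writer or falls back to a default."""
    written = {f["name"] for f in writer["fields"]}
    return all(f["name"] in written or "default" in f for f in reader["fields"])

def allowed(mode: str, old: dict, new: dict) -> bool:
    if mode == "backward":  # consumers on the new schema still read old records
        return readable(new, old)
    if mode == "forward":   # consumers still on the old schema read new records
        return readable(old, new)
    if mode == "full":
        return readable(new, old) and readable(old, new)
    return True             # "none": anything goes

old = {"fields": [{"name": "id"}, {"name": "foo", "default": None}]}
new_removes_foo = {"fields": [{"name": "id"}]}
```

Running the table's rules through this checker: removing the optional `foo` passes both backward and forward; adding a defaultless field passes forward but fails backward, which is exactly the asymmetry the Allows column encodes.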
Related¶
- concepts/backward-compatibility — the general principle; schema evolution is its data-contract specialisation.
- concepts/change-data-capture — the pipeline class where schema evolution is most operationally acute.
- systems/kafka-schema-registry — the runtime enforcement mechanism.
- systems/debezium — the CDC producer that emits schema updates alongside data.
- patterns/schema-validation-before-deploy — the offline layer.
- patterns/schema-registry-backward-compat — the runtime layer.
Seen in¶
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog's two-layer answer (offline migration-SQL validation + runtime registry in backward-compat mode); the `SET NOT NULL` example as the canonical pipeline-breaking change blocked at offline analysis.