PATTERN Cited by 1 source
Schema validation before deploy¶
Summary¶
Analyse database migration SQL before it's applied to the production database, to catch schema changes that would break in-flight records on downstream CDC pipelines. Pre-deploy validation is the offline half of a two-layer schema-evolution-safety answer, paired with runtime Schema Registry backward-compat enforcement.
Problem¶
In an async CDC pipeline, a schema change that looks innocuous at the DDL layer can silently break consumers downstream. The canonical Datadog example:
"We would want to block a schema change like
`ALTER TABLE ... ALTER COLUMN ... SET NOT NULL` because not all messages in the pipeline are guaranteed to populate that column. If a consumer gets a message where the field was null, the replication could break." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
The problem is temporal: the DDL is instant, but the stream of
records under the old schema is long-lived. A runtime
Schema Registry would catch the
new schema as incompatible, but that catch happens at the
registry — the actual DDL may have already landed on the primary,
and rolling back a NOT NULL under traffic is expensive.
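To make the temporal mismatch concrete, here is a minimal sketch in Python. The message shape, the `email` column, and both helper functions are hypothetical and chosen only for illustration. It scans messages that were already in flight when the DDL ran and finds the first one a consumer enforcing the new `NOT NULL` constraint would reject.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical in-flight CDC messages, all produced before the DDL ran.
# Table shape and column names are illustrative assumptions.
in_flight = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},  # written while the column was still nullable
    {"id": 3, "email": "c@example.com"},
]


def violates_not_null(message: dict, column: str) -> bool:
    """True if applying this message would violate NOT NULL on `column`."""
    return message.get(column) is None


def first_breaking_message(messages: list[dict], column: str) -> Optional[dict]:
    """Return the first message a NOT NULL consumer would reject, if any."""
    for message in messages:
        if violates_not_null(message, column):
            return message
    return None


breaking = first_breaking_message(in_flight, "email")
```

In this sketch `breaking` is the message with `id` 2: the DDL has already completed on the primary, yet a record written under the old schema is still waiting to be applied downstream.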
Solution¶
Have migration SQL flow through an automated schema-management validation system that is a hard gate before the migration is allowed to run on production. The validator:
- Parses the migration SQL.
- Classifies each statement by its CDC-compatibility impact.
- Auto-approves "safe" changes (additive column, optional field, adding an index).
- Blocks "unsafe" changes (`SET NOT NULL` where downstream messages might be null; tightening a column type; renaming; dropping a column downstream consumers read).
- Routes blocked changes to a coordinated-rollout process with the consuming team.
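The validator steps above can be sketched roughly as follows. This is not Datadog's implementation, which the source does not describe in detail. The rule lists, the `Verdict` type, and the function names are all assumptions, and regular expressions stand in for a real SQL parser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

FLAGS = re.IGNORECASE | re.DOTALL

# Hypothetical rule sets. A production validator would parse the SQL
# properly; these regexes only approximate the classes named in the text.
UNSAFE = [
    (re.compile(r"\bALTER\s+COLUMN\s+\S+\s+SET\s+NOT\s+NULL\b", FLAGS),
     "SET NOT NULL: in-flight messages may carry null"),
    (re.compile(r"\bALTER\s+COLUMN\s+\S+\s+(SET\s+DATA\s+)?TYPE\b", FLAGS),
     "column type change"),
    (re.compile(r"\bRENAME\b", FLAGS), "rename"),
    (re.compile(r"\bDROP\s+COLUMN\b", FLAGS),
     "dropped column may be read downstream"),
]
SAFE = [
    (re.compile(r"\bADD\s+COLUMN\b(?!.*\bNOT\s+NULL\b)", FLAGS),
     "additive nullable column"),
    (re.compile(r"^\s*CREATE\s+INDEX\b", FLAGS), "index only"),
]


@dataclass(frozen=True)
class Verdict:
    statement: str
    approved: bool
    reason: str


def classify(statement: str) -> Verdict:
    """Check unsafe rules first, then safe rules; block anything unknown."""
    for pattern, reason in UNSAFE:
        if pattern.search(statement):
            return Verdict(statement, False, reason)
    for pattern, reason in SAFE:
        if pattern.search(statement):
            return Verdict(statement, True, reason)
    return Verdict(statement, False, "unclassified: route to manual review")


def validate_migration(sql: str) -> list[Verdict]:
    # Naive split on ';' (breaks on semicolons inside string literals).
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [classify(s) for s in statements]


def gate(sql: str) -> bool:
    """Hard gate: the migration may run only if every statement is approved."""
    return all(v.approved for v in validate_migration(sql))
```

With these rules, `gate("ALTER TABLE users ADD COLUMN nickname text;")` passes, while `gate("ALTER TABLE users ALTER COLUMN email SET NOT NULL;")` fails and the statement would be routed to coordinated rollout. Blocking unclassified statements by default is a conservative choice made for this sketch; it trades more manual review for fewer missed breaking changes.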
Datadog's framing:
"Our validation checks allow us to approve most changes without manual intervention. For breaking changes, we work directly with the team to coordinate a safe rollout." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Why this works¶
- Human-in-the-loop cost stays proportional to the breaking-change rate, not the total migration rate. Most migrations are routine and get auto-approved; the rare pipeline-breaking class is where coordination happens.
- Catches the problem upstream of production state mutation. The migration never lands on the primary; there's nothing to roll back.
- Composes with runtime enforcement as defence in depth — offline catches the predictable class, runtime registry catches whatever slips through.
Composition with schema-registry backward-compat¶
| Layer | Phase | Mechanism | Catches |
|---|---|---|---|
| Pre-deploy (offline) | Before DDL applied | Automated migration-SQL validator | Structural pipeline-breaking changes like SET NOT NULL |
| Runtime (online) | After DDL, at publish | Kafka Schema Registry in backward-compat mode | Schemas Debezium serialises that don't round-trip for older consumers |
Neither layer subsumes the other — they catch different classes of failure at different costs.
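For intuition about the runtime layer, the following is a simplified model of a backward-compatibility check, where a reader on the new schema must still be able to read records written under the old one. It is not the Schema Registry's actual algorithm. The dict-based schema model, the `"null|string"` type notation, and the function name are assumptions made for this sketch, and real Avro resolution also permits type promotions that are ignored here.

```python
from __future__ import annotations

# Simplified schema model (an assumption for this sketch):
#   field name -> {"type": str, optionally "default": value}


def backward_compat_problems(old: dict, new: dict) -> list[str]:
    """Reasons a reader on `new` cannot read records written under `old`."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            # A field missing from old records needs a default to fill in.
            if "default" not in spec:
                problems.append(f"{name}: added without a default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"{name}: type changed {old[name]['type']} -> {spec['type']}"
            )
    return problems


old_schema = {
    "id": {"type": "long"},
    "email": {"type": "null|string", "default": None},
}
# Adding an optional field keeps old records readable.
additive = dict(old_schema, nickname={"type": "null|string", "default": None})
# SET NOT NULL narrows the type, so old null values no longer fit.
not_null = dict(old_schema, email={"type": "string"})

additive_problems = backward_compat_problems(old_schema, additive)
not_null_problems = backward_compat_problems(old_schema, not_null)
```

Here the additive change yields no problems, while the `NOT NULL` change is reported as a type change on `email`. This is the same breaking change the pre-deploy layer blocks, but detected only after the DDL has been applied.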
Caveats¶
- The validator must be kept in sync with the pipeline topology: if a table becomes newly sourced by CDC, the validator needs to know.
- Rule authoring is a custom-code problem: not every `ALTER TABLE` form is easily classified, and over-blocking is its own tax.
- The pattern requires the company to own the DDL pipeline (e.g. migration runner in CI, not direct psql by devs).
- For non-SQL sources (Cassandra, MongoDB), the equivalent validator must target the source's DDL/schema surface.
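The first caveat, keeping the validator in sync with the pipeline topology, can be illustrated with a small scoping check. The `CDC_SOURCED_TABLES` set is a hypothetical snapshot; in practice it would have to be refreshed from wherever the pipeline topology is recorded. The regex is a simplification that handles only unquoted identifiers.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical snapshot of which tables are currently sourced by CDC.
CDC_SOURCED_TABLES = {"public.users", "public.orders"}

# Simplified pattern: unquoted identifiers only.
ALTER_TABLE = re.compile(
    r"^\s*ALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(?:ONLY\s+)?([\w.]+)",
    re.IGNORECASE,
)


def target_table(statement: str, default_schema: str = "public") -> Optional[str]:
    """Extract the schema-qualified, lower-cased table an ALTER TABLE targets."""
    match = ALTER_TABLE.match(statement)
    if match is None:
        return None
    name = match.group(1).lower()
    return name if "." in name else f"{default_schema}.{name}"


def needs_cdc_validation(statement: str, cdc_tables: set[str]) -> bool:
    """Apply CDC-compatibility rules only to tables the pipeline reads from."""
    table = target_table(statement)
    return table is not None and table in cdc_tables
```

If a table starts being captured but is missing from the snapshot, `needs_cdc_validation` returns `False` and a breaking change would pass unchecked, which is the staleness risk the caveat describes.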
Seen in¶
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform
— Datadog's internal automated schema-management validation
system analyses migration SQL before it's applied to the
database; blocks `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL` on columns with potentially-null in-flight messages. Auto-approves most changes; breaking changes trigger coordinated rollout with the owning team. Offline half of Datadog's two-layer schema-evolution-safety solution.
Related¶
- concepts/schema-evolution — the concept this pattern defends.
- concepts/backward-compatibility — the compatibility property being preserved.
- concepts/change-data-capture — the pipeline class where the pattern is load-bearing.
- patterns/schema-registry-backward-compat — the runtime companion; defence in depth together.
- patterns/managed-replication-platform — the full platform shape both pre-deploy and runtime schema safety fit into.
- systems/debezium — the CDC producer whose contract the pre-deploy gate protects.