PATTERN Cited by 1 source

Schema validation before deploy

Summary

Analyse database migration SQL before it's applied to the production database, to catch schema changes that would break in-flight records on downstream CDC pipelines. Pre-deploy validation is the offline half of a two-layer schema-evolution-safety answer, paired with runtime Schema Registry backward-compat enforcement.

Problem

In an async CDC pipeline, a schema change that looks innocuous at the DDL layer can silently break consumers downstream. The canonical Datadog example:

"We would want to block a schema change like ALTER TABLE ... ALTER COLUMN ... SET NOT NULL because not all messages in the pipeline are guaranteed to populate that column. If a consumer gets a message where the field was null, the replication could break." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

The problem is temporal: the DDL takes effect at a single point in time, but records serialised under the old schema stay in the pipeline long after. A runtime Schema Registry would reject the new schema as incompatible, but only at publish time. By then the DDL may already have landed on the primary, and rolling back a NOT NULL under traffic is expensive.
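The failure mode can be reproduced in miniature. The sketch below is illustrative only: it uses an in-memory SQLite table as a stand-in for a replica, and the `users` table, the `email` column, and the record contents are all hypothetical. It replays records captured under the old (nullable) schema against a target that already enforces NOT NULL.

```python
import sqlite3

# Hypothetical in-flight CDC records, captured while `email` was still nullable.
in_flight = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},  # written before the constraint existed
]

# Stand-in replica that already reflects the NEW schema (email is NOT NULL).
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def replay(conn, records):
    """Apply records in order; return applied ids and the first failure, if any."""
    applied = []
    for rec in records:
        try:
            conn.execute("INSERT INTO users (id, email) VALUES (:id, :email)", rec)
            applied.append(rec["id"])
        except sqlite3.IntegrityError as exc:
            # Replication stalls here: the record is valid under the old schema
            # but violates the constraint that the DDL introduced.
            return applied, (rec["id"], str(exc))
    return applied, None


applied, failure = replay(replica, in_flight)
print(applied, failure)
```

The first record applies cleanly; the second stops the replay even though it was perfectly valid when it was produced.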

Solution

Have migration SQL flow through an automated schema-management validation system that is a hard gate before the migration is allowed to run on production. The validator:

  • Parses the migration SQL.
  • Classifies each statement by its CDC-compatibility impact.
  • Auto-approves "safe" changes (additive column, optional field, adding an index).
  • Blocks "unsafe" changes (SET NOT NULL where in-flight messages may carry a null for that column; narrowing a column type; renaming a column; dropping a column that downstream consumers read).
  • Routes blocked changes to a coordinated-rollout process with the consuming team.
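The steps above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not Datadog's implementation: the rule lists, the `Verdict` type, and the function names are hypothetical, statements are split naively on `;`, and regular expressions stand in for a real SQL parser. Anything the rules do not recognise is blocked, so the gate fails closed.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    statement: str
    safe: bool
    reason: str


FLAGS = re.I | re.S

# Hypothetical rules; a production validator would work on a parsed AST.
UNSAFE_RULES = [
    (re.compile(r"\bALTER\s+COLUMN\s+\S+\s+SET\s+NOT\s+NULL\b", FLAGS),
     "SET NOT NULL: in-flight messages may carry nulls"),
    (re.compile(r"\bALTER\s+COLUMN\s+\S+\s+(SET\s+DATA\s+)?TYPE\b", FLAGS),
     "column type change"),
    (re.compile(r"\bADD\s+COLUMN\b.*\bNOT\s+NULL\b", FLAGS),
     "new column declared NOT NULL"),
    (re.compile(r"\bRENAME\b", FLAGS), "rename"),
    (re.compile(r"\bDROP\s+COLUMN\b", FLAGS), "drop column"),
]
SAFE_RULES = [
    (re.compile(r"^\s*ALTER\s+TABLE\s+\S+\s+ADD\s+COLUMN\s+\w+", FLAGS),
     "additive nullable column"),
    (re.compile(r"^\s*CREATE\s+INDEX\b", FLAGS), "new index"),
]


def split_statements(sql):
    # Naive split; breaks on semicolons inside string literals.
    return [s.strip() for s in sql.split(";") if s.strip()]


def classify(statement):
    # Unsafe rules are checked first so they always win over safe ones.
    for pattern, reason in UNSAFE_RULES:
        if pattern.search(statement):
            return Verdict(statement, False, reason)
    for pattern, reason in SAFE_RULES:
        if pattern.search(statement):
            return Verdict(statement, True, reason)
    return Verdict(statement, False, "unrecognised statement; needs review")


def validate(sql):
    """Return ("approved" | "blocked", list of blocking verdicts)."""
    blocked = [v for v in map(classify, split_statements(sql)) if not v.safe]
    return ("blocked" if blocked else "approved"), blocked


migration = """
ALTER TABLE orders ADD COLUMN note TEXT;
CREATE INDEX idx_orders_note ON orders (note);
ALTER TABLE orders ALTER COLUMN customer_id SET NOT NULL;
"""
status, blocked = validate(migration)
print(status, [v.reason for v in blocked])
```

In this sketch a single unsafe statement blocks the whole migration, and the list of blocking verdicts is what would be handed to the coordinated-rollout process.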

Datadog's framing:

"Our validation checks allow us to approve most changes without manual intervention. For breaking changes, we work directly with the team to coordinate a safe rollout." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

Why this works

  • Human-in-the-loop cost stays proportional to the breaking-change rate, not the total migration rate. Most migrations are routine and get auto-approved; the rare pipeline-breaking class is where coordination happens.
  • Catches the problem before any production state is mutated. A blocked migration never lands on the primary, so there is nothing to roll back.
  • Composes with runtime enforcement as defence in depth — offline catches the predictable class, runtime registry catches whatever slips through.

Composition with schema-registry backward-compat

| Layer | Phase | Mechanism | Catches |
|---|---|---|---|
| Pre-deploy (offline) | Before DDL applied | Automated migration-SQL validator | Structural pipeline-breaking changes like SET NOT NULL |
| Runtime (online) | After DDL, at publish | Kafka Schema Registry in backward-compat mode | Schemas Debezium serialises that don't round-trip for older consumers |

Neither layer subsumes the other — they catch different classes of failure at different costs.
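To make the runtime layer concrete, here is a deliberately simplified sketch of a backward-compatibility check in the Avro sense: a new schema is backward compatible if it can read data written with the old one. The dictionary schema representation and the `backward_compatible` function are hypothetical, and the check ignores details a real registry handles, such as type promotion and nested records.

```python
def backward_compatible(old, new):
    """Return a list of problems; empty means `new` can read data written with `old`.

    Schemas are hypothetical dicts: field name -> {"types": [...], "has_default": bool},
    where "types" lists the branches of the field's union.
    """
    problems = []
    for name, field in new.items():
        if name not in old:
            # A field the old writer never produced needs a default to fall back on.
            if not field.get("has_default", False):
                problems.append(f"{name}: added without a default")
        elif not set(old[name]["types"]) <= set(field["types"]):
            # The old writer could emit a branch the new reader no longer accepts.
            problems.append(f"{name}: type narrowed from {old[name]['types']} to {field['types']}")
    # Fields dropped from `new` are ignored by the reader, so they are not flagged.
    return problems


old = {
    "id": {"types": ["long"]},
    "email": {"types": ["null", "string"], "has_default": True},
}
# Additive optional field: compatible.
new_ok = dict(old, note={"types": ["null", "string"], "has_default": True})
# What SET NOT NULL looks like once serialised: the "null" branch disappears.
new_bad = dict(old, email={"types": ["string"]})

print(backward_compatible(old, new_ok))
print(backward_compatible(old, new_bad))
```

The second check fails for the same underlying reason the pre-deploy validator blocks SET NOT NULL, but it only fires once a producer tries to register the new schema.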

Caveats

  • The validator must be kept in sync with the pipeline topology: if a table becomes newly sourced by CDC, the validator needs to know.
  • Rule authoring is a custom-code problem — not every ALTER TABLE form is easily classified, and over-blocking is its own tax.
  • The pattern requires the company to own the DDL pipeline (e.g. migration runner in CI, not direct psql by devs).
  • For non-SQL sources (Cassandra, MongoDB), the equivalent validator must target the source's DDL/schema surface.
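The topology caveat can be illustrated with a small scoping helper. This is a sketch under assumptions: the `cdc_tables` set would in practice be loaded from whatever system records which tables feed CDC pipelines, and the function name and regular expression are hypothetical. Statements whose target table cannot be determined are treated as in scope, so a stale or incomplete match errs toward review.

```python
import re

# Extracts the target of an ALTER TABLE; schema-qualified names are kept as written.
ALTER_TABLE = re.compile(
    r"\bALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(?:ONLY\s+)?([\w.]+)", re.I
)


def touches_cdc_table(statement, cdc_tables):
    """Return True if the statement must go through CDC-compatibility checks."""
    match = ALTER_TABLE.search(statement)
    if match is None:
        return True  # unknown target: fail closed
    return match.group(1).lower() in cdc_tables


# Hypothetical topology snapshot; it must be refreshed when new tables are sourced.
cdc_tables = {"orders", "users"}

print(touches_cdc_table("ALTER TABLE orders ALTER COLUMN x SET NOT NULL", cdc_tables))
print(touches_cdc_table("ALTER TABLE audit_log ADD COLUMN y INT", cdc_tables))
```

If `audit_log` later becomes a CDC source and the snapshot is not updated, the second statement would bypass the checks, which is exactly the drift the first caveat warns about.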

Seen in

  • sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog's internal automated schema-management validation system analyses migration SQL before it's applied to the database; blocks ALTER TABLE ... ALTER COLUMN ... SET NOT NULL on columns with potentially-null in-flight messages. Auto-approves most changes; breaking changes trigger coordinated rollout with the owning team. Offline half of Datadog's two-layer schema-evolution-safety solution.