CONCEPT Cited by 7 sources
Schema evolution¶
Definition¶
Schema evolution is the problem of changing the structure of data (records, tables, messages) over time while old and new versions coexist in the system — in flight on a queue, at rest in a log, being read by older consumers that have not yet redeployed. It is the hard problem in any long-lived data pipeline: the schema is not a static contract but a moving one with backwards- and forwards-compatibility obligations.
Why it's hard in async CDC¶
In an async CDC pipeline, there are at minimum three clocks that can advance independently:
- The source database's DDL clock (schema migrations applied).
- The in-flight records' serialised schema version (records already written to a Kafka topic under schema v1 cannot be retroactively rewritten when the producer upgrades to v2).
- Each consumer's deploy clock (a sink connector or a custom downstream app may be pinned to v1 when producers switch to v2).
A schema change that looks "trivial" at the DDL layer — e.g. `ALTER TABLE ... ALTER COLUMN foo SET NOT NULL` — can silently break every in-flight record where `foo` happened to be null. That null-valued record was valid under v1, was already serialised and published, and will surface at a consumer that now expects a non-null.
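The failure mode above can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the `ChangeEventV1` type, the `consume_as_v2` consumer, the sample records); it only shows how a record that was valid when serialised becomes a failure once the consumer enforces the non-null contract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChangeEventV1:
    """Hypothetical change event serialised while `foo` was still nullable."""
    id: int
    foo: Optional[str]


def consume_as_v2(event: ChangeEventV1) -> dict:
    """Hypothetical consumer that has adopted v2, where `foo` is NOT NULL."""
    if event.foo is None:
        raise ValueError(f"record {event.id}: foo is null but v2 requires a value")
    return {"id": event.id, "foo": event.foo}


def drain(events: List[ChangeEventV1]) -> Tuple[List[dict], List[str]]:
    """Consume every event, separating deliveries from contract violations."""
    delivered, failed = [], []
    for event in events:
        try:
            delivered.append(consume_as_v2(event))
        except ValueError as err:
            failed.append(str(err))
    return delivered, failed


# Records that were already published when the DDL change was applied.
in_flight = [ChangeEventV1(1, "a"), ChangeEventV1(2, None)]
delivered, failed = drain(in_flight)
```

The second record cannot be fixed by the producer after the fact: it was written before the migration, so the only places to handle it are the consumer or a gate in front of the migration itself.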
Datadog's framing and two-layer answer¶
Datadog's 2025-11-04 retrospective names schema evolution as "one of the key challenges with asynchronous replication":
"Even with schema changes in the source datastore, our platform needs to ensure that change events can be reliably replicated downstream." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Their answer is a two-layer solution:
Layer 1 — validate before apply (offline)¶
An internal automated schema management validation system analyses schema migration SQL before it's applied to the database. It catches pipeline-breaking changes; Datadog's canonical example:
"We would want to block a schema change like `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL` because not all messages in the pipeline are guaranteed to populate that column. If a consumer gets a message where the field was null, the replication could break. Our validation checks allow us to approve most changes without manual intervention. For breaking changes, we work directly with the team to coordinate a safe rollout." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
This layer is captured as patterns/schema-validation-before-deploy.
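Datadog does not describe how its validation system is implemented. As a rough illustration of the idea only, the sketch below scans migration SQL for the one breaking pattern the source names. The rule list, the `review_migration` function, and the naive split on `;` are all assumptions made for this example, and a real checker would presumably parse the SQL properly.

```python
import re
from typing import List

# Hypothetical rule set. Only the SET NOT NULL case comes from the source text.
BREAKING_PATTERNS = [
    (
        re.compile(
            r"\bALTER\s+TABLE\b.*\bALTER\s+COLUMN\b.*\bSET\s+NOT\s+NULL\b",
            re.IGNORECASE | re.DOTALL,
        ),
        "SET NOT NULL: in-flight messages may carry null for this column",
    ),
]


def review_migration(sql: str) -> List[str]:
    """Return one finding per statement that matches a breaking pattern."""
    findings = []
    # Naive statement split; assumes no semicolons inside string literals.
    for statement in sql.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        for pattern, reason in BREAKING_PATTERNS:
            if pattern.search(statement):
                findings.append(f"{reason} -> {statement}")
    return findings
```

An empty result corresponds to the "approve without manual intervention" path; any finding corresponds to the path where the platform team coordinates the rollout with the source team.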
Layer 2 — registry-enforced backward compat (runtime)¶
A multi-tenant Kafka Schema Registry is integrated with the source and sink connectors and configured for backward compatibility: a consumer that has moved to the new schema can still read data written under the previous one. In practice this limits schema changes to adding optional fields or removing existing fields. When Debezium captures an updated schema, it serialises the data to Avro and pushes the data to the Kafka topic and the schema to the Schema Registry; the registry compares the new schema against the stored one and accepts or rejects it. This layer is captured as patterns/schema-registry-backward-compat.
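The accept-or-reject step can be sketched with a toy in-memory registry. This is not the Schema Registry API: `ToyRegistry`, `reader_can_decode`, and the dict-based schema shape are hypothetical simplifications, and Avro's type-promotion rules are deliberately ignored.

```python
from typing import Dict, List

# Simplified stand-in for an Avro record schema:
# field name -> {"type": str, and optionally "default": value}.
Schema = Dict[str, dict]


def reader_can_decode(reader: Schema, writer: Schema) -> bool:
    """True if data written with `writer` can be read using `reader`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # type promotion is ignored in this sketch
        elif "default" not in spec:
            return False  # the reader needs a value the writer never wrote
    return True


class ToyRegistry:
    """Keeps schema versions per subject and enforces backward compatibility."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Schema]] = {}

    def register(self, subject: str, schema: Schema) -> int:
        versions = self._versions.setdefault(subject, [])
        # Backward check: the new schema must be able to read the latest stored one.
        if versions and not reader_can_decode(reader=schema, writer=versions[-1]):
            raise ValueError(f"{subject}: schema rejected, not backward compatible")
        versions.append(schema)
        return len(versions)  # version number, starting at 1
```

With this model, adding a field that has a default or dropping an existing field is accepted, while adding a field without a default is rejected, which matches the "optional additions or removals" limit described above.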
Composition¶
Offline validation catches the pipeline-breaking class before it reaches production. The runtime registry catches the residual class that slips through or that pre-deploy review missed. Combined, they let the source team ship routine schema migrations without manual platform-team coordination, which is reserved for the breaking-change class.
Compatibility modes (canonical vocabulary)¶
| Mode | Rule | Allows |
|---|---|---|
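| Backward | Consumers on the new schema can read data written with the old schema | Add optional fields; remove fields |
| Forward | Consumers on the old schema can read data written with the new schema | Add fields; remove optional fields |
| Full | Both directions | Add or remove optional fields only |
| None | No checking | Anything (unsafe) |
Backward is the mode Datadog's CDC registry runs in, and it is also the Schema Registry default: records already serialised under the older schema, in flight on a topic or at rest in a log, stay readable once a consumer adopts the new schema. It does not by itself protect a consumer still pinned to the old schema once producers move ahead (removing a field that consumer requires is a backward-compatible change); that guarantee comes from Forward or Full.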
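The difference between the modes comes down to which side plays reader and which plays writer. The sketch below makes that explicit under a simplified field model; `can_read`, `is_compatible`, and the schema dicts are hypothetical and ignore Avro type promotion.

```python
from typing import Dict

Schema = Dict[str, dict]  # field name -> {"type": ..., optional "default": ...}


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if every reader field is either written with the same type or defaulted."""
    return all(
        (name in writer and writer[name]["type"] == spec["type"])
        or (name not in writer and "default" in spec)
        for name, spec in reader.items()
    )


def is_compatible(old: Schema, new: Schema, mode: str) -> bool:
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    if mode == "NONE":
        return True
    raise ValueError(f"unknown mode: {mode}")


old = {"id": {"type": "long"}, "email": {"type": "string"}}
add_optional = dict(old, nickname={"type": "string", "default": ""})
drop_required = {"id": {"type": "long"}}
add_required = dict(old, tenant={"type": "string"})
```

In this model `add_optional` passes every mode, `drop_required` passes Backward but not Forward, and `add_required` passes Forward but not Backward.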
Related¶
- concepts/backward-compatibility — the general principle; schema evolution is its data-contract specialisation.
- concepts/change-data-capture — the pipeline class where schema evolution is most operationally acute.
- systems/kafka-schema-registry — the runtime enforcement mechanism.
- systems/debezium — the CDC producer that emits schema updates alongside data.
- patterns/schema-validation-before-deploy — the offline layer.
- patterns/schema-registry-backward-compat — the runtime layer.
Seen in¶
- sources/2026-05-13-databricks-the-rosetta-stone-of-cps-clarotys-ai-powered-library — Schema evolution + Delta time travel as the audit-chain substrate for an Entity Resolution catalog. The schema-evolution capability is composed with Delta time travel to produce "an unbreakable chain of custody; every asset record is traceable back to its original raw artifact and the specific mapping version that classified it, ensuring full auditability in even the most sensitive industrial environments." New schema-evolution face on the wiki: not just a backward/forward-compatibility story for in-flight records (the canonical Datadog framing) but a **data-lineage + classifier-lineage substrate** for ER systems where the classifier (a versioned mapping registry) must itself be auditable through time. Composes with Delta CDF (the layer-transition trigger) and Delta Lake (the substrate). Canonical instance: systems/claroty-cps-library.
- sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect — seventh canonical schema-evolution axis on the wiki: automatic mid-stream schema tracking via the source DB's data dictionary. The Redpanda Connect `oracledb_cdc` connector queries Oracle's `ALL_TAB_COLUMNS` data-dictionary view and attaches the current column schema to each message as metadata, with precision-aware `NUMBER` mapping (integers as `int64`, decimals as `json.Number`). Composed with `schema_registry_encode`, this produces typed Avro records in Schema Registry from day one. Drift semantics: "New columns added to a captured table are detected automatically mid-stream. Dropped columns are reflected after a connector restart. Schema drift is handled, not ignored." Inverts the Kafka-Connect-era default where schema-drift handling was the operator's problem. Distinct from the 2026-03-05 Iceberg-output registry-less axis (which infers from raw JSON without a Schema Registry); this axis is registry-with-data-dictionary-as-source-of-truth. Canonical verbatim: "Schema drift is the thing that silently corrupts your downstream data until someone notices a null where they expected a number (usually in production, usually days after the column was added, usually not by you). Most CDC setups leave this problem to you."
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog's two-layer answer (offline migration-SQL validation + runtime registry in backward-compat mode); the `SET NOT NULL` example as a canonical pipeline-breaking change blocked at offline analysis.
- sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices — schema-evolution-at-schema-design-time, sibling axis to the Datadog CDC framing. Rather than defending evolution with a registry, Lyft Media's 2024-09-16 post frames protobuf design practices that keep future changes additive by construction: `oneof` over discriminator-enum-plus-field (new variants are one-line additions, see patterns/oneof-over-enum-plus-field); reserve `0` as `UNKNOWN` so enum growth doesn't silently remap old values (see concepts/unknown-zero-enum-value); explicit `optional` label or wrapper types so presence is recoverable (see concepts/proto3-explicit-optional); prefer `google.protobuf.Timestamp` over `uint64` so semantic widening happens inside the well-known type. Canonical statement that proto3 dropped `required` specifically because it was a schema-evolution dead-end ("nearly impossible to safely change a required field to be optional").
- — schema-evolution at the DDL + application deploy boundary, sibling to both the Datadog async-CDC and Lyft schema-design axes. Taylor Barnett (PlanetScale) canonicalises the six-step expand-migrate-contract pattern as the operationalisation of schema evolution for the synchronous-within-a-single-service case — app-code and schema cannot deploy atomically, so the schema must evolve through a sequence of states each of which is backward-compatible with the previous app-code state. Walks rename-column, type-change, split-column, and merge-tables scenarios through the Expand → Dual-write → Backfill → Read-new → Stop-old-writes → Delete sequence. This is a third axis of schema evolution on the wiki: (1) Datadog — async CDC, producer + consumer registry-enforced compatibility; (2) Lyft — compile-time schema design for evolution-additive-by-construction; (3) PlanetScale — deploy-time within-single-service coexistence of old + new schema through a sequenced migration. Also canonicalises MySQL 8.0 invisible columns as an engine-specific deprecation-discovery primitive. Extends the canonical schema-evolution discipline from cross-service contracts into the same-service deploy-sequence domain.
- sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available — schema-evolution at the streaming-broker ↔ table-format boundary (fourth axis). Redpanda 25.1 Iceberg Topics GA release promises "Seamless schema evolution with full support for all changes allowed by the Iceberg spec, allowing safe field additions, renames, and deletions over time" — meaning the broker translates upstream Kafka-record-schema changes (Avro / JSON Schema / Protobuf) into Iceberg-spec schema-evolution operations on the produced Iceberg table. The Iceberg side of the evolution loop is clean (Iceberg-spec-compliant); the Kafka-serializer-to-Iceberg-schema translation remains operator domain. Retires the wiki's earlier caveat on concepts/iceberg-topic that "how Iceberg-topic schema changes interact with Kafka-client serializers is a source of operational complexity" — for the Iceberg-spec half of the loop.
- sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse — schema-evolution as a performance trade-off on Snowflake time-series ingest (fifth axis). Redpanda IoT-pipeline tutorial explicitly names "schema evolution may not be ideal in time-series contexts, where performance and retrieval speeds are critical" — a counter-framing to the usual "always turn on schema evolution" recommendation. Rationale: "continuous schema changes can add overhead during ingestion or query processing (especially with large-scale data pipelines)". On stable schemas, turning off `schema_evolution` forces schema validation that "helps you catch errors or inconsistencies in the incoming data before they are stored." The trade-off composes with Snowpipe Streaming channel tuning — time-series workloads also benefit from smaller batches (1,000 records at most) + short periods (10–30 s). Canonical wiki counter-framing to the Datadog registry-enforced-backward-compat default.
- sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect — registry-less data-driven schema evolution (sixth axis). Redpanda Connect's 2026-03-05 Iceberg output launch names concepts/registry-less-schema-evolution as the inverse of the Schema-Registry-driven default used by Iceberg Topics — the sink connector senses new fields in an incoming JSON stream and automatically updates the Iceberg table metadata without any registry, DDL, or ops ticket. Framed verbatim as the "best of both worlds" between two named pathologies: chained SMT brittleness ("maintenance toil") and all-columns-as-string tables ("dirty data"). Mechanism depth undisclosed — type inference strategy, conflict handling, rename semantics elided. Composes with multi-table routing to give zero-operator-touch table fan-out. This axis is distinct from the other five: it rejects the contract-first premise the Datadog + Lyft axes assume, and treats the table schema as a reflection of the data rather than a gate on it.
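The Expand → Dual-write → Backfill → Read-new → Stop-old-writes → Delete sequence named in the PlanetScale entry above can be walked through for a column rename. The sketch uses Python's built-in `sqlite3` purely for convenience (the source discusses MySQL); the `users` table, the `name` and `full_name` columns, and the helper functions are hypothetical, and each numbered step would be a separate deploy in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

# 1. Expand: add the new column next to the old one; old app code is unaffected.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


# 2. Dual-write: the application writes both columns.
def write_user_dual(db: sqlite3.Connection, user_id: int, value: str) -> None:
    db.execute(
        "INSERT INTO users (id, name, full_name) VALUES (?, ?, ?)",
        (user_id, value, value),
    )


write_user_dual(conn, 2, "Grace")

# 3. Backfill: copy values for rows written before dual-write started.
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


# 4. Read-new: the application reads only the new column.
def read_user(db: sqlite3.Connection, user_id: int):
    row = db.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# 5. Stop-old-writes: the application no longer writes the old column.
def write_user_new(db: sqlite3.Connection, user_id: int, value: str) -> None:
    db.execute("INSERT INTO users (id, full_name) VALUES (?, ?)", (user_id, value))


write_user_new(conn, 3, "Linus")

# 6. Delete: drop the old column (DROP COLUMN needs SQLite 3.35 or newer).
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN name")
```

At every step the schema stays compatible with the application code deployed in the previous step, which is the property the pattern exists to preserve.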
Related (continued)¶
- concepts/coupled-vs-decoupled-database-schema-app-deploy — the deploy-time decoupling framing underlying the expand-migrate-contract operationalisation.
- concepts/mysql-invisible-column — MySQL-specific schema-evolution deprecation-discovery primitive.
- concepts/schema-disagreement — the distributed-datastore failure mode where cluster nodes disagree on schema version; canonicalised on the Yelp Cassandra 4.x upgrade ingest.
- patterns/expand-migrate-contract — the canonical schema-evolution discipline for single-service mutating-schema changes.
- patterns/dual-write-migration — the cross-system generalisation of expand-migrate-contract's Step 2.