CONCEPT Cited by 1 source
Schema as cross-system contract
Definition
Schema as cross-system contract is the framing that, in a distributed data pipeline, a schema is not merely metadata for a single component; it is a binding contract spanning the ingestion, transport, transformation, storage, and consumption layers. A schema change at the source propagates obligations to every downstream system that depends on the structural definition.
Why it matters
When schema is treated as "just metadata," teams assume a DDL change at the source is sufficient. In reality, a single column addition may require:
- CDC source reconfiguration
- Kafka topic schema update
- Transformation-code regeneration (Flink, Spark)
- Storage-layer schema evolution (Iceberg, Delta)
- Bootstrap/backfill query updates
- Downstream consumer redeployment
When schema is not treated as a cross-system contract, each of these layers is updated by hand, and the manual updates produce drift, failed deployments, and inconsistent datasets.
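The drift described above can be illustrated with a minimal Python sketch. It assumes each layer exposes its own view of a table's columns; the `Column` type, the `find_drift` helper, the layer names, and the sample columns are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


def find_drift(source, layers):
    """Return {layer: [missing column names]} for layers lagging the source."""
    drift = {}
    for layer, cols in layers.items():
        known = {c.name for c in cols}
        missing = [c.name for c in source if c.name not in known]
        if missing:
            drift[layer] = missing
    return drift


# Hypothetical example: "country" was added at the source, but only some
# layers were updated by hand.
source = [
    Column("id", "bigint"),
    Column("email", "string"),
    Column("country", "string"),
]
old = source[:2]  # schema before the column addition

layers = {
    "cdc_source": source,
    "kafka_topic": source,
    "flink_transform": old,
    "iceberg_table": old,
    "backfill_query": source,
}

drift = find_drift(source, layers)
print(drift)
# {'flink_transform': ['country'], 'iceberg_table': ['country']}
```

A check of this kind only detects missing columns; type changes, renames, and deletions would need their own comparisons.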
Pinterest's framing
"In a distributed CDC pipeline, schema is not just metadata - it is a cross-system contract spanning ingestion, transformation, storage, and historical backfill. A schema change that is not handled carefully can break Flink jobs, block Spark upserts, or create inconsistencies between online and offline representations." (Source: sources/2026-06-24-pinterest-automated-schema-evolution-in-pinterests-next-generation-db)
Pinterest's solution: automated propagation from a single schema definition file through code generation, PR-based rollout, and phased convergence.
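As a rough illustration of deriving several layer artifacts from one schema definition, the Python sketch below generates an Avro-style record schema, a backfill query, and column-addition DDL from a single dictionary. This is not Pinterest's implementation: the definition format, the function names, the type mapping, and the generic SQL strings are assumptions made for the example.

```python
import json

# Hypothetical single schema definition; the layout is an assumption.
SCHEMA = {
    "table": "users",
    "columns": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "country", "type": "string"},  # newly added column
    ],
}

# Assumed mapping from definition types to generic SQL type names.
SQL_TYPES = {"long": "BIGINT", "string": "STRING"}


def to_avro(schema):
    """Build an Avro-style record schema for the transport layer."""
    return {
        "type": "record",
        "name": schema["table"],
        "fields": [
            {"name": c["name"], "type": c["type"]} for c in schema["columns"]
        ],
    }


def to_backfill_query(schema):
    """Build the SELECT used for bootstrap/backfill reads."""
    cols = ", ".join(c["name"] for c in schema["columns"])
    return f"SELECT {cols} FROM {schema['table']}"


def to_add_column_ddl(schema, existing_columns):
    """Build generic ADD COLUMN statements for columns the table lacks."""
    return [
        f"ALTER TABLE {schema['table']} "
        f"ADD COLUMN {c['name']} {SQL_TYPES[c['type']]}"
        for c in schema["columns"]
        if c["name"] not in existing_columns
    ]


avro = to_avro(SCHEMA)
query = to_backfill_query(SCHEMA)
ddl = to_add_column_ddl(SCHEMA, existing_columns={"id", "email"})

print(json.dumps(avro, indent=2))
print(query)  # SELECT id, email, country FROM users
print(ddl)    # ['ALTER TABLE users ADD COLUMN country STRING']
```

Because every artifact is computed from `SCHEMA`, adding a column means editing one definition and regenerating, rather than touching each layer separately. The PR-based rollout and phased convergence mentioned above are outside the scope of this sketch.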
Seen in
- sources/2026-06-24-pinterest-automated-schema-evolution-in-pinterests-next-generation-db: canonical framing and automation system