Skip to content

CONCEPT Cited by 1 source

Schema as cross-system contract

Definition

Schema as cross-system contract is the framing that in a distributed data pipeline, a schema is not merely metadata for a single component โ€” it is a binding contract spanning ingestion, transport, transformation, storage, and consumption layers. A schema change at the source propagates obligations across every downstream system that depends on the structural definition.

Why it matters

When schema is treated as "just metadata," teams assume a DDL change at the source is sufficient. In reality, a single column addition may require:

  • CDC source reconfiguration
  • Kafka topic schema update
  • Transformation-code regeneration (Flink, Spark)
  • Storage-layer schema evolution (Iceberg, Delta)
  • Bootstrap/backfill query updates
  • Downstream consumer redeployment

Without treating schema as a cross-system contract, manual updates across these layers produce drift, failed deployments, and inconsistent datasets.

Pinterest's framing

"In a distributed CDC pipeline, schema is not just metadata โ€” it is a cross-system contract spanning ingestion, transformation, storage, and historical backfill. A schema change that is not handled carefully can break Flink jobs, block Spark upserts, or create inconsistencies between online and offline representations." (Source: sources/2026-06-24-pinterest-automated-schema-evolution-in-pinterests-next-generation-db)

Pinterest's solution: automated propagation from a single schema definition file through code generation, PR-based rollout, and phased convergence.

Seen in

Last updated ยท 559 distilled / 1,651 read