PATTERN Cited by 1 source

Mapping-driven schema generation

The pattern

Make the mapping from source schemas to a conceptual layer the authoritative artifact, and derive both (1) the target schema and (2) the transformation code from it. Do not author the target schema directly.

This inverts the traditional workflow, in which the target schema is the authoritative deliverable and mappings accrete as textual specifications. The inversion is justified because the mapping is the only artifact that natively captures both what to store and how to populate it from the sources.
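As a minimal sketch of the idea, the mapping can be held as plain data and both deliverables derived from it. Everything named here (MappingEntry, the crm and billing sources, their columns, the generated table name) is hypothetical and chosen only for illustration; a real mapping language would be richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingEntry:
    source: str    # hypothetical source system name
    column: str    # column in that source system
    concept: str   # field in the conceptual layer
    sql_type: str  # type of the derived target column


# The mapping is the only hand-authored artifact in this sketch.
MAPPING = [
    MappingEntry("crm", "cust_nm", "customer_name", "TEXT"),
    MappingEntry("crm", "email_addr", "email", "TEXT"),
    MappingEntry("billing", "customer", "customer_name", "TEXT"),
    MappingEntry("billing", "balance_cents", "balance_cents", "INTEGER"),
]


def target_columns(mapping):
    """Derive target columns (name -> type) in first-seen order."""
    columns = {}
    for entry in mapping:
        existing = columns.setdefault(entry.concept, entry.sql_type)
        if existing != entry.sql_type:
            raise ValueError(f"conflicting types for {entry.concept}")
    return columns


def generate_ddl(table, mapping):
    """Derive the target schema from the mapping."""
    cols = ",\n".join(
        f"  {name} {sql_type}"
        for name, sql_type in target_columns(mapping).items()
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


def generate_transform(source, mapping):
    """Derive a per-source function that turns a source row into a target row."""
    entries = [e for e in mapping if e.source == source]
    concepts = list(target_columns(mapping))

    def transform(row):
        out = {concept: None for concept in concepts}
        for e in entries:
            out[e.concept] = row.get(e.column)
        return out

    return transform


ddl = generate_ddl("customer", MAPPING)
billing_row = generate_transform("billing", MAPPING)(
    {"customer": "Ada", "balance_cents": 1200}
)
```

Neither the DDL nor the transform is written by hand here: both are functions of MAPPING, so they cannot disagree about which target fields exist.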

When to use it

  • You have multiple source systems with heterogeneous schemas that must be consolidated into one target schema.
  • Domain experts (not engineers) are the source of truth for the conceptual model.
  • Both the target schema and the per-source transformation code are required deliverables (canonical examples: MDM, data-warehouse ETL, enterprise data-integration platforms).
  • The target schema is expected to evolve; coupling it directly to source-system schemas would make refactoring painful.

Implementations

Why it works

  • Single source of truth. The mapping captures target schema and data provenance simultaneously.
  • Composable over source changes. When a source system gains a new column, the mapping gets one new entry; the target schema and transformation code regenerate without human intervention.
  • Data lineage is free. Every target field is traceable to each source column that contributes to it.
  • Reduces drift. Target-schema-first workflows tend to develop inconsistencies between the schema, the transformation code, and the documentation; mapping-driven generation removes that category of inconsistency for the generated artifacts, because they all derive from the same mapping.
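The lineage and source-change points above can be illustrated with a short sketch. It assumes a deliberately simplified mapping of (source system, source column, target field) tuples; the sources and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping entries: (source system, source column, target field).
MAPPING = [
    ("crm", "cust_nm", "customer_name"),
    ("billing", "customer", "customer_name"),
    ("crm", "email_addr", "email"),
]


def lineage(mapping):
    """Group the contributing source columns by target field."""
    result = defaultdict(list)
    for source, column, target in mapping:
        result[target].append(f"{source}.{column}")
    return dict(result)


def target_fields(mapping):
    """Target field names in first-seen order, derived from the mapping."""
    return list(dict.fromkeys(target for _, _, target in mapping))


# A source gains a column: the only hand-edited change is one new entry.
extended = MAPPING + [("crm", "phone_no", "phone")]

fields_before = target_fields(MAPPING)
fields_after = target_fields(extended)
phone_lineage = lineage(extended)["phone"]
```

Lineage is just a regrouping of the mapping, which is why it comes at no extra authoring cost; the new target field appears in the derived output without anyone editing the schema directly.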

Trade-offs

  • Upfront investment in the mapping language / vocabulary. The conceptual layer must be stable and well-understood before mappings can be authored.
  • Generator quality is load-bearing. Bugs in the generator propagate everywhere. Test coverage of the generator matters more than test coverage of hand-written schemas.
  • Doesn't solve entity resolution. The pattern handles structural mapping; instance-level identity (match-and-merge in MDM, de-duplication in ETL) is orthogonal.
  • Target-schema optimisations are hard. Hand-written schemas can carry ad-hoc indexes, denormalisations, and storage tweaks. Generator-emitted schemas default to a mirror of the conceptual layer; optimisations require generator-side annotations.
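One way the generator-side annotations mentioned in the last trade-off might look is sketched below. The Field type, its indexed flag, and the naming of the emitted index are assumptions made for illustration, not a description of any particular generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    sql_type: str
    indexed: bool = False  # hypothetical generator-side annotation


# Conceptual-layer fields; the index request lives next to the field,
# not in a hand-edited copy of the generated schema.
CONCEPTUAL_FIELDS = [
    Field("customer_name", "TEXT"),
    Field("email", "TEXT", indexed=True),
]


def generate_statements(table, fields):
    """Emit CREATE TABLE plus one CREATE INDEX per annotated field."""
    cols = ", ".join(f"{f.name} {f.sql_type}" for f in fields)
    statements = [f"CREATE TABLE {table} ({cols});"]
    for f in fields:
        if f.indexed:
            statements.append(
                f"CREATE INDEX idx_{table}_{f.name} ON {table} ({f.name});"
            )
    return statements


statements = generate_statements("customer", CONCEPTUAL_FIELDS)
```

Keeping the optimisation as an annotation means it survives regeneration, at the cost of the generator having to understand every kind of tweak the team wants to express.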

Seen in
