CONCEPT
Transformation data model
Definition
A transformation data model specifies, for each source system, how each of its columns maps (directly or indirectly) to the logical data model of the golden record.
It is the second required deliverable of any consolidated-style MDM project (the first being the logical data model itself). Without it, the consolidation pipeline has no per-system recipe for turning source records into golden-record contributions.
Direct vs. indirect mappings
Zalando defines two mapping types:
- Direct (1-to-1): one source column maps to exactly one logical-model column, with no transformation. Example: System B's `zip_code` → `Address.postal code`.
- Indirect (1-to-many or M-to-1): one or many source columns require a transformation algorithm to populate one or many logical-model columns. Example: System A's `address_line_1`, `_2`, `_3` → structured `street`, `city`, `postal code` on `Address`. "These columns need to be processed into the MDM system with a transformation algorithm."
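The distinction between the two mapping types can be made concrete with a small record type. This is only a sketch: the `ColumnMapping` class, its field names, and the `parse_address_lines` algorithm name are hypothetical and do not come from Zalando's system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnMapping:
    """One entry of a transformation data model (illustrative shape only)."""

    system: str
    source_columns: Tuple[str, ...]
    target_attributes: Tuple[str, ...]
    transform: Optional[str] = None  # name of a transformation algorithm, if any

    @property
    def kind(self) -> str:
        # Direct means exactly one source column, one target, and no algorithm.
        one_to_one = (
            len(self.source_columns) == 1 and len(self.target_attributes) == 1
        )
        return "direct" if one_to_one and self.transform is None else "indirect"


direct = ColumnMapping("System B", ("zip_code",), ("Address.postal code",))
indirect = ColumnMapping(
    "System A",
    ("address_line_1", "address_line_2", "address_line_3"),
    ("Address.street", "Address.city", "Address.postal code"),
    transform="parse_address_lines",  # hypothetical algorithm name
)

print(direct.kind, indirect.kind)
```

Deriving `kind` from the shape of the mapping, rather than storing it, keeps an entry from being labelled direct while still carrying a transformation.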
Worked example (from the post)
System A (legacy, free-text address lines):
- `address id` → concept `Address`, relationship `has contact` (target)
- `business partner id` → concept `Business Partner`, relationship `has contact` (source)
- `address_line_1` → concept `Address` (indirect)
- `address_line_2` → concept `Address` (indirect)
- `address_line_3` → concept `Address` (indirect)

System B (structured address fields):
- `id` → concept `Address`, relationship `has contact` (target)
- `business partner id` → concept `Business Partner`, relationship `has contact` (source)
- `street` → concept `Address`, attribute `street name` (direct)
- `zip_code` → concept `Address`, attribute `postal code` (direct)
- `city` → concept `Address`, attribute `city name` (direct)
- `country_code` → concept `Address`, attribute `country code` (direct)
The transformation data model emitted for System B would be
a list of 1:1 column copies (except the IDs, which become
relationship FKs). For System A, it would emit a call into a
parsing function over the three address_line columns.
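A minimal sketch of what the two emitted transformations could look like, assuming source rows arrive as plain dictionaries. The function names, the dictionary layout, and the toy parsing rule (a line shaped like `<digits> <city>` supplies postal code and city, and the first other non-empty line is the street) are illustrative assumptions, not Zalando's implementation. The ID columns that become relationship keys are left out for brevity.

```python
import re
from typing import Dict, Optional

# System B: direct mappings, source column -> Address attribute.
SYSTEM_B_DIRECT = {
    "street": "street name",
    "zip_code": "postal code",
    "city": "city name",
    "country_code": "country code",
}


def transform_system_b(row: Dict[str, str]) -> Dict[str, str]:
    """Copy each mapped column 1:1 into its target attribute."""
    return {target: row[source] for source, target in SYSTEM_B_DIRECT.items()}


# Toy rule (assumption): "<4-5 digits> <city>" marks the postal code / city line.
POSTAL_CITY = re.compile(r"^\s*(\d{4,5})\s+(.+?)\s*$")


def parse_address_lines(
    line_1: Optional[str], line_2: Optional[str], line_3: Optional[str]
) -> Dict[str, Optional[str]]:
    """Indirect mapping: derive structured attributes from free-text lines."""
    address = {"street name": None, "postal code": None, "city name": None}
    for line in (line_1, line_2, line_3):
        if not line or not line.strip():
            continue  # skip empty or missing lines
        match = POSTAL_CITY.match(line)
        if match and address["postal code"] is None:
            address["postal code"], address["city name"] = match.groups()
        elif address["street name"] is None:
            address["street name"] = line.strip()
    return address


def transform_system_a(row: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Call the parsing function over the three address_line columns."""
    return parse_address_lines(
        row.get("address_line_1"),
        row.get("address_line_2"),
        row.get("address_line_3"),
    )


# Made-up sample rows for illustration.
row_b = {"street": "Example Street 1", "zip_code": "12345",
         "city": "Sample City", "country_code": "DE"}
row_a = {"address_line_1": "Example Street 1",
         "address_line_2": "12345 Sample City", "address_line_3": ""}

print(transform_system_b(row_b))
print(transform_system_a(row_a))
```

A real address parser would need far more than one regular expression. The point here is only the difference in shape: System B needs a lookup table, while System A needs an algorithm.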
Relation to data lineage
Because the transformation data model names every source-column → concept contribution, it doubles as a data-lineage record: any golden-record field can be traced back to every contributing source column across every source system. Zalando names this as a capability that "falls out" of the graph-based approach.
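The lineage lookup can be sketched as an inverted index over the mapping entries. This uses an in-memory dictionary rather than the Neo4j graph the source describes. The `CONTRIBUTIONS` layout and the `sources_of` helper are hypothetical, as is the assumption that every System A address line may feed every parsed attribute.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (system, source column, target attribute on the Address concept)
CONTRIBUTIONS: List[Tuple[str, str, str]] = [
    ("System B", "street", "street name"),
    ("System B", "zip_code", "postal code"),
    ("System B", "city", "city name"),
    ("System B", "country_code", "country code"),
]

# Assumption: each free-text line of System A may feed any parsed attribute.
for column in ("address_line_1", "address_line_2", "address_line_3"):
    for attribute in ("street name", "city name", "postal code"):
        CONTRIBUTIONS.append(("System A", column, attribute))


def build_lineage(
    contributions: List[Tuple[str, str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Invert the mappings: target attribute -> contributing (system, column)."""
    lineage = defaultdict(list)
    for system, column, attribute in contributions:
        lineage[attribute].append((system, column))
    return dict(lineage)


LINEAGE = build_lineage(CONTRIBUTIONS)


def sources_of(attribute: str) -> List[Tuple[str, str]]:
    """Return every source column that contributes to a golden-record attribute."""
    return sorted(LINEAGE.get(attribute, []))


for system, column in sources_of("postal code"):
    print(f"{system}.{column}")
```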
Seen in
- Zalando generates the transformation data model from a Neo4j-hosted knowledge graph alongside the logical data model.
Related
- concepts/master-data-management — the enclosing discipline
- concepts/logical-data-model — the sibling deliverable (target schema)
- concepts/golden-record — what the transformation populates
- concepts/data-lineage — the side-effect capability
- concepts/knowledge-graph — the substrate the transformation model is generated from
- systems/zalando-mdm-system — canonical wiki instance
- patterns/mapping-driven-schema-generation — the pattern