CONCEPT Cited by 1 source
Transformation data model¶
Definition¶
A transformation data model specifies, for each source system, how each of its columns maps (directly or indirectly) to the logical data model of the golden record (sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition).
It is the second required deliverable of any consolidated-style MDM project (the first being the logical data model itself). Without it, the consolidation pipeline has no per-system recipe for turning source records into golden-record contributions.
Direct vs. indirect mappings¶
Zalando defines two mapping types:
- Direct (1-to-1): one source column maps to exactly one logical-model column, with no transformation. Example: System B's `zip_code` → `Address.postal code`.
- Indirect (1-to-many or M-to-1): one or many source columns require a transformation algorithm to populate one or many logical-model columns. Example: System A's `address_line_1`, `_2`, `_3` → structured `street`, `city`, `postal code` on `Address`. "These columns need to be processed into the MDM system with a transformation algorithm."
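The distinction between the two mapping types can be sketched as a small data structure. This is a minimal illustration, not Zalando's implementation: the `ColumnMapping` class, its field names, and the transformation name `parse_address_lines` are all hypothetical, and the `Address.street` / `Address.city` target names are assumed from the example above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnMapping:
    """One entry of a transformation data model (hypothetical shape)."""
    system: str
    source_columns: Tuple[str, ...]
    target_columns: Tuple[str, ...]
    # Name of the transformation algorithm, if one is required (assumption:
    # a plain string label is enough for this sketch).
    transformation: Optional[str] = None

    @property
    def kind(self) -> str:
        # Direct means exactly one source column, exactly one target column,
        # and no transformation. Everything else is indirect.
        one_to_one = len(self.source_columns) == 1 and len(self.target_columns) == 1
        if one_to_one and self.transformation is None:
            return "direct"
        return "indirect"


zip_mapping = ColumnMapping(
    system="System B",
    source_columns=("zip_code",),
    target_columns=("Address.postal code",),
)

address_lines_mapping = ColumnMapping(
    system="System A",
    source_columns=("address_line_1", "address_line_2", "address_line_3"),
    target_columns=("Address.street", "Address.city", "Address.postal code"),
    transformation="parse_address_lines",  # hypothetical algorithm name
)

print(zip_mapping.kind)            # direct
print(address_lines_mapping.kind)  # indirect
```

Deriving `kind` from the shape of the mapping, rather than storing it, keeps the classification from drifting out of sync with the column lists.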
Worked example (from the post)¶
System A (legacy, free-text address lines):
- `address id` → concept `Address`, relationship `has contact` (target)
- `business partner id` → concept `Business Partner`, relationship `has contact` (source)
- `address_line_1` → concept `Address` (indirect)
- `address_line_2` → concept `Address` (indirect)
- `address_line_3` → concept `Address` (indirect)
System B (structured address fields):
- `id` → concept `Address`, relationship `has contact` (target)
- `business partner id` → concept `Business Partner`, relationship `has contact` (source)
- `street` → concept `Address`, attribute `street name` (direct)
- `zip_code` → concept `Address`, attribute `postal code` (direct)
- `city` → concept `Address`, attribute `city name` (direct)
- `country_code` → concept `Address`, attribute `country code` (direct)
The transformation data model emitted for System B would be
a list of 1:1 column copies (except the IDs, which become
relationship FKs). For System A, it would emit a call into a
parsing function over the three address_line columns.
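As a rough sketch of what those two emitted recipes might look like when applied to records, consider the following. The post does not describe the parsing algorithm, so `parse_address_lines` is purely hypothetical: it assumes line 1 holds the street and line 2 holds `<postal code> <city>`, which real legacy data will often violate. The ID columns and the `has contact` relationship are left out to keep the example short, and the sample values are made up.

```python
from typing import Dict

# System B: direct mappings from source column to Address attribute,
# taken from the worked example above.
SYSTEM_B_DIRECT = {
    "street": "street name",
    "zip_code": "postal code",
    "city": "city name",
    "country_code": "country code",
}


def transform_system_b(record: Dict[str, str]) -> Dict[str, str]:
    """Copy each mapped column 1:1 into the Address concept."""
    return {attribute: record[column] for column, attribute in SYSTEM_B_DIRECT.items()}


def parse_address_lines(line_1: str, line_2: str, line_3: str) -> Dict[str, str]:
    """Hypothetical parser for System A's free-text lines.

    ASSUMPTION: line 1 is the street and line 2 is '<postal code> <city>'.
    Line 3 is accepted but ignored in this sketch.
    """
    postal_code, _, city = line_2.strip().partition(" ")
    return {
        "street name": line_1.strip(),
        "postal code": postal_code,
        "city name": city.strip(),
    }


def transform_system_a(record: Dict[str, str]) -> Dict[str, str]:
    """Indirect mapping: route the three address lines through the parser."""
    return parse_address_lines(
        record.get("address_line_1", ""),
        record.get("address_line_2", ""),
        record.get("address_line_3", ""),
    )


from_b = transform_system_b(
    {"street": "Example Street 1", "zip_code": "10115", "city": "Berlin", "country_code": "DE"}
)
from_a = transform_system_a(
    {"address_line_1": "Example Street 1", "address_line_2": "10115 Berlin", "address_line_3": ""}
)
print(from_b)
print(from_a)
```

Both functions return dictionaries keyed by the same logical-model attribute names, which is what lets contributions from different source systems be consolidated into one golden record.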
Relation to data lineage¶
Because the transformation data model names every source-column → concept contribution, it doubles as a data-lineage record: any golden-record field can be traced back to every contributing source column across every source system. Zalando names this as a capability that "falls out" of the graph-based approach (Source: sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition).
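The lineage lookup can be illustrated with a flat list of contributions. This is only a sketch under stated assumptions: Zalando's version lives in a Neo4j knowledge graph, whereas the `Contribution` tuple, the in-memory list, and the `lineage` function here are hypothetical stand-ins. It also assumes that each of System A's address lines may feed any of the three parsed attributes, since the worked example maps them only at the concept level.

```python
from typing import List, NamedTuple, Tuple


class Contribution(NamedTuple):
    """One source-column -> concept-attribute edge (hypothetical shape)."""
    system: str
    source_column: str
    concept: str
    attribute: str
    kind: str  # "direct" or "indirect"


CONTRIBUTIONS: List[Contribution] = [
    Contribution("System B", "street", "Address", "street name", "direct"),
    Contribution("System B", "zip_code", "Address", "postal code", "direct"),
    Contribution("System B", "city", "Address", "city name", "direct"),
    Contribution("System B", "country_code", "Address", "country code", "direct"),
]

# ASSUMPTION: every free-text line can contribute to every parsed attribute.
CONTRIBUTIONS += [
    Contribution("System A", f"address_line_{n}", "Address", attribute, "indirect")
    for n in (1, 2, 3)
    for attribute in ("street name", "city name", "postal code")
]


def lineage(concept: str, attribute: str) -> List[Tuple[str, str]]:
    """Return every (system, source column) that feeds a golden-record field."""
    return sorted(
        (c.system, c.source_column)
        for c in CONTRIBUTIONS
        if c.concept == concept and c.attribute == attribute
    )


print(lineage("Address", "postal code"))
print(lineage("Address", "country code"))
```

In a graph store the same question would be a traversal from the attribute node back to its source-column nodes; the list comprehension plays that role here.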
Seen in¶
- sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition — Zalando generates the transformation data model from a Neo4j-hosted knowledge graph alongside the logical data model.
Related¶
- concepts/master-data-management — the enclosing discipline
- concepts/logical-data-model — the sibling deliverable (target schema)
- concepts/golden-record — what the transformation populates
- concepts/data-lineage — the side-effect capability
- concepts/knowledge-graph — the substrate the transformation model is generated from
- systems/zalando-mdm-system — canonical wiki instance
- patterns/mapping-driven-schema-generation — the pattern