CONCEPT

Transformation data model

Definition

A transformation data model specifies, for each source system, how each of its columns maps (directly or indirectly) to the logical data model of the golden record.

It is the second required deliverable of any consolidated-style MDM project (the first being the logical data model itself). Without it, the consolidation pipeline has no per-system recipe for turning source records into golden-record contributions.

Direct vs. indirect mappings

Zalando defines two mapping types:

  • Direct (1-to-1) — one source column maps to exactly one logical-model column, with no transformation. Example: System B's zip_code → Address.postal code.
  • Indirect (1-to-many or M-to-1) — one or many source columns require a transformation algorithm to populate one or many logical-model columns. Example: System A's address_line_1, _2, _3 → structured street, city, postal code on Address. "These columns need to be processed into the MDM system with a transformation algorithm."
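The two mapping types can be sketched as small Python classes. This is only an illustration, not Zalando's implementation: the class names, the "Concept.attribute" target strings, and the parse_address_lines function are hypothetical, and the parser assumes a toy layout (line 1 is the street, line 2 is "<postal code> <city>") that real free-text addresses will not reliably follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class DirectMapping:
    """One source column copied unchanged into one logical-model attribute."""
    source_column: str
    target: str  # hypothetical "Concept.attribute" naming

    def apply(self, record: Record) -> Record:
        return {self.target: record[self.source_column]}


@dataclass(frozen=True)
class IndirectMapping:
    """One or many source columns passed through a transformation algorithm."""
    source_columns: Tuple[str, ...]
    transform: Callable[..., Record]

    def apply(self, record: Record) -> Record:
        return self.transform(*(record[c] for c in self.source_columns))


def parse_address_lines(line_1: str, line_2: str, line_3: str) -> Record:
    # Toy assumption: line 1 is the street and line 2 is "<postal code> <city>".
    # line_3 is accepted but unused in this sketch.
    postal_code, _, city = line_2.strip().partition(" ")
    return {
        "Address.street name": line_1.strip(),
        "Address.postal code": postal_code,
        "Address.city name": city.strip(),
    }


zip_mapping = DirectMapping("zip_code", "Address.postal code")
address_mapping = IndirectMapping(
    ("address_line_1", "address_line_2", "address_line_3"), parse_address_lines
)

direct_result = zip_mapping.apply({"zip_code": "10243"})
indirect_result = address_mapping.apply({
    "address_line_1": "Example Street 1",
    "address_line_2": "10243 Berlin",
    "address_line_3": "",
})
```

Both mapping types expose the same apply method, so a pipeline could treat them uniformly; the only difference is whether a transformation sits between source and target.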

Worked example (from the post)

System A (legacy, free-text address lines):

  • address id → concept Address, relationship has contact (target)
  • business partner id → concept Business Partner, relationship has contact (source)
  • address_line_1 → concept Address (indirect)
  • address_line_2 → concept Address (indirect)
  • address_line_3 → concept Address (indirect)

System B (structured address fields):

  • id → concept Address, relationship has contact (target)
  • business partner id → concept Business Partner, relationship has contact (source)
  • street → concept Address, attribute street name (direct)
  • zip_code → concept Address, attribute postal code (direct)
  • city → concept Address, attribute city name (direct)
  • country_code → concept Address, attribute country code (direct)

The transformation data model emitted for System B would be a list of 1:1 column copies (except the IDs, which become relationship foreign keys). For System A, it would emit a call into a parsing function over the three address_line columns.
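As a rough sketch of what "emitting" these recipes could look like, the two systems' rows can be written down as plain data and turned into a list of steps. The Row type, the "relationship" kind label, the step strings, and the parser name parse_address_lines are all assumptions made for this illustration; the post does not specify an output format.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    column: str
    concept: str
    kind: str  # "direct", "indirect", or "relationship" (labels chosen for this sketch)
    attribute: Optional[str] = None
    relationship: Optional[str] = None
    role: Optional[str] = None  # which end of the relationship the ID sits on


SYSTEM_A = [
    Row("address id", "Address", "relationship", relationship="has contact", role="target"),
    Row("business partner id", "Business Partner", "relationship", relationship="has contact", role="source"),
    Row("address_line_1", "Address", "indirect"),
    Row("address_line_2", "Address", "indirect"),
    Row("address_line_3", "Address", "indirect"),
]

SYSTEM_B = [
    Row("id", "Address", "relationship", relationship="has contact", role="target"),
    Row("business partner id", "Business Partner", "relationship", relationship="has contact", role="source"),
    Row("street", "Address", "direct", attribute="street name"),
    Row("zip_code", "Address", "direct", attribute="postal code"),
    Row("city", "Address", "direct", attribute="city name"),
    Row("country_code", "Address", "direct", attribute="country code"),
]


def emit_steps(rows: List[Row], parser: str = "parse_address_lines") -> List[str]:
    """Turn one system's rows into human-readable transformation steps."""
    steps: List[str] = []
    indirect = {}  # concept -> source columns that need the parser
    for row in rows:
        if row.kind == "direct":
            steps.append(f"copy {row.column} -> {row.concept}.{row.attribute}")
        elif row.kind == "relationship":
            steps.append(f"fk {row.column} -> {row.relationship} ({row.role}: {row.concept})")
        else:
            indirect.setdefault(row.concept, []).append(row.column)
    # Indirect columns are grouped per concept and handed to one parser call.
    for concept, columns in indirect.items():
        steps.append(f"call {parser}({', '.join(columns)}) -> {concept}")
    return steps


steps_a = emit_steps(SYSTEM_A)
steps_b = emit_steps(SYSTEM_B)
```

For System B this yields four copy steps and two foreign-key steps; for System A it yields the two foreign-key steps followed by a single parser call over the three address_line columns.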

Relation to data lineage

Because the transformation data model names every source-column → concept contribution, it doubles as a data-lineage record: any golden-record field can be traced back to every contributing source column across every source system. Zalando names this as a capability that "falls out" of the graph-based approach.
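A lineage lookup over the worked example might look like the sketch below. It uses an in-memory list rather than the graph database Zalando describes, and the CONTRIBUTIONS layout and lineage function are hypothetical. It also makes one assumption worth stating: an indirect mapping names only a concept, not an attribute, so the sketch conservatively treats it as a possible contributor to every attribute of that concept.

```python
from typing import List, Optional, Tuple

# (system, source column, concept, attribute); attribute is None for indirect rows
Contribution = Tuple[str, str, str, Optional[str]]

CONTRIBUTIONS: List[Contribution] = [
    ("System A", "address_line_1", "Address", None),
    ("System A", "address_line_2", "Address", None),
    ("System A", "address_line_3", "Address", None),
    ("System B", "street", "Address", "street name"),
    ("System B", "zip_code", "Address", "postal code"),
    ("System B", "city", "Address", "city name"),
    ("System B", "country_code", "Address", "country code"),
]


def lineage(concept: str, attribute: str) -> List[Tuple[str, str]]:
    """Return every (system, column) that may contribute to concept.attribute."""
    hits = []
    for system, column, row_concept, row_attribute in CONTRIBUTIONS:
        if row_concept != concept:
            continue
        # Assumption: indirect rows (attribute None) may feed any attribute.
        if row_attribute is None or row_attribute == attribute:
            hits.append((system, column))
    return hits


postal_code_sources = lineage("Address", "postal code")
```

Here postal_code_sources lists System A's three address_line columns plus System B's zip_code, which is the cross-system trace-back the paragraph above describes.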

Seen in

  • Zalando generates the transformation data model from a Neo4j-hosted knowledge graph alongside the logical data model.