
CONCEPT Cited by 1 source

Transformation data model

Definition

A transformation data model specifies, for each source system, how each of its columns maps (directly or indirectly) to the logical data model of the golden record (sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition).

It is the second required deliverable of any consolidated-style MDM project (the first being the logical data model itself). Without it, the consolidation pipeline has no per-system recipe for turning source records into golden-record contributions.

Direct vs. indirect mappings

Zalando defines two mapping types:

  • Direct (1-to-1): one source column maps to exactly one logical-model column, with no transformation. Example: System B's zip_code → Address.postal code.
  • Indirect (1-to-many or many-to-1): one or more source columns require a transformation algorithm to populate one or more logical-model columns. Example: System A's address_line_1, _2, _3 → structured street, city, and postal code on Address. "These columns need to be processed into the MDM system with a transformation algorithm."
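The direct/indirect distinction can be captured in a small record type. The sketch below is a hypothetical representation, not Zalando's schema. `ColumnMapping`, its fields, and the `"Concept.attribute"` naming are illustrative assumptions. A mapping counts as direct only when it links one source column to one target attribute with no transform function. Everything else is indirect.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ColumnMapping:
    """One entry of a hypothetical transformation data model."""
    system: str
    source_columns: tuple[str, ...]
    target_attributes: tuple[str, ...]                 # "Concept.attribute" strings
    transform: Optional[Callable[..., dict]] = None    # None means a plain copy

    @property
    def kind(self) -> str:
        # Direct: exactly one source column, one target attribute, no transformation.
        if (len(self.source_columns) == 1
                and len(self.target_attributes) == 1
                and self.transform is None):
            return "direct"
        return "indirect"


direct = ColumnMapping("System B", ("zip_code",), ("Address.postal code",))
indirect = ColumnMapping(
    "System A",
    ("address_line_1", "address_line_2", "address_line_3"),
    ("Address.street name", "Address.city name", "Address.postal code"),
    transform=lambda *lines: {},  # placeholder for a real address parser
)
print(direct.kind, indirect.kind)
```

Deriving `kind` from the mapping's shape, rather than storing it as a flag, keeps the label consistent with the definition above.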

Worked example (from the post)

System A (legacy, free-text address lines):

  • address id → concept Address, relationship has contact (target)
  • business partner id → concept Business Partner, relationship has contact (source)
  • address_line_1 → concept Address (indirect)
  • address_line_2 → concept Address (indirect)
  • address_line_3 → concept Address (indirect)

System B (structured address fields):

  • id → concept Address, relationship has contact (target)
  • business partner id → concept Business Partner, relationship has contact (source)
  • street → concept Address, attribute street name (direct)
  • zip_code → concept Address, attribute postal code (direct)
  • city → concept Address, attribute city name (direct)
  • country_code → concept Address, attribute country code (direct)
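The two mapping lists above can be written down as plain data and then grouped by mapping kind for each system. This is a minimal sketch. The `Mapping` tuple layout and the `columns_by_kind` helper are hypothetical. Only the column, concept, attribute, and relationship names come from the worked example.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    system: str
    column: str
    concept: str
    attribute: Optional[str] = None      # set for direct attribute mappings
    relationship: Optional[str] = None   # e.g. "has contact"
    end: Optional[str] = None            # "source" or "target" end of the relationship
    indirect: bool = False               # needs a transformation algorithm


MAPPINGS = [
    Mapping("System A", "address id", "Address", relationship="has contact", end="target"),
    Mapping("System A", "business partner id", "Business Partner", relationship="has contact", end="source"),
    Mapping("System A", "address_line_1", "Address", indirect=True),
    Mapping("System A", "address_line_2", "Address", indirect=True),
    Mapping("System A", "address_line_3", "Address", indirect=True),
    Mapping("System B", "id", "Address", relationship="has contact", end="target"),
    Mapping("System B", "business partner id", "Business Partner", relationship="has contact", end="source"),
    Mapping("System B", "street", "Address", attribute="street name"),
    Mapping("System B", "zip_code", "Address", attribute="postal code"),
    Mapping("System B", "city", "Address", attribute="city name"),
    Mapping("System B", "country_code", "Address", attribute="country code"),
]


def columns_by_kind(system: str) -> dict[str, list[str]]:
    """Group one system's source columns by how they reach the logical model."""
    kinds: dict[str, list[str]] = {"relationship": [], "direct": [], "indirect": []}
    for m in MAPPINGS:
        if m.system != system:
            continue
        if m.relationship:
            kinds["relationship"].append(m.column)
        elif m.indirect:
            kinds["indirect"].append(m.column)
        else:
            kinds["direct"].append(m.column)
    return kinds


print(columns_by_kind("System A"))
print(columns_by_kind("System B"))
```

The grouping makes the contrast visible at a glance. System A has no direct attribute mappings, while System B has no indirect ones.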

The transformation data model emitted for System B would be a list of 1:1 column copies, except for the ID columns, which become foreign keys of the has contact relationship. For System A, it would emit a call to a parsing function over the three address_line columns.
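The two recipes might look roughly like this when turned into code. This is a sketch under stated assumptions. `parse_address_lines` is a hypothetical toy parser that assumes lines 1 and 2 hold the street and line 3 reads `<postal code> <city>`. The output field names `id` and `business_partner_fk`, and all sample values, are made up for illustration.

```python
Record = dict[str, str]

# System B direct mappings from the worked example: source column -> Address attribute.
SYSTEM_B_DIRECT = {
    "street": "street name",
    "zip_code": "postal code",
    "city": "city name",
    "country_code": "country code",
}


def transform_system_b(row: Record) -> Record:
    # Direct mappings are plain 1:1 copies.
    address = {attr: row[col] for col, attr in SYSTEM_B_DIRECT.items()}
    # IDs are not copied as attributes; they key the "has contact" relationship.
    # "id" and "business_partner_fk" are hypothetical output field names.
    address["id"] = row["id"]
    address["business_partner_fk"] = row["business partner id"]
    return address


def parse_address_lines(line_1: str, line_2: str, line_3: str) -> Record:
    """Hypothetical toy parser: lines 1-2 hold the street, line 3 is
    '<postal code> <city>'. Real free-text addresses need far more robust parsing."""
    street = " ".join(part for part in (line_1.strip(), line_2.strip()) if part)
    postal_code, _, city = line_3.strip().partition(" ")
    return {"street name": street, "postal code": postal_code, "city name": city.strip()}


def transform_system_a(row: Record) -> Record:
    # Indirect mapping: three free-text columns feed one parsing function.
    address = parse_address_lines(
        row["address_line_1"], row["address_line_2"], row["address_line_3"]
    )
    address["id"] = row["address id"]
    address["business_partner_fk"] = row["business partner id"]
    return address


# Made-up sample rows describing the same address in both systems.
b = transform_system_b({"id": "b-1", "business partner id": "bp-9", "street": "Main St 1",
                        "zip_code": "10115", "city": "Berlin", "country_code": "DE"})
a = transform_system_a({"address id": "a-7", "business partner id": "bp-9",
                        "address_line_1": "Main St 1", "address_line_2": "",
                        "address_line_3": "10115 Berlin"})
print(a)
print(b)
```

Both recipes produce the same attribute vocabulary (street name, postal code, city name). This shared vocabulary is what allows the consolidation step to compare contributions from different systems.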

Relation to data lineage

Because the transformation data model names every source-column → concept contribution, it doubles as a data-lineage record: any golden-record field can be traced back to every contributing source column in every source system. Zalando describes this as a capability that "falls out" of the graph-based approach (Source: sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition).
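A lineage query amounts to inverting the mappings into an index from golden-record field to source columns. The sketch below uses a plain in-memory index, not Zalando's knowledge graph. `CONTRIBUTIONS` and `build_lineage` are hypothetical names. It also assumes System A's indirect mapping feeds street name, city name, and postal code, as the text describes.

```python
from collections import defaultdict

# (system, source columns, golden-record fields they contribute to), from the worked example.
CONTRIBUTIONS = [
    ("System A", ("address_line_1", "address_line_2", "address_line_3"),
     ("Address.street name", "Address.city name", "Address.postal code")),
    ("System B", ("street",), ("Address.street name",)),
    ("System B", ("zip_code",), ("Address.postal code",)),
    ("System B", ("city",), ("Address.city name",)),
    ("System B", ("country_code",), ("Address.country code",)),
]


def build_lineage(contributions):
    """Map each golden-record field to the set of (system, column) pairs feeding it."""
    lineage = defaultdict(set)
    for system, sources, targets in contributions:
        # For an indirect mapping the model records only column groups, so every
        # source column is treated as a potential contributor to every target.
        for target in targets:
            for column in sources:
                lineage[target].add((system, column))
    return lineage


LINEAGE = build_lineage(CONTRIBUTIONS)
print(sorted(LINEAGE["Address.postal code"]))
```

Treating indirect mappings as all-to-all errs on the side of over-reporting lineage. A finer-grained answer would require the transformation algorithm itself to declare which input column feeds which output attribute.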

Seen in
