
CONCEPT

Master data management

Definition

Master Data Management (MDM) is "a technology-enabled discipline in which business and Information Technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets" (sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition).

In practice, MDM addresses the problem of an enterprise having no central view of a given subject matter — business partners, customers, products — because the relevant records are scattered across many systems, each holding its own, possibly divergent, copy. MDM introduces a [[concepts/golden-record|golden record]]: a single, shared, trusted view per domain.

Implementation styles

The industry recognises three canonical MDM implementation styles; Zalando's post names only the one they chose:

  • Consolidated style — ingest from source systems → run through match-and-merge → cleanse / quality-assure → store centrally in a canonical model → publish the golden record back to source systems for correction. "At Zalando we are at an early phase of realising MDM for our internal data assets and we have chosen to do it in a consolidated style."
  • Registry style (unnamed in Zalando's post) — a central MDM system stores only identifiers and pointers; records remain in source systems.
  • Coexistence style (unnamed in Zalando's post) — golden record is stored centrally and written back into source systems as authoritative.
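The consolidated pipeline above can be sketched in a few lines. This is an illustrative toy, not Zalando's implementation: the match key, the survivorship rule, and all record fields are assumptions.

```python
# Minimal consolidated-style MDM sketch: match records from several source
# systems on a normalized key, then merge each group into one golden record.
from collections import defaultdict


def match_key(record):
    """Assumed matching rule: lower-cased, trimmed name plus postcode."""
    return (record["name"].strip().lower(), record.get("postcode", ""))


def merge(records):
    """Assumed survivorship rule: prefer the most complete value per field."""
    golden = {}
    for rec in records:
        for field, value in rec.items():
            if value and (field not in golden or len(str(value)) > len(str(golden[field]))):
                golden[field] = value
    return golden


def consolidate(sources):
    """sources: one list of raw records per source system."""
    buckets = defaultdict(list)
    for system_records in sources:
        for rec in system_records:
            buckets[match_key(rec)].append(rec)          # match
    return [merge(group) for group in buckets.values()]  # merge


crm = [{"name": "ACME GmbH ", "postcode": "10115", "phone": ""}]
erp = [{"name": "acme gmbh", "postcode": "10115", "phone": "+49 30 123456"}]
golden_records = consolidate([crm, erp])
# One golden record, combining the most complete values from both systems.
```

A real implementation would add probabilistic matching, cleansing, and write-back to the source systems; the point here is only the match-and-merge shape of the flow.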

Core MDM deliverables

Any consolidated-style MDM project produces at least two schemas:

  1. Logical data model — the schema of the golden record: which entities exist, their attributes, their relationships.
  2. Transformation data model — for each source system, how each of its tables and columns maps (directly or indirectly) onto the logical model.

A direct mapping is a 1-to-1 column copy. An indirect mapping is 1-to-many and requires a transformation algorithm (e.g. parsing unstructured address lines into structured components).
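A transformation data model of this kind can be expressed as a per-source mapping spec. Everything below is a hypothetical sketch: the column names, the `split_address` parser, and its assumed address format are invented for illustration.

```python
def split_address(raw):
    """Indirect mapping: parse an unstructured address line into structured
    components, assuming the format 'street number, postcode city'."""
    street_part, city_part = raw.split(",", 1)
    postcode, city = city_part.strip().split(" ", 1)
    return {"street": street_part.strip(), "postcode": postcode, "city": city}


# Mapping spec for one hypothetical source system:
# logical attribute -> ("direct", source column)
#                   or ("indirect", source column, transformation function)
CRM_MAPPING = {
    "partner_name": ("direct", "cust_name"),
    "address":      ("indirect", "addr_line", split_address),
}


def transform(source_row, mapping):
    """Apply a mapping spec to one source row, yielding a logical-model row."""
    logical_row = {}
    for target, rule in mapping.items():
        if rule[0] == "direct":
            logical_row[target] = source_row[rule[1]]      # 1-to-1 copy
        else:
            logical_row[target] = rule[2](source_row[rule[1]])  # algorithmic
    return logical_row


row = transform(
    {"cust_name": "ACME GmbH", "addr_line": "Mollstr. 1, 10178 Berlin"},
    CRM_MAPPING,
)
```

One such mapping spec per source system, plus the logical model itself, are exactly the two deliverables listed above.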

The manual-definition problem

Zalando names five drawbacks of the traditional MDM workflow where the logical data model is authored by hand (Source: sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition):

  1. Linear-in-table-count manual work — "the amount of manual work to create the logical data model increases relatively to the number of system tables."
  2. Domain knowledge is in the wrong place — "the data models are read and created by colleagues from engineering with limited business know-how."
  3. Communication artifacts are unreadable for business — SQL schemas and spreadsheets gatekeep understanding for non-technical domain experts.
  4. Business-engineering handoff is lossy — "the domain expert is limited from conveying correctly the knowledge to the engineers creating the data model, which leads to errors and misunderstandings."
  5. Risk amplification — the logical data model drives UI, processes, business rules, and storage; errors found late in development are expensive to unwind. "A MDM tool is released with a faulty and incorrect model that needs iterations of rework."

Zalando's response is to use a knowledge graph as the authoring substrate and auto-generate both schemas from it (patterns/knowledge-graph-for-mdm-modeling).
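The schema-generation idea can be illustrated with a minimal sketch. Assuming the knowledge graph is a set of (subject, predicate, object) triples using RDFS-style vocabulary — the actual classes, properties, and tooling here are illustrative, not Zalando's — a logical data model falls out of the class and property declarations:

```python
# Toy knowledge graph: RDFS-style triples declaring classes and properties.
triples = [
    ("Partner", "rdf:type", "rdfs:Class"),
    ("name", "rdfs:domain", "Partner"),
    ("name", "rdfs:range", "xsd:string"),
    ("address", "rdfs:domain", "Partner"),
    ("address", "rdfs:range", "Address"),
    ("Address", "rdf:type", "rdfs:Class"),
]


def logical_model(triples):
    """Derive a logical data model: entities are the declared classes,
    attributes are the properties grouped by their rdfs:domain, and each
    attribute's type is taken from its rdfs:range."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "rdfs:Class"}
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    model = {c: {} for c in classes}
    for s, p, o in triples:
        if p == "rdfs:domain" and o in model:
            model[o][s] = ranges.get(s)
    return model


schema = logical_model(triples)
# e.g. schema["Partner"] maps 'name' -> 'xsd:string' and 'address' -> 'Address'
```

The same graph could drive generation of the transformation data model as well, by annotating properties with their source-system columns; that step is omitted here.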
