
CONCEPT Cited by 1 source

Master Patient Index (MPI)

A Master Patient Index (MPI) — sometimes called MDM (Master Data Management) for patient records — is the deduplication primitive that ensures one patient = one golden record across multiple source systems in a healthcare data platform. First wiki canonicalisation 2026-05-27.

Definition

In the conventional fragmented healthcare data landscape, a single patient often exists as multiple records across multiple source systems — distinct EHR records, claims records, lab records, pharmacy records — with mismatched identifiers, slightly different demographics, and no shared join key. An MPI is the system component that:

  1. Ingests patient records from multiple source systems.
  2. Compares records using deterministic and/or probabilistic linkage (name + DOB + address + identifiers) — see probabilistic record linkage.
  3. Resolves matches into a single golden record with canonical demographics + linked source-record identifiers.
  4. Surfaces the golden record as the primary patient identity in downstream operational and analytical workflows.
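
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any particular MPI product: the `SourceRecord` and `GoldenRecord` types, the `national_id` field, the 0.85 name-similarity threshold, and the first-seen-wins demographics rule are all hypothetical, and `difflib.SequenceMatcher` stands in for a real probabilistic linkage model.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SourceRecord:
    source: str                        # hypothetical source-system name, e.g. "ehr"
    local_id: str                      # identifier local to that source system
    name: str
    dob: str                           # ISO date string
    national_id: Optional[str] = None  # hypothetical shared identifier, often absent


@dataclass
class GoldenRecord:
    golden_id: str
    name: str
    dob: str
    links: List[Tuple[str, str]] = field(default_factory=list)  # (source, local_id)


def is_match(a: SourceRecord, b: SourceRecord, threshold: float = 0.85) -> bool:
    # Deterministic rule: a shared strong identifier is treated as decisive.
    if a.national_id is not None and a.national_id == b.national_id:
        return True
    # Fuzzy fallback (assumed rule): exact DOB plus sufficiently similar name.
    if a.dob != b.dob:
        return False
    similarity = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return similarity >= threshold


def resolve(records: List[SourceRecord]) -> List[GoldenRecord]:
    # Step 1 (ingest) is the input list; step 2 compares each record to existing clusters.
    clusters: List[List[SourceRecord]] = []
    for rec in records:
        for cluster in clusters:
            if any(is_match(rec, member) for member in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    # Step 3: one golden record per cluster, keeping links to every source record.
    return [
        GoldenRecord(
            golden_id=f"golden-{i}",
            name=cluster[0].name,  # naive assumption: first-seen demographics win
            dob=cluster[0].dob,
            links=[(r.source, r.local_id) for r in cluster],
        )
        for i, cluster in enumerate(clusters, start=1)
    ]


records = [
    SourceRecord("ehr", "E1", "Maria Gonzalez", "1980-04-02", "123"),
    SourceRecord("claims", "C9", "Maria Gonzales", "1980-04-02"),
    SourceRecord("lab", "L7", "M. Gonzalez", "1980-04-02", "123"),
    SourceRecord("pharmacy", "P3", "John Smith", "1975-01-01"),
]
golden = resolve(records)  # step 4: downstream code keys on golden_id
```

Comparing each record against every cluster member is quadratic in the worst case; real deployments typically add a blocking step to limit the candidate pairs that are compared.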

Verbatim from the 2026-05-27 source: "MDM/MPI (Master Data Management / Master Patient Index) deduplicates patient records so one patient equals one golden record."

Architectural role

In the FHIR-server-on-lakehouse-substrate pattern (Aidbox-on-Lakebase + Moonlink + Unity Catalog), MPI is one of four named capabilities in Health Samurai's standardisation layer (alongside the open-source HL7v2 / C-CDA / X12 converters, the FHIR-native Terminology Server, and FHIR Implementation Guide validation).

MPI composes with terminology-server normalisation; each removes a different kind of noise:

  • Terminology server ensures "one diagnosis is counted once regardless of source system" (collapsing code-system noise on the attribute axis).
  • MPI ensures "one patient equals one golden record" (collapsing identity noise on the entity axis).

Both are point-of-entry transformations — the substrate sees deduplicated, code-normalised FHIR resources rather than the raw ingest data with all its redundancy and code drift.
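
As a rough illustration of how the two normalisations compose at the point of entry, the sketch below rewrites raw condition records into (golden patient, canonical code) pairs before anything is counted. The `CODE_MAP` and `IDENTITY_MAP` lookup tables, the record field names, and the identifiers are hypothetical stand-ins for a terminology server and an MPI crosswalk; the source does not describe how either is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical terminology lookup: (code system, code) -> canonical SNOMED CT code.
# Illustrative mapping only, not an authoritative crosswalk.
CODE_MAP: Dict[Tuple[str, str], str] = {
    ("icd10cm", "E11.9"): "44054006",  # type 2 diabetes mellitus
    ("local", "DM2"): "44054006",      # made-up local code for the same concept
    ("snomed", "44054006"): "44054006",
}

# Hypothetical MPI crosswalk: (source system, local id) -> golden record id.
IDENTITY_MAP: Dict[Tuple[str, str], str] = {
    ("ehr", "E1"): "golden-1",
    ("claims", "C9"): "golden-1",
    ("lab", "L7"): "golden-1",
}


def normalise(raw: Dict[str, str]) -> Tuple[str, str]:
    """Collapse identity noise (entity axis) and code noise (attribute axis)."""
    golden_id = IDENTITY_MAP[(raw["source"], raw["local_id"])]
    canonical_code = CODE_MAP[(raw["code_system"], raw["code"])]
    return golden_id, canonical_code


raw_conditions: List[Dict[str, str]] = [
    {"source": "ehr", "local_id": "E1", "code_system": "snomed", "code": "44054006"},
    {"source": "claims", "local_id": "C9", "code_system": "icd10cm", "code": "E11.9"},
    {"source": "lab", "local_id": "L7", "code_system": "local", "code": "DM2"},
]

# Three raw records from three systems collapse to one patient with one diagnosis.
distinct = {normalise(r) for r in raw_conditions}
```

Without either lookup the same diagnosis would be counted more than once: skipping `IDENTITY_MAP` yields three patients, and skipping `CODE_MAP` yields three diagnoses.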

Relationship to entity resolution

MPI is the domain-specific instance of entity resolution applied to patient records. Many of the system-design concerns are shared:

  • Deterministic vs probabilistic linkage trade-offs (see probabilistic record linkage).
  • Curse-of-the-last-reducer stragglers in batch ER (see curse of the last reducer).
  • Golden-record vs golden-graph trade-offs in storing the resolved entity.
  • Human-in-the-loop review queues for low-confidence matches.
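
For the probabilistic side of the first trade-off, a common formulation is the Fellegi-Sunter model, which sums per-field log-likelihood weights built from m-probabilities (the chance a field agrees on a true match) and u-probabilities (the chance it agrees on a non-match). The sketch below uses made-up field names and m/u values; in practice these are estimated from data, and nothing here is taken from the source.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical (m, u) pairs per field; real values would be estimated, not hard-coded.
FIELD_PARAMS: Dict[str, Tuple[float, float]] = {
    "last_name": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}


def match_weight(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> float:
    """Sum of log2 agreement/disagreement weights across the configured fields."""
    total = 0.0
    for name, (m, u) in FIELD_PARAMS.items():
        value_a, value_b = a.get(name), b.get(name)
        if value_a is not None and value_a == value_b:
            total += math.log2(m / u)              # agreement: positive evidence
        else:
            # Simplification: a missing value is treated as a disagreement.
            total += math.log2((1 - m) / (1 - u))  # disagreement: negative evidence
    return total


same = match_weight(
    {"last_name": "gonzalez", "dob": "1980-04-02", "postcode": "94110"},
    {"last_name": "gonzalez", "dob": "1980-04-02", "postcode": "94110"},
)
different = match_weight(
    {"last_name": "gonzalez", "dob": "1980-04-02", "postcode": "94110"},
    {"last_name": "smith", "dob": "1975-01-01", "postcode": "10001"},
)
```

A deterministic rule answers yes or no, whereas a summed weight gives a ranking that can be thresholded, which is what makes a review queue for borderline pairs possible.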

The MPI-specific overlay is the regulatory + clinical-correctness envelope: a false-positive merge (two patients into one golden record) is a clinical-safety incident; a false-negative split (one patient as two golden records) is a continuity-of-care incident. Both have higher cost than a typical e-commerce or marketing entity-resolution failure.
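
One way to reflect these costs is to merge automatically only at very high confidence and send borderline pairs to human review rather than guessing. The sketch below assumes a match score normalised to the range 0 to 1; the `route` function, the `Decision` names, and both thresholds are hypothetical and would need tuning against measured false-positive and false-negative rates.

```python
from enum import Enum


class Decision(Enum):
    AUTO_MERGE = "auto_merge"      # link into the same golden record
    HUMAN_REVIEW = "human_review"  # queue for a reviewer to decide
    NO_MATCH = "no_match"          # keep as separate golden records


def route(score: float, merge_at: float = 0.97, review_at: float = 0.80) -> Decision:
    """Map a match score in [0, 1] to an action using two assumed thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= merge_at:
        return Decision.AUTO_MERGE
    if score >= review_at:
        return Decision.HUMAN_REVIEW
    return Decision.NO_MATCH


decisions = [route(s) for s in (0.99, 0.85, 0.40)]
```

Raising `merge_at` trades fewer false-positive merges for a longer review queue; lowering `review_at` trades fewer false-negative splits for more reviewer workload.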

Seen in

Caveats

  • Capability altitude only. The source discloses no specific Health Samurai MPI mechanism: no probabilistic-linkage discipline, no FP/FN rates, no scale numbers, no human-review workflow, no integration specifics with Aidbox.
  • No multi-source-record canonical-identifier disclosure. How the golden record's canonical identifier relates to the source-system identifiers is not addressed.
  • No merge-conflict resolution disclosure. When two source records disagree on demographics (e.g. address, last name post-marriage), how the golden record's canonical demographics are resolved is not addressed.