CONCEPT Cited by 1 source
Master Patient Index (MPI)
A Master Patient Index (MPI), sometimes labelled MDM (Master Data Management) when applied to patient records, is the deduplication primitive that ensures each patient maps to exactly one golden record across the source systems of a healthcare data platform. First wiki canonicalisation: 2026-05-27.
Definition
In a fragmented healthcare data landscape, a single patient often exists as several records across source systems (separate EHR, claims, lab, and pharmacy records) with mismatched identifiers, slightly different demographics, and no shared join key. An MPI is the system component that:
- Ingests patient records from multiple source systems.
- Compares records using deterministic and/or probabilistic linkage (name + DOB + address + identifiers) — see probabilistic record linkage.
- Resolves matches into a single golden record with canonical demographics + linked source-record identifiers.
- Surfaces the golden record as the primary patient identity in downstream operational and analytical workflows.
Verbatim from the 2026-05-27 source: "MDM/MPI (Master Data Management / Master Patient Index) deduplicates patient records so one patient equals one golden record."
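The four steps in the definition can be illustrated with a minimal Python sketch of the deterministic case only. Everything here is an assumption made for illustration: the `SourceRecord` shape, the `match_key` rule (normalised name plus date of birth), the `golden-N` identifier format, and the sample data are hypothetical and are not taken from the source or from any Health Samurai product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str      # hypothetical source-system name, e.g. "ehr"
    record_id: str   # identifier local to that source system
    family: str
    given: str
    dob: str         # ISO 8601 date string


def match_key(rec: SourceRecord) -> tuple[str, str, str]:
    """Deterministic key: whitespace-trimmed, lower-cased name plus DOB."""
    return (rec.family.strip().lower(), rec.given.strip().lower(), rec.dob)


def build_golden_records(records: list[SourceRecord]) -> list[dict]:
    """Group records that share a match key into one golden record each."""
    groups: dict[tuple[str, str, str], list[SourceRecord]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    golden = []
    for n, (key, members) in enumerate(sorted(groups.items()), start=1):
        golden.append({
            "golden_id": f"golden-{n}",  # assumed identifier scheme
            "family": key[0],
            "given": key[1],
            "dob": key[2],
            # Linked source-record identifiers kept alongside the demographics.
            "links": sorted((m.source, m.record_id) for m in members),
        })
    return golden


records = [
    SourceRecord("ehr", "E-100", "Nguyen", "Anna", "1980-02-14"),
    SourceRecord("claims", "C-7", " nguyen", "ANNA ", "1980-02-14"),
    SourceRecord("lab", "L-55", "Okafor", "Ben", "1975-09-01"),
]
golden = build_golden_records(records)
for g in golden:
    print(g["golden_id"], g["links"])
```

Exact-key matching like this misses typos, name changes, and transposed dates, which is why real MPIs usually add probabilistic linkage on top of it.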
Architectural role
In the FHIR-server-on-lakehouse-substrate pattern (Aidbox-on-Lakebase + Moonlink + Unity Catalog), MPI is one of four named capabilities in Health Samurai's standardisation layer (alongside the open-source HL7v2 / C-CDA / X12 converters, the FHIR-native Terminology Server, and FHIR Implementation Guide validation).
MPI's payoff composes with terminology-server normalisation, with each collapsing noise on a different axis:
- Terminology server ensures "one diagnosis is counted once regardless of source system" (collapsing code-system noise on the attribute axis).
- MPI ensures "one patient equals one golden record" (collapsing identity noise on the entity axis).
Both are point-of-entry transformations — the substrate sees deduplicated, code-normalised FHIR resources rather than the raw ingest data with all its redundancy and code drift.
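A small Python sketch may help show how the two normalisations compose. The lookup tables below are hypothetical stand-ins for what an MPI and a terminology server would provide; the golden identifiers, the `local-ehr` code system, and the canonical concept labels are invented for illustration. Only the ICD-10 codes E11.9 (type 2 diabetes without complications) and I10 (essential hypertension) are real.

```python
# Hypothetical MPI output: (source system, local patient id) -> golden id.
PATIENT_TO_GOLDEN = {
    ("ehr", "E-100"): "golden-1",
    ("claims", "C-7"): "golden-1",
    ("lab", "L-55"): "golden-2",
}

# Hypothetical terminology mapping: (code system, code) -> canonical concept.
CODE_TO_CANONICAL = {
    ("icd10", "E11.9"): "type-2-diabetes",
    ("local-ehr", "DM2"): "type-2-diabetes",
    ("icd10", "I10"): "hypertension",
}

# Raw ingest: the same diagnosis for the same person arrives twice,
# under different patient identifiers and different code systems.
raw_events = [
    ("ehr", "E-100", "local-ehr", "DM2"),
    ("claims", "C-7", "icd10", "E11.9"),
    ("lab", "L-55", "icd10", "I10"),
]


def normalise(events):
    """Collapse identity noise and code noise at the point of entry."""
    facts = set()
    for source, patient_id, system, code in events:
        golden_id = PATIENT_TO_GOLDEN[(source, patient_id)]
        concept = CODE_TO_CANONICAL[(system, code)]
        facts.add((golden_id, concept))
    return facts


normalised = normalise(raw_events)
print(len(raw_events), "raw events ->", len(normalised), "distinct facts")
```

Dropping either mapping leaves the diabetes diagnosis double-counted, which is the sense in which the two capabilities are complementary rather than interchangeable.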
Relationship to entity resolution
MPI is the domain-specific instance of entity resolution applied to patient records. Many of the system-design concerns are shared:
- Deterministic vs probabilistic linkage trade-offs (see probabilistic record linkage).
- Curse-of-the-last-reducer stragglers in batch ER (see curse of the last reducer).
- Golden-record vs golden-graph trade-offs in storing the resolved entity.
- Human-in-the-loop review queues for low-confidence matches.
The MPI-specific overlay is the regulatory and clinical-correctness envelope: a false-positive merge (two patients collapsed into one golden record) is a clinical-safety incident, because one patient's history can be attributed to another; a false-negative split (one patient left as two golden records) is a continuity-of-care incident, because each record shows only part of the history. Both typically cost more than an e-commerce or marketing entity-resolution failure.
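The interaction between probabilistic linkage, review queues, and the cost asymmetry above can be sketched with Fellegi-Sunter style agreement weights. This is a generic illustration, not the mechanism of any particular MPI. The m/u probabilities, the field list, and both thresholds are assumed values chosen only to make the example readable.

```python
import math

# Assumed per-field probabilities (illustrative, not from a real dataset):
#   m = P(field agrees | records are the same patient)
#   u = P(field agrees | records are different patients)
FIELD_PROBS = {
    "family": (0.95, 0.01),
    "given": (0.90, 0.05),
    "dob": (0.98, 0.003),
    "postcode": (0.80, 0.02),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios over fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 12.0, lower: float = 0.0) -> str:
    """Two thresholds: auto-merge, auto-reject, or route to human review.

    The upper threshold is set deliberately high in this sketch because a
    false-positive merge is treated as the more dangerous error.
    """
    if weight >= upper:
        return "merge"
    if weight <= lower:
        return "distinct"
    return "review"


anna_ehr = {"family": "nguyen", "given": "anna", "dob": "1980-02-14", "postcode": "94110"}
anna_moved = {"family": "okafor", "given": "anna", "dob": "1980-02-14", "postcode": "10001"}
ben = {"family": "okafor", "given": "ben", "dob": "1975-09-01", "postcode": "60601"}

for other in (anna_ehr, anna_moved, ben):
    w = match_weight(anna_ehr, other)
    print(f"{w:6.2f} {classify(w)}")
```

With these assumed numbers, a pair that agrees on given name and date of birth but differs on family name and postcode lands between the thresholds and goes to the review queue rather than being merged automatically.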
Seen in
- 2026-05-27 — sources/2026-05-27-databricks-building-a-fhir-native-health-data-platform-on-databricks-lakebase — first wiki canonicalisation. Health Samurai's MDM/MPI capability named at capability altitude as one of four standardisation-layer primitives feeding clean FHIR data into the operational substrate.
Caveats
- Capability altitude only. No specific Health Samurai MPI mechanism disclosed — no probabilistic-linkage discipline, no FP/FN rates, no scale numbers, no human-review workflow, no integration specifics with Aidbox.
- No multi-source-record canonical-identifier disclosure. How the golden record's canonical identifier relates to the source-system identifiers is not addressed.
- No merge-conflict resolution disclosure. When two source records disagree on demographics (e.g. address, last name post-marriage), how the golden record's canonical demographics are resolved is not addressed.
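To make the last caveat concrete, the sketch below shows one generic survivorship rule, "most recently updated non-empty value wins", in Python. The source does not say which rule, if any, the Health Samurai MPI uses; the `survive` function, the record shape, and the sample data are all hypothetical.

```python
from datetime import date


def survive(records: list[dict], fields: list[str]) -> dict:
    """For each field, take the value from the newest record that has one."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        # next() falls back to None when no record carries the field.
        golden[field] = next((r[field] for r in newest_first if r.get(field)), None)
    return golden


records = [
    {"source": "ehr", "updated": date(2021, 3, 1),
     "family": "Nguyen", "address": "12 Elm St"},
    {"source": "claims", "updated": date(2024, 6, 9),
     "family": "Nguyen-Okafor", "address": None},
]
golden = survive(records, ["family", "address", "phone"])
print(golden)
```

Other rules are equally plausible (source-system priority, most frequent value, manual stewardship), and the choice changes the golden record, which is why the absence of disclosure matters.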