CONCEPT Cited by 1 source

Healthcare data interoperability¶

The healthcare data interoperability problem is the structural challenge of building a unified, governed, queryable view of patient data when the source data lives in dozens of systems with mismatched formats, mismatched code systems, and mismatched identifiers. This wiki first canonicalised it on 2026-05-27 as a sysdesign forcing function rather than a domain-specific HL7 concern.

Definition¶

Per the 2026-05-27 source verbatim: "Healthcare data lives in dozens of systems, EHRs, claims, labs, pharmacy, SDoH, each with its own formats, codes, and duplicates." The fragmentation has three orthogonal axes:

Format axis — the same patient information is encoded in HL7v2 messages (legacy clinical), C-CDA documents (clinical summaries), X12 transactions (claims/billing), and proprietary formats. Without canonicalisation, every downstream consumer needs N parsers for N inbound formats.
Code-system axis — the same clinical concept (a diagnosis, a procedure, a lab observation, a medication) is encoded in multiple code systems (LOINC, SNOMED CT, RxNorm, ICD-10). Without a terminology server, the same concept gets counted multiple times under different codes.
Identity axis — the same patient exists as multiple records across multiple systems with mismatched identifiers and slightly different demographics. Without an MPI, one patient is counted as several.

Verbatim: "A single diagnosis may appear under multiple codes across multiple vocabularies. A single patient may exist as several records across several systems."
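The double-counting on the code-system axis can be made concrete with a minimal sketch. The `CONCEPT_MAP` table, the `type-2-diabetes` concept key, and the record layout are hypothetical stand-ins for what a terminology server would provide; the two codes are used only as an illustrative pair for the same diagnosis, not as an authoritative mapping.

```python
from collections import Counter

# Hypothetical hand-written concept map. A real deployment would ask a
# terminology server for this mapping instead of hard-coding it.
CONCEPT_MAP = {
    ("ICD-10-CM", "E11.9"): "type-2-diabetes",
    ("SNOMED CT", "44054006"): "type-2-diabetes",
}

# Illustrative records: one diagnosis arriving from two kinds of source system.
RECORDS = [
    {"source": "ehr", "system": "SNOMED CT", "code": "44054006"},
    {"source": "claims", "system": "ICD-10-CM", "code": "E11.9"},
    {"source": "claims", "system": "ICD-10-CM", "code": "E11.9"},
]


def canonical_concept(system, code):
    """Return the canonical concept key, or a marker for unmapped codes."""
    return CONCEPT_MAP.get((system, code), f"unmapped:{system}|{code}")


def count_raw(records):
    """Count diagnoses by raw (system, code) pair, as a naive consumer would."""
    return Counter((r["system"], r["code"]) for r in records)


def count_canonical(records):
    """Count diagnoses after normalising every code to its canonical concept."""
    return Counter(canonical_concept(r["system"], r["code"]) for r in records)
```

Counting by raw code splits the three records into two buckets; counting by canonical concept puts all three in one, which is the behaviour the terminology-server step is meant to guarantee.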

Why this is a sysdesign forcing function (not just an HL7 concern)¶

The healthcare-interoperability problem is structurally a fragmentation-and-canonicalisation sysdesign problem with concrete analogues outside healthcare:

Healthcare axis	Generic sysdesign analogue
Format axis	Multi-protocol ingestion at the boundary (Kafka topics, Avro/JSON schemas)
Code-system axis	Cross-vocabulary entity attribute resolution / canonicalisation
Identity axis	Entity resolution / golden-record patterns
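The identity-axis analogue (entity resolution into golden records) can be sketched as follows. This is a deliberately simplified deterministic match on normalised family name, given-name initial, and birth date. The field names and the matching rule are assumptions for illustration; production MPIs typically rely on probabilistic or rule-weighted matching rather than an exact key.

```python
from collections import defaultdict

# Illustrative source records with slightly different demographics.
SOURCE_RECORDS = [
    {"system": "ehr", "local_id": "A-17", "family": "Doe", "given": "Jane", "birth_date": "1980-02-14"},
    {"system": "claims", "local_id": "9931", "family": " DOE", "given": "J.", "birth_date": "1980-02-14"},
    {"system": "lab", "local_id": "L-204", "family": "Roe", "given": "Sam", "birth_date": "1975-09-30"},
]


def match_key(rec):
    """Normalise demographics into a deterministic match key (assumed rule)."""
    family = rec["family"].strip().lower()
    given_initial = rec["given"].strip().lower()[:1]
    return (family, given_initial, rec["birth_date"])


def build_golden_records(records):
    """Group source records by match key and emit one golden record per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    golden = []
    for (family, given_initial, birth_date), members in groups.items():
        golden.append({
            "family": family,
            "given_initial": given_initial,
            "birth_date": birth_date,
            # Keep every source identifier so the link back to each system survives.
            "source_ids": sorted(f'{m["system"]}:{m["local_id"]}' for m in members),
        })
    return golden
```

With the sample data, the EHR and claims records collapse into one golden record carrying both source identifiers, while the lab record stays separate.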

The 2026-05-27 source's load-bearing claim is that all three canonicalisations should happen at point of entry — convert legacy formats into FHIR, normalise codes through a terminology server, deduplicate patients into golden records — so the substrate exposes a single canonical view to all downstream consumers.
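As a rough illustration of format canonicalisation at point of entry, the sketch below converts a single HL7v2 PID segment into a minimal FHIR Patient-shaped dictionary. It is not the converter the source describes: it assumes default HL7v2 delimiters (`|` and `^`), reads only PID-3, PID-5, PID-7, and PID-8, and ignores repetitions, escapes, and validation. The function name and sample segment are hypothetical.

```python
# Assumed mapping from HL7v2 administrative sex to FHIR gender codes.
GENDER_MAP = {"M": "male", "F": "female", "O": "other", "U": "unknown"}


def pid_to_fhir_patient(pid_segment):
    """Convert one HL7v2 PID segment into a minimal FHIR Patient-like dict."""
    fields = pid_segment.strip().split("|")
    if fields[0] != "PID":
        raise ValueError("expected a PID segment")
    # Pad so that fields[n] is always addressable as PID-n up to PID-8.
    fields += [""] * (9 - len(fields))

    identifier = fields[3].split("^")[0]   # PID-3: patient identifier list
    name_parts = fields[5].split("^")      # PID-5: family^given
    family = name_parts[0]
    given = name_parts[1] if len(name_parts) > 1 else ""
    dob = fields[7]                        # PID-7: date of birth, YYYYMMDD
    sex = fields[8]                        # PID-8: administrative sex

    patient = {
        "resourceType": "Patient",
        "identifier": [{"value": identifier}] if identifier else [],
        "name": [{"family": family, "given": [given] if given else []}],
        "gender": GENDER_MAP.get(sex, "unknown"),
    }
    if len(dob) >= 8:
        # FHIR dates use YYYY-MM-DD.
        patient["birthDate"] = f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}"
    return patient


SAMPLE_PID = "PID|1||12345^^^HOSP^MR||DOE^JANE||19800214|F"
```

Calling `pid_to_fhir_patient(SAMPLE_PID)` yields a Patient with identifier `12345`, family name `DOE`, gender `female`, and birth date `1980-02-14`. Downstream consumers then need to understand only the FHIR shape, not each inbound format.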

The conventional remedy and its failure modes¶

Per the 2026-05-27 source, the conventional remedy is the three-component pattern: a FHIR server for interoperability, a separate data warehouse for analytics, and ETL pipelines connecting the two. The pattern has three structural failure modes:

Duplication tax. "The same clinical data is replicated across the FHIR server, the warehouse, and multiple staging layers — each adding storage, compute, and operational overhead."
FHIR-server-as-bottleneck. "Most implementations were designed for transactional use cases — document exchange, point lookups, regulatory APIs — not for the access patterns of modern analytics, ML pipelines, or AI agents that need to scan millions of resources efficiently."
Governance split. "Each system maintains its own access controls, audit trails, and compliance posture." Compliance becomes a per-system concern rather than a property of the architecture.

These failure modes together force the over-provision-or-extract dilemma: "organizations are forced into trade-offs: over-provision FHIR infrastructure to maintain performance, or extract data into yet another system to make it usable."

The substrate-level resolution¶

The FHIR-server-on-lakehouse-substrate pattern resolves the conventional failure modes by collapsing the three components into one substrate with two access surfaces. The 2026-05-27 source's specific instantiation:

Standardisation at boundary — Health Samurai's HL7v2 / C-CDA / X12 converters + terminology server + MDM/MPI + IG validation.
Substrate — Aidbox on Lakebase.
Operational↔analytical bridge — Moonlink (real-time, zero-ETL).
Governance — Unity Catalog across both halves.
Standards-based access — FHIR API + SMART on FHIR + SQL on FHIR ViewDefinitions.
Native access — Spark / SQL / ML / AI/BI from the same dataset.
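To give a feel for how a SQL on FHIR ViewDefinition turns nested FHIR resources into flat, scannable rows, here is a toy flattener. The `VIEW_DEFINITION` dictionary follows the general resource/select/column shape of a ViewDefinition, but `evaluate_path` is a hypothetical stand-in that handles only plain dotted field names. It is not FHIRPath and not how Aidbox or any real runner evaluates views.

```python
# ViewDefinition-shaped dict; column names and the sample data are illustrative.
VIEW_DEFINITION = {
    "resourceType": "ViewDefinition",
    "resource": "Patient",
    "select": [{"column": [
        {"name": "patient_id", "path": "id"},
        {"name": "gender", "path": "gender"},
        {"name": "birth_date", "path": "birthDate"},
    ]}],
}

RESOURCES = [
    {"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1980-02-14"},
    {"resourceType": "Observation", "id": "o1", "status": "final"},
    {"resourceType": "Patient", "id": "p2", "gender": "male"},
]


def evaluate_path(resource, path):
    """Toy evaluator: dotted field names only, NOT real FHIRPath."""
    value = resource
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def flatten(view, resources):
    """Project resources of the view's type into one flat row per resource."""
    rows = []
    for res in resources:
        if res.get("resourceType") != view["resource"]:
            continue
        row = {}
        for select in view["select"]:
            for col in select["column"]:
                row[col["name"]] = evaluate_path(res, col["path"])
        rows.append(row)
    return rows
```

`flatten(VIEW_DEFINITION, RESOURCES)` skips the Observation and returns two rows, with `birth_date` set to `None` for the patient that lacks one. The point of the sketch is that the tabular view is derived from the same resources the FHIR API serves, rather than from a separately loaded copy.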

Compliance as architectural property¶

A second-order claim from the 2026-05-27 source: when the substrate is FHIR-native and governance is unified, regulatory compliance ceases to be a separate workstream. Verbatim: "By building on FHIR, organizations address mandates like CMS-0057 (Interoperability and Patient Access) and ONC requirements as a natural property of their architecture." Compliance contracts the post calls out:

CMS-0057 Patient Access Rule.
Payer-to-Payer Data Exchange.
ONC Health IT Certification.

The architectural argument: each compliance contract is satisfiable as a query / API endpoint over the canonical substrate, rather than as a separate compliance pipeline reading from a separate compliance store.
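A minimal sketch of that argument, using an in-memory SQLite table as a stand-in for the canonical substrate: the patient-access lookup and the analytical aggregate are just two queries over the same table. The `claims` schema, the function names, and the sample rows are all hypothetical, and nothing here models an actual CMS or ONC API contract.

```python
import sqlite3


def build_store():
    """Create a toy canonical store; the schema is an assumption for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (patient_id TEXT, claim_id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [
            ("pat-1", "clm-100", "paid"),
            ("pat-1", "clm-101", "denied"),
            ("pat-2", "clm-200", "paid"),
        ],
    )
    return conn


def patient_access(conn, patient_id):
    """Access-style endpoint: everything held about one patient."""
    cur = conn.execute(
        "SELECT claim_id, status FROM claims WHERE patient_id = ? ORDER BY claim_id",
        (patient_id,),
    )
    return cur.fetchall()


def claims_by_status(conn):
    """Analytics-style query over the very same table, with no separate copy."""
    cur = conn.execute("SELECT status, COUNT(*) FROM claims GROUP BY status")
    return dict(cur.fetchall())
```

Both functions read one table, so there is no second store whose contents, access controls, or audit trail could drift from the first.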

Seen in¶

2026-05-27 — sources/2026-05-27-databricks-building-a-fhir-native-health-data-platform-on-databricks-lakebase — first wiki canonicalisation as a sysdesign-altitude concept; positions the three-axis fragmentation (format / code-system / identity) and the three-failure-mode foil (duplication / bottleneck / governance split) for the FHIR-server-on-lakehouse-substrate pattern.

Caveats¶

Single-source canonicalisation. The source is a Tier-3 vendor co-marketing post; the architectural framing is sound, but the post offers no production-scale evidence that the substrate-level resolution works better than the conventional three-component pattern.
The compliance-as-property claim is aspirational. The source discloses no specific compliance-evidence pipeline.