CONCEPT Cited by 1 source
Healthcare data interoperability¶
The healthcare data interoperability problem is the structural challenge of building a unified, governed, queryable view of patient data when the source data lives in dozens of systems with mismatched formats, mismatched code systems, and mismatched identifiers. This wiki first canonicalised the concept on 2026-05-27, treating it as a system-design (sysdesign) forcing function rather than a domain-specific HL7 concern.
Definition¶
Per the 2026-05-27 source verbatim: "Healthcare data lives in dozens of systems, EHRs, claims, labs, pharmacy, SDoH, each with its own formats, codes, and duplicates." The fragmentation has three orthogonal axes:
- Format axis — the same patient information is encoded in HL7v2 messages (legacy clinical), C-CDA documents (clinical summaries), X12 transactions (claims/billing), and proprietary formats. Without canonicalisation, every downstream consumer needs N parsers for N inbound formats.
- Code-system axis — the same clinical concept (a diagnosis, a procedure, a lab observation, a medication) is encoded in multiple code systems (LOINC, SNOMED CT, RxNorm, ICD-10). Without a terminology server, the same concept gets counted multiple times under different codes.
- Identity axis — the same patient exists as multiple records across multiple systems with mismatched identifiers and slightly different demographics. Without a master patient index (MPI), one patient is counted as several.
Verbatim: "A single diagnosis may appear under multiple codes across multiple vocabularies. A single patient may exist as several records across several systems."
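The miscounting on the code-system and identity axes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CROSSWALK` and `MPI_LINKS` tables are hypothetical stand-ins for a terminology server and an MPI, and the local patient identifiers are invented. The two diagnosis codes are real codes for type 2 diabetes, but the mapping between them is hand-written here rather than taken from any official crosswalk.

```python
# Hypothetical crosswalk: (code system, code) -> canonical concept key.
# A real deployment would query a terminology server instead.
CROSSWALK = {
    ("ICD-10-CM", "E11.9"): "type-2-diabetes",
    ("SNOMED CT", "44054006"): "type-2-diabetes",
}

# Hypothetical MPI link table: (source system, local id) -> golden record id.
MPI_LINKS = {
    ("ehr", "A-100"): "golden-1",
    ("claims", "77-421"): "golden-1",
}

# Two source records describing the same diagnosis for the same person.
records = [
    {"system": "ehr", "patient": "A-100", "coding": ("SNOMED CT", "44054006")},
    {"system": "claims", "patient": "77-421", "coding": ("ICD-10-CM", "E11.9")},
]


def naive_counts(rows):
    """Count distinct concepts and patients using raw source keys."""
    concepts = {r["coding"] for r in rows}
    patients = {(r["system"], r["patient"]) for r in rows}
    return len(concepts), len(patients)


def canonical_counts(rows):
    """Count after mapping codes and identifiers to canonical keys."""
    concepts = {CROSSWALK[r["coding"]] for r in rows}
    patients = {MPI_LINKS[(r["system"], r["patient"])] for r in rows}
    return len(concepts), len(patients)


print(naive_counts(records))      # (2, 2): one diagnosis and one patient, each counted twice
print(canonical_counts(records))  # (1, 1)
```

The raw keys are all valid in their own systems; the overcount only appears when records from different systems are combined without a shared canonical key.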
Why this is a sysdesign forcing function (not just an HL7 concern)¶
The healthcare-interoperability problem is structurally a fragmentation-and-canonicalisation sysdesign problem with concrete analogues outside healthcare:
| Healthcare axis | Generic sysdesign analogue |
|---|---|
| Format axis | Multi-protocol ingestion at the boundary (Kafka topics, Avro/JSON schemas) |
| Code-system axis | Cross-vocabulary entity attribute resolution / canonicalisation |
| Identity axis | Entity resolution / golden-record patterns |
The 2026-05-27 source's load-bearing claim is that all three canonicalisations should happen at point of entry — convert legacy formats into FHIR, normalise codes through a terminology server, deduplicate patients into golden records — so the substrate exposes a single canonical view to all downstream consumers.
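As a rough illustration of the format step at point of entry, the sketch below converts a single HL7v2 `PID` segment into a FHIR-shaped `Patient` dictionary. It is a toy under stated assumptions, not the converter the source describes: it ignores HL7v2 escape sequences, repetitions, and custom delimiters, the sample segment is invented, and the `urn:example:` identifier system is a hypothetical placeholder.

```python
# HL7 table 0001 (administrative sex) values mapped to FHIR gender codes.
SEX_TO_GENDER = {"M": "male", "F": "female", "O": "other", "U": "unknown"}


def pid_to_fhir_patient(segment: str) -> dict:
    """Convert one HL7v2 PID segment into a minimal FHIR Patient dict."""
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError("expected a PID segment")

    # PID-3: ID ^ check digit ^ check digit scheme ^ assigning authority ^ type
    id_parts = fields[3].split("^")
    # PID-5: family name ^ given name ^ ...
    name_parts = fields[5].split("^") + ["", ""]
    # PID-7: date of birth as YYYYMMDD
    dob = fields[7]

    return {
        "resourceType": "Patient",
        "identifier": [{
            # Hypothetical URI scheme for the assigning authority.
            "system": f"urn:example:assigning-authority:{id_parts[3]}",
            "value": id_parts[0],
        }],
        "name": [{"family": name_parts[0], "given": [name_parts[1]]}],
        "birthDate": f"{dob[0:4]}-{dob[4:6]}-{dob[6:8]}",
        "gender": SEX_TO_GENDER.get(fields[8], "unknown"),
    }


patient = pid_to_fhir_patient("PID|1||12345^^^HOSP^MR||DOE^JOHN||19800101|M")
print(patient["name"], patient["birthDate"], patient["gender"])
```

Code normalisation and patient deduplication would follow this step in the same ingestion path, so that only canonical resources reach downstream consumers.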
The conventional remedy and its failure modes¶
Per the 2026-05-27 source, the conventional remedy is the three-component pattern: a FHIR server for interoperability, a separate data warehouse for analytics, and ETL pipelines connecting the two. This pattern has three structural failure modes:
- Duplication tax. "The same clinical data is replicated across the FHIR server, the warehouse, and multiple staging layers — each adding storage, compute, and operational overhead."
- FHIR-server-as-bottleneck. "Most implementations were designed for transactional use cases — document exchange, point lookups, regulatory APIs — not for the access patterns of modern analytics, ML pipelines, or AI agents that need to scan millions of resources efficiently."
- Governance split. "Each system maintains its own access controls, audit trails, and compliance posture." Compliance becomes a per-system concern rather than a property of the architecture.
These failure modes together force the over-provision-or-extract dilemma: "organizations are forced into trade-offs: over-provision FHIR infrastructure to maintain performance, or extract data into yet another system to make it usable."
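A back-of-the-envelope calculation suggests why scan-heavy workloads strain a paginated API. FHIR search returns results in pages (the `_count` parameter sets the page size), so a full scan costs one request per page. The figures below are illustrative assumptions, not measurements from the source.

```python
def requests_to_scan(total_resources: int, page_size: int) -> int:
    """Number of paged requests needed to read every resource once."""
    if total_resources < 0 or page_size <= 0:
        raise ValueError("total_resources must be >= 0 and page_size > 0")
    # Integer ceiling division: a partial last page still costs a request.
    return -(-total_resources // page_size)


# Assumed workload: 5 million resources read in pages of 1,000.
print(requests_to_scan(5_000_000, 1_000))  # 5000
```

Each of those requests is sequential if the client follows `next` links, which is the access pattern the quoted bottleneck describes; a columnar scan over the same data has no equivalent per-page round trip.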
The substrate-level resolution¶
The FHIR-server-on-lakehouse-substrate pattern aims to resolve the conventional failure modes by collapsing the three components into one substrate with two access surfaces: standards-based and native. The 2026-05-27 source's specific instantiation:
- Standardisation at boundary — Health Samurai's HL7v2 / C-CDA / X12 converters + terminology server + MDM/MPI + IG validation.
- Substrate — Aidbox on Lakebase.
- Operational↔analytical bridge — Moonlink (real-time, zero-ETL).
- Governance — Unity Catalog across both halves.
- Standards-based access — FHIR API + SMART on FHIR + SQL on FHIR ViewDefinitions.
- Native access — Spark / SQL / ML / AI/BI from the same dataset.
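To make the SQL on FHIR ViewDefinition idea concrete, here is a minimal sketch of how a view definition projects nested FHIR resources into flat rows. The `view_definition` is trimmed to a few fields, and `get_path` and `run_view` are hypothetical helpers written for this example: they handle only plain dotted paths, whereas real ViewDefinition columns use FHIRPath expressions evaluated by a conforming engine. This is not the way Aidbox or any other product implements it.

```python
# A trimmed ViewDefinition: which resource type to read, which columns to emit.
view_definition = {
    "resourceType": "ViewDefinition",
    "name": "patient_demographics",
    "resource": "Patient",
    "select": [{
        "column": [
            {"name": "id", "path": "id"},
            {"name": "gender", "path": "gender"},
            {"name": "birth_date", "path": "birthDate"},
        ]
    }],
}


def get_path(resource: dict, path: str):
    """Toy path lookup: follows dotted keys, returns None if any is missing."""
    value = resource
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def run_view(view: dict, resources: list) -> list:
    """Project matching resources into flat rows, one row per resource."""
    rows = []
    for res in resources:
        if res.get("resourceType") != view["resource"]:
            continue  # the view applies to a single resource type
        row = {}
        for select in view["select"]:
            for col in select.get("column", []):
                row[col["name"]] = get_path(res, col["path"])
        rows.append(row)
    return rows


resources = [
    {"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1975-04-02"},
    {"resourceType": "Observation", "id": "o1", "status": "final"},
]
rows = run_view(view_definition, resources)
print(rows)
```

The point of the design is that the flat rows are derived from the same stored resources the FHIR API serves, so the tabular view does not require a separate copy of the data.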
Compliance as architectural property¶
A second-order claim from the 2026-05-27 source: when the substrate is FHIR-native and governance is unified, regulatory compliance ceases to be a separate workstream. Verbatim: "By building on FHIR, organizations address mandates like CMS-0057 (Interoperability and Patient Access) and ONC requirements as a natural property of their architecture." Compliance contracts the post calls out:
- CMS-0057 Patient Access Rule.
- Payer-to-Payer Data Exchange.
- ONC Health IT Certification.
The architectural argument: each compliance contract is satisfiable as a query / API endpoint over the canonical substrate, rather than as a separate compliance pipeline reading from a separate compliance store.
Seen in¶
- 2026-05-27 — sources/2026-05-27-databricks-building-a-fhir-native-health-data-platform-on-databricks-lakebase — first wiki canonicalisation as a sysdesign-altitude concept; positions the three-axis fragmentation (format / code-system / identity) and the three-failure-mode foil (duplication / bottleneck / governance split) for the FHIR-server-on-lakehouse-substrate pattern.
Caveats¶
- Single-source canonicalisation. The source is a Tier-3 vendor co-marketing post. The architectural framing is sound, but the post offers no production-scale evidence that the substrate-level resolution works better than the conventional three-component pattern.
- The compliance-as-property claim is aspirational: the source discloses no specific compliance-evidence pipeline.