Missing Modality Problem¶
The missing-modality problem is the production failure mode where a multimodal model — trained on examples where all modalities are present — encounters inputs in deployment where one or more modalities are absent. Real clinical, financial, and consumer-facing deployments rarely have every modality for every user.
The framing that earns it a page¶
"Not all patients receive comprehensive genomic profiling. Imaging studies may be unavailable. Wearables exist only for enrolled populations. Missingness isn't an edge case — it's the default." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai)
The load-bearing sentence is "architectures that assume complete data tend to fail in production. Architectures designed for sparsity generalize." The problem is reframed from a dataset quality concern (fill in the missing values) to an architectural concern (design for the sparse case from day one).
Three production-design responses¶
- Modality masking during training — randomly remove modality inputs during training to simulate deployment reality. This acts as a regulariser, forcing the model to avoid over-reliance on any single modality.
- Sparse attention / modality-aware models — at inference time, the model knows which modalities are present and routes attention accordingly; an extension of attention-based fusion.
- Transfer learning from richer cohorts — pre-train on the data-rich subpopulation, fine-tune on the data-sparse one with validation cohorts and bias checks.
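The first response can be sketched concretely. This is a minimal, framework-agnostic illustration of training-time modality masking — the function name, the per-batch drop probability, and the "keep at least one modality" rule are assumptions for the sketch, not the Databricks implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_modalities(batch, drop_prob=0.3, rng=rng):
    """Randomly drop whole modalities during training to simulate
    deployment-time missingness. `batch` maps modality name -> array.
    Returns the masked batch plus a presence flag per modality."""
    masked, present = {}, {}
    for name, x in batch.items():
        keep = bool(rng.random() >= drop_prob)
        present[name] = keep
        # Dropped modalities are zeroed here; a learned placeholder
        # embedding is a common alternative.
        masked[name] = x if keep else np.zeros_like(x)
    # Guarantee the model always sees at least one modality.
    if not any(present.values()):
        name = str(rng.choice(sorted(batch)))
        masked[name] = batch[name]
        present[name] = True
    return masked, present
```

Because whole modalities disappear at random during training, the model cannot over-fit to any one input stream — the same effect the post attributes to masking, expressed as dropout at the modality level rather than the feature level.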
Why fusion-strategy choice is missing-modality-aware¶
The multimodal-fusion strategies degrade differently under missingness:
- concepts/early-fusion — catastrophic: zero-padding the concatenated input teaches the model that zeros are informative; a missing input is indistinguishable from one that is genuinely zero.
- concepts/intermediate-fusion — degrades poorly unless given special handling (per-modality encoder dropout, learned placeholder embeddings).
- concepts/late-fusion — degrades gracefully: drop the per-modality model whose input is absent, combine the remaining predictions. This is why Databricks recommends late fusion as the "safe start" default for missing-modality-prone deployments.
- concepts/attention-based-fusion — can learn to down-weight absent modalities if training includes modality masking, but requires that training-time discipline to do so reliably.
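The graceful degradation of late fusion can be shown in a few lines. A minimal sketch, assuming each per-modality model emits class probabilities and a missing modality arrives as `None`; `late_fuse` and its weighting scheme are illustrative, not an API from the source:

```python
import numpy as np

def late_fuse(predictions, weights=None):
    """Late fusion that degrades gracefully: combine only the
    per-modality predictions that are actually present (not None),
    renormalising the weights over the available set."""
    available = {m: p for m, p in predictions.items() if p is not None}
    if not available:
        raise ValueError("no modality predictions available")
    if weights is None:
        weights = {m: 1.0 for m in available}
    total = sum(weights[m] for m in available)
    return sum(weights[m] * np.asarray(p) for m, p in available.items()) / total

# All modalities present: a weighted average of both heads.
full = late_fuse({"imaging": [0.8, 0.2], "genomics": [0.6, 0.4]})
# Genomics absent at inference: drop that head, keep the rest.
partial = late_fuse({"imaging": [0.8, 0.2], "genomics": None})
```

The contrast with early fusion is the point: here the absent modality is simply excluded from the combination, whereas a concatenated zero-padded input would silently feed the model a fabricated signal.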
See patterns/fusion-strategy-selection-by-deployment-reality for the decision framework that turns this into an architectural choice at system-design time.
Related failure framings in the wiki¶
- concepts/graceful-degradation — the general concept that late fusion instantiates at the multimodal-model layer.
- concepts/sparse-vector — representation-level sparsity (different shape, related vocabulary).
Seen in¶
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Databricks canonicalises missingness as "the default" in clinical deployments (genomic profiling, imaging studies, and wearables are all unevenly available), and enumerates the three production-design responses: modality masking, sparse / modality-aware attention, and transfer learning. The post's broader architectural claim — "architectures that assume complete data tend to fail in production" — names the missing-modality problem as a first-class system-design concern rather than a dataset-cleaning concern.