PATTERN

# Fusion Strategy Selection by Deployment Reality
## Shape
Pick a multimodal fusion strategy (concepts/early-fusion, concepts/intermediate-fusion, concepts/late-fusion, concepts/attention-based-fusion) based on three deployment-reality axes, not on benchmark strength:
- Modality availability patterns — how often is each modality actually present at inference time in production?
- Dimensionality balance — do modality input sizes span orders of magnitude (e.g. 10⁶-dim genomics vs 10²-dim EHR)?
- Temporal dynamics — do modalities arrive at different cadences (wearables continuous, notes episodic, imaging rare) and does interaction timing carry signal?
The decision framework reframes fusion from a model-quality question ("which is most powerful?") to a system-design question ("which survives my deployment?").
## Decision matrix (as framed by the ingested source)
| Axis that dominates | Fusion strategy |
|---|---|
| All modalities always present, cohort small and controlled | concepts/early-fusion |
| High-dimensional + low-dimensional modalities coexist | concepts/intermediate-fusion |
| Modalities frequently missing in production | concepts/late-fusion |
| Temporal dynamics + complex cross-modal interaction | concepts/attention-based-fusion |
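The matrix above can be sketched as a small selection helper. This is an illustrative assumption, not code from the source: the function name, the boolean axis flags, and the priority order (survivability concerns checked before training-time concerns) are all choices made here for clarity.

```python
def select_fusion_strategy(
    always_all_present: bool,
    small_controlled_cohort: bool,
    dimensionality_mismatch: bool,
    frequent_missing_modalities: bool,
    temporal_dynamics: bool,
) -> str:
    """Map deployment-reality axes to a fusion strategy.

    Priority order is an assumption: deployment-survival axes
    (missingness, temporal dynamics) are checked before
    training-time axes (dimensionality, cohort size).
    """
    if frequent_missing_modalities:
        return "late-fusion"          # degrades by dropping a model
    if temporal_dynamics:
        return "attention-based-fusion"
    if dimensionality_mismatch:
        return "intermediate-fusion"  # per-modality encoders rebalance scale
    if always_all_present and small_controlled_cohort:
        return "early-fusion"
    return "late-fusion"              # safe default under uncertainty
```

Note the fall-through: when no axis dominates, the helper returns late fusion, matching the "safe start" default recommended below.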
## Why this is a pattern, not just a taxonomy
Most multimodal-ML tutorials treat the four fusion strategies as an architectural taxonomy and benchmark them head-to-head. The pattern here is the inversion: fusion is chosen to match deployment axes before training, and the evaluation question is "does this strategy degrade the way my deployment requires?" rather than "is it the strongest?".
"Match fusion to your deployment reality: modality availability patterns, dimensionality balance, and temporal dynamics." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai)
## Default recommendation
Late fusion as the "safe start" default when missing modalities are expected:
"Choose a fusion baseline that tolerates missingness (late fusion is often a safe start)." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai)
The reasoning is structural: late fusion's degradation path is simply "drop the missing modality's model and combine the rest", with no padding, no zero-imputation leaking into the model, and no retraining.
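That degradation path can be shown in a minimal sketch. The helper name, the score-combination rule (a renormalised weighted average), and the use of `None` to mark an absent modality are assumptions for illustration, not details from the source:

```python
from typing import Callable, Dict, Optional

def late_fusion_predict(
    models: Dict[str, Callable[[object], float]],
    inputs: Dict[str, Optional[object]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-modality scores, skipping modalities that are
    absent at inference time instead of padding or zero-filling."""
    weights = weights or {m: 1.0 for m in models}
    scores, total = [], 0.0
    for modality, model in models.items():
        x = inputs.get(modality)
        if x is None:            # modality missing in production:
            continue             # drop its model, nothing else changes
        scores.append(weights[modality] * model(x))
        total += weights[modality]
    if not scores:
        raise ValueError("no modalities available for this example")
    return sum(scores) / total   # renormalised weighted average
```

With two per-modality models, an example missing imaging simply returns the EHR model's score; no retraining and no placeholder input is needed.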
## Forces
- Modality availability is a deployment property, not a data property. It can't be fixed by a better ETL pipeline.
- Dimensionality balance matters at training time, not only inference. Early fusion's failure mode shows up during training as per-modality scale imbalance before it shows up in production.
- Attention-based fusion is tempting but validation-heavy. "Harder to validate; requires careful controls to avoid spurious correlations" — don't jump to it without the infrastructure for evaluation cohorts + bias checks.
- Fusion strategy is coupled to training discipline. The pattern only works if combined with modality masking during training to actually exercise the missing-modality case.
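The last force, modality masking during training, can be sketched as a batch transform. The function name, drop probability, and keep-at-least-one rule are assumptions made here; the source only names the discipline, not an implementation:

```python
import random
from typing import Dict, List, Optional

def mask_modalities(
    example: Dict[str, object],
    drop_prob: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Dict[str, Optional[object]]:
    """Randomly drop modalities from a training example so the
    fusion model actually exercises the missing-modality case it
    will face in production. Always keeps at least one modality."""
    rng = rng or random.Random()
    present: List[str] = [m for m, v in example.items() if v is not None]
    keep_one = rng.choice(present)       # guarantee a non-empty input
    return {
        m: (v if (m == keep_one or rng.random() > drop_prob) else None)
        for m, v in example.items()
    }
```

Applied per example during training, this makes the missing-modality evaluation subgroups below meaningful rather than out-of-distribution.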
## Consequences
- The team picks a fusion strategy before choosing a model architecture; model design is downstream of deployment profiling.
- Evaluation cohorts are structured by modality-availability subgroup (patients with all modalities vs genomics-only vs imaging-only, etc.) rather than a single held-out split.
- The missing-modality problem is treated as a first-class architectural concern, not an engineering afterthought.
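The second consequence, structuring evaluation by modality-availability subgroup, amounts to grouping the held-out set by which modalities each example actually has. A minimal sketch, with the helper name and `None`-marks-absence convention assumed:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def split_by_availability(
    examples: List[Dict[str, Optional[object]]],
) -> Dict[Tuple[str, ...], List[Dict[str, Optional[object]]]]:
    """Group evaluation examples by their modality-availability
    signature, so metrics can be reported per subgroup (all
    modalities vs EHR-only vs imaging-only, ...) instead of as
    a single held-out average."""
    groups: Dict[Tuple[str, ...], list] = defaultdict(list)
    for ex in examples:
        signature = tuple(sorted(m for m, v in ex.items() if v is not None))
        groups[signature].append(ex)
    return dict(groups)
```

Each subgroup then gets its own metric, which is what surfaces a fusion strategy that quietly collapses when a modality is absent.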
## Seen in
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — canonical instance. Databricks pairs each of the four fusion strategies with an explicit deployment-reality trigger (small cohort / dimensionality mismatch / missing modalities / temporal dynamics) and flips the selection question from "which is strongest?" to "which survives my deployment?". The 30-day pilot playbook inside the post recommends late fusion as the default safe start when modality availability is uncertain.