Intermediate Fusion (Multimodal)¶
Intermediate fusion is the multimodal-ML strategy of encoding each modality separately through a modality-specific encoder, then merging the resulting hidden representations before the task head. The fusion point is in representation space — after per-modality encoding but before the final prediction.
When it survives production¶
"Combining high-dimensional omics with lower-dimensional EHR / clinical features." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.)
Modality-specific encoders normalise dimensionality before the joint layer: a 10⁶-dim genomics input and a 10²-dim EHR vector can both be projected to, say, 512 dims before concatenation, so the downstream model never sees the raw 10,000:1 dimensionality gap that early fusion exposes.
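A minimal sketch of that balancing step, using fixed random linear projections as stand-ins for learned per-modality encoders (the 10⁴-dim genomics input is scaled down from the 10⁶ in the text to keep the example small; all names and sizes here are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative inputs: a wide genomics vector and a narrow EHR vector.
# (10_000 dims stands in for the ~10^6 genomics dimensionality.)
genomics = rng.normal(size=10_000)
ehr = rng.normal(size=100)

# Modality-specific projections to a shared 512-dim representation space.
# In a real system these would be trained encoders, not random matrices.
W_gen = rng.normal(size=(512, genomics.size)) / np.sqrt(genomics.size)
W_ehr = rng.normal(size=(512, ehr.size)) / np.sqrt(ehr.size)

z_gen = W_gen @ genomics  # (512,)
z_ehr = W_ehr @ ehr       # (512,)

# Intermediate fusion: merge in representation space, before the task head.
# The joint layer sees 512 + 512 balanced features, not a 100:1 raw gap.
fused = np.concatenate([z_gen, z_ehr])  # (1024,)
```

The task head then consumes `fused`; each modality contributes an equal share of the joint representation regardless of its raw input width.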
Tradeoffs called out in ingested sources¶
"Requires careful representation learning per modality and disciplined evaluation." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.) The cost is engineering: each modality now has its own pre-training / fine-tuning / frozen-vs-trainable decision, and the fusion layer's inductive bias (concatenation vs gated merge vs cross-attention) is an extra degree of freedom that needs evaluation.
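The fusion-layer degree of freedom can be made concrete. Below are two of the merge choices named above, concatenation and a gated merge, as a hedged sketch (the gate is passed in as an input here; in practice it would be a learned function of the representations):

```python
import numpy as np

def concat_fusion(z_a, z_b):
    # Simplest inductive bias: the task head sees both representations
    # side by side and must learn any cross-modal interaction itself.
    return np.concatenate([z_a, z_b])

def gated_fusion(z_a, z_b, gate_logits):
    # Gated merge: a sigmoid gate weights one modality against the
    # other element-wise, so output dimensionality stays fixed.
    g = 1.0 / (1.0 + np.exp(-gate_logits))
    return g * z_a + (1.0 - g) * z_b

z_gen = np.ones(512)   # stand-in per-modality representations
z_ehr = np.zeros(512)

fused_concat = concat_fusion(z_gen, z_ehr)               # shape (1024,)
fused_gated = gated_fusion(z_gen, z_ehr, np.zeros(512))  # gate = 0.5 everywhere
```

Each choice is an extra hypothesis about how modalities interact, which is exactly why the source flags disciplined evaluation as part of the cost.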
Contrast with other fusion strategies¶
- concepts/early-fusion — concatenate raw inputs; simpler but breaks on dimensionality gaps + missing modalities.
- concepts/late-fusion — combine per-modality predictions rather than representations; loses cross-modal interaction modelling in exchange for graceful degradation.
- concepts/attention-based-fusion — learned dynamic weighting at the fusion layer; intermediate fusion with attention-based merge is a common instantiation.
See patterns/fusion-strategy-selection-by-deployment-reality for the decision-framework framing.
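For the attention-based instantiation mentioned above, a stripped-down sketch: single-head cross-attention with no learned projections, where one modality's representation queries the other's token sequence (names and shapes are illustrative assumptions, not from the source):

```python
import numpy as np

def cross_attention_merge(query, kv_tokens):
    # One modality's vector attends over the other's tokens; the merge
    # is the attention-weighted sum of those tokens (scaled dot-product,
    # single head, no learned Q/K/V projections -- illustration only).
    scores = kv_tokens @ query / np.sqrt(query.size)  # (n_tokens,)
    weights = np.exp(scores - scores.max())           # stable softmax
    weights /= weights.sum()
    return weights @ kv_tokens                        # (d,)

rng = np.random.default_rng(0)
ehr_query = rng.normal(size=512)            # EHR representation as the query
genomic_tokens = rng.normal(size=(8, 512))  # e.g. per-region genomics embeddings
merged = cross_attention_merge(ehr_query, genomic_tokens)
```

Because the attention weights depend on the inputs, this merge reweights modalities per example, which is the dynamic-weighting behaviour the attention-based-fusion entry describes.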
Seen in¶
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Databricks positions intermediate fusion as the default pick when modalities have mismatched dimensionality (genomics ≫ EHR), reframing the strategy's strength as "preserving modality-specific signal" through per-modality encoders. The representation-learning-per-modality requirement is called out as the cost that buys the dimensionality balance.