Intermediate Fusion (Multimodal)¶
Intermediate fusion is the multimodal-ML strategy of encoding each modality separately through a modality-specific encoder, then merging the resulting hidden representations before the task head. The fusion point is in representation space — after per-modality encoding but before the final prediction.
When it survives production¶
"Combining high-dimensional omics with lower-dimensional EHR / clinical features." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.)
Modality-specific encoders normalise dimensionality before the joint layer: a 10⁶-dim genomics input and a 10²-dim EHR vector can both be projected to, say, 512 dims before concatenation, so the downstream model never sees the raw 10,000:1 dimensionality gap that early fusion exposes.
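A minimal sketch of that balancing step, using fixed random linear projections as stand-ins for learned per-modality encoders (the 10⁴-dim genomics input is scaled down from the 10⁶ in the text to keep the example small; all names and sizes here are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative inputs: a wide genomics vector and a narrow EHR vector.
# (10_000 dims stands in for the ~10^6 genomics dimensionality.)
genomics = rng.normal(size=10_000)
ehr = rng.normal(size=100)

# Modality-specific projections to a shared 512-dim representation space.
# In a real system these would be trained encoders, not random matrices.
W_gen = rng.normal(size=(512, genomics.size)) / np.sqrt(genomics.size)
W_ehr = rng.normal(size=(512, ehr.size)) / np.sqrt(ehr.size)

z_gen = W_gen @ genomics  # (512,)
z_ehr = W_ehr @ ehr       # (512,)

# Intermediate fusion: merge in representation space, before the task head.
# The joint layer sees 512 + 512 balanced features, not a 100:1 raw gap.
fused = np.concatenate([z_gen, z_ehr])  # (1024,)
```

The task head then consumes `fused`; each modality contributes an equal share of the joint representation regardless of its raw input width.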
Tradeoffs called out in ingested sources¶
"Requires careful representation learning per modality and disciplined evaluation." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.) The cost is engineering: each modality now has its own pre-training / fine-tuning / frozen-vs-trainable decision, and the fusion layer's inductive bias (concatenation vs gated merge vs cross-attention) is an extra degree of freedom that needs evaluation.
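The fusion-layer degree of freedom can be made concrete. Below are two of the merge choices named above, concatenation and a gated merge, as a hedged sketch (the gate is passed in as an input here; in practice it would be a learned function of the representations):

```python
import numpy as np

def concat_fusion(z_a, z_b):
    # Simplest inductive bias: the task head sees both representations
    # side by side and must learn any cross-modal interaction itself.
    return np.concatenate([z_a, z_b])

def gated_fusion(z_a, z_b, gate_logits):
    # Gated merge: a sigmoid gate weights one modality against the
    # other element-wise, so output dimensionality stays fixed.
    g = 1.0 / (1.0 + np.exp(-gate_logits))
    return g * z_a + (1.0 - g) * z_b

z_gen = np.ones(512)   # stand-in per-modality representations
z_ehr = np.zeros(512)

fused_concat = concat_fusion(z_gen, z_ehr)               # shape (1024,)
fused_gated = gated_fusion(z_gen, z_ehr, np.zeros(512))  # gate = 0.5 everywhere
```

Each choice is an extra hypothesis about how modalities interact, which is exactly why the source flags disciplined evaluation as part of the cost.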
Contrast with other fusion strategies¶
- concepts/early-fusion — concatenate raw inputs; simpler but breaks on dimensionality gaps + missing modalities.
- concepts/late-fusion — combine per-modality predictions rather than representations; loses cross-modal interaction modelling in exchange for graceful degradation.
- concepts/attention-based-fusion — learned dynamic weighting at the fusion layer; intermediate fusion with attention-based merge is a common instantiation.
See patterns/fusion-strategy-selection-by-deployment-reality for the decision-framework framing.
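For the attention-based instantiation mentioned above, a stripped-down sketch: single-head cross-attention with no learned projections, where one modality's representation queries the other's token sequence (names and shapes are illustrative assumptions, not from the source):

```python
import numpy as np

def cross_attention_merge(query, kv_tokens):
    # One modality's vector attends over the other's tokens; the merge
    # is the attention-weighted sum of those tokens (scaled dot-product,
    # single head, no learned Q/K/V projections -- illustration only).
    scores = kv_tokens @ query / np.sqrt(query.size)  # (n_tokens,)
    weights = np.exp(scores - scores.max())           # stable softmax
    weights /= weights.sum()
    return weights @ kv_tokens                        # (d,)

rng = np.random.default_rng(0)
ehr_query = rng.normal(size=512)            # EHR representation as the query
genomic_tokens = rng.normal(size=(8, 512))  # e.g. per-region genomics embeddings
merged = cross_attention_merge(ehr_query, genomic_tokens)
```

Because the attention weights depend on the inputs, this merge reweights modalities per example, which is the dynamic-weighting behaviour the attention-based-fusion entry describes.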
Seen in¶
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Databricks positions intermediate fusion as the default pick when modalities have mismatched dimensionality (genomics ≫ EHR), reframing the strategy's strength as "preserving modality-specific signal" through per-modality encoders. The representation-learning-per-modality requirement is called out as the cost that buys the dimensionality balance.