Intermediate Fusion (Multimodal)

Intermediate fusion is the multimodal-ML strategy of encoding each modality separately through a modality-specific encoder, then merging the resulting hidden representations before the task head. The fusion point is in representation space — after per-modality encoding but before the final prediction.

When it survives production

"Combining high-dimensional omics with lower-dimensional EHR / clinical features." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.)

Modality-specific encoders normalise dimensionality before the joint layer: a 10⁶-dim genomics input and a 10²-dim EHR vector can both be projected to, say, 512-dim before concatenation, so the downstream model never sees the raw 10⁴:1 dimensionality gap that early fusion exposes.
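
A minimal PyTorch sketch of the shape this takes (module names and dimensions here are illustrative, not from the source): two modality-specific encoders project their inputs to a shared 512-dim space, and fusion is a concatenation just before the task head.

```python
import torch
import torch.nn as nn

class IntermediateFusionModel(nn.Module):
    """Illustrative sketch: per-modality encoders normalise dimensionality,
    then fusion happens in representation space before the task head."""

    def __init__(self, genomics_dim=1_000_000, ehr_dim=100,
                 hidden_dim=512, n_classes=2):
        super().__init__()
        # Modality-specific encoders: each projects its input to hidden_dim,
        # so the joint layer never sees the raw 10⁴:1 dimensionality gap.
        # (A real genomics encoder would likely be deeper/sparser; a single
        # Linear over 10⁶ inputs is kept only for brevity.)
        self.genomics_encoder = nn.Sequential(
            nn.Linear(genomics_dim, hidden_dim), nn.ReLU(),
        )
        self.ehr_encoder = nn.Sequential(
            nn.Linear(ehr_dim, hidden_dim), nn.ReLU(),
        )
        # Task head operates on the fused (concatenated) representation.
        self.head = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, genomics, ehr):
        z_gen = self.genomics_encoder(genomics)    # (batch, 512)
        z_ehr = self.ehr_encoder(ehr)              # (batch, 512)
        fused = torch.cat([z_gen, z_ehr], dim=-1)  # fusion point: representation space
        return self.head(fused)
```

Note that only the `torch.cat` line commits to a fusion mechanism; swapping it for a gated merge or cross-attention changes nothing upstream, which is exactly the extra degree of freedom the tradeoffs below refer to.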

Tradeoffs called out in ingested sources

"Requires careful representation learning per modality and disciplined evaluation." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.) The cost is engineering: each modality now has its own pre-training / fine-tuning / frozen-vs-trainable decision, and the fusion layer's inductive bias (concatenation vs gated merge vs cross-attention) is an extra degree of freedom that needs evaluation.

Contrast with other fusion strategies

See patterns/fusion-strategy-selection-by-deployment-reality for the decision-framework framing.
