CONCEPT Cited by 1 source
Late Fusion (Multimodal)¶
Late fusion is the multimodal-ML strategy of training a separate model per modality and combining their predictions (by averaging, weighted voting, a meta-learner, or similar) at the very end of the pipeline. Fusion happens in prediction space rather than input or representation space.
Why it's the production-default for sparse-modality settings¶
"Production rollouts where missing modalities are common … degrades gracefully when one or more modalities are absent." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai.)
When a modality is missing at inference time, late fusion's recovery is structural: skip the per-modality model whose input is absent and combine the remaining predictions. No padding, no zero-input-leakage, no retraining. This is why Databricks' multimodal post explicitly recommends it as the "safe start" for a 30-day pilot.
It is the canonical graceful- degradation instantiation at the multimodal-model layer.
Tradeoffs¶
- Cross-modal interactions learned late, if at all. If the combiner is a simple average, modality interactions are never modelled; if it's a meta-learner, it sees only per-modality probabilities, not per-modality representations.
- Independent per-modality labels required (unless distilled from the multimodal label), doubling or tripling the labelling burden.
- Multiple model-serving artefacts to version, monitor, and drift-detect per modality.
Contrast with other fusion strategies¶
- concepts/early-fusion — maximal interaction modelling, zero graceful degradation; opposite end of the fusion-point axis.
- concepts/intermediate-fusion — representation-level merge; models cross-modal interactions but breaks on missing modalities unless explicitly engineered for it.
- concepts/attention-based-fusion — dynamic per-example weighting; can be combined with late fusion to learn which modalities to trust per input.
See patterns/fusion-strategy-selection-by-deployment-reality for the decision-framework framing — late fusion's trigger is "modalities routinely missing in production."
Seen in¶
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Databricks names late fusion as the fusion strategy that "degrades gracefully when one or more modalities are absent", pairs it explicitly with the deployment-reality trigger "missing modalities are common", and recommends it as the "safe start" for an initial multimodal deployment. The post's broader architectural claim — "architectures designed for sparsity generalize" — is load-bearing on late fusion being the default option.