

Multi-task learning

Definition

Multi-task learning (MTL) is a neural-network training paradigm in which a single model with shared parameters is trained jointly on multiple related tasks, with task-specific heads layered on top of the shared representation. The hypothesis: tasks that share underlying structure benefit from shared representation learning, and auxiliary tasks regularise the main task.

Canonical MTL shapes:

  • Shared trunk + task-specific heads — the default. One encoder, N heads, trained with a weighted combination of per-task losses.
  • MMoE (Multi-gate Mixture of Experts) — shared experts routed by per-task gates; each task learns its own soft-routing over experts, letting the model specialise experts per task while still sharing training signal.
  • PLE (Progressive Layered Extraction) — explicit task-shared + task-specific subspaces at each layer.
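
The default shape above — one encoder, N heads, a weighted combination of per-task losses — can be sketched in a few lines. This is a minimal illustration, not any production architecture; all dimensions, layer choices, and loss weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
d_in, d_shared, n_tasks = 8, 16, 2

# Shared trunk: a single linear layer + ReLU stands in for the encoder.
W_trunk = rng.normal(scale=0.1, size=(d_in, d_shared))
# Task-specific heads: one linear layer per task, each emitting a scalar logit.
W_heads = [rng.normal(scale=0.1, size=(d_shared, 1)) for _ in range(n_tasks)]

def forward(x):
    """One shared representation feeds every task head."""
    h = np.maximum(x @ W_trunk, 0.0)      # shared trunk
    return [h @ W for W in W_heads]       # per-task logits

def multitask_loss(logits, labels, weights):
    """Weighted combination of per-task losses (squared error here)."""
    return sum(w * np.mean((l - y) ** 2)
               for w, l, y in zip(weights, logits, labels))

x = rng.normal(size=(4, d_in))            # batch of 4 candidates
labels = [rng.integers(0, 2, size=(4, 1)).astype(float) for _ in range(n_tasks)]
logits = forward(x)
loss = multitask_loss(logits, labels, weights=[1.0, 0.5])
```

Gradients from every task's loss flow into `W_trunk`, which is exactly the shared-training-signal property the architectures above exploit.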

Canonical wiki instance — Pinterest ads engagement model

Pinterest introduces multi-task learning in its unified ads engagement model "to restore extensibility": the fully shared architecture across HF + SR "limited flexibility and made it harder to iterate on surface-specific features and modules" (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces):

"To restore extensibility, we introduced a multi-task learning design within the unified model and enabled surface-specific checkpoint exports. We exported separate surface checkpoints so each surface could adopt the most appropriate architecture while still benefiting from shared representation learning. This enabled more flexible, surface-specific CTR prediction and established a foundation for continued surface-specific iteration."

The shape: multi-task heads where the task axis is surfaces (HF + SR as separate tasks) — structurally similar to MMoE routing but with surface-specific checkpoint export as the deployment mechanism rather than gated expert selection.
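The surface-specific checkpoint export can be pictured as slicing a unified parameter dictionary: keep the shared trunk, keep one surface's head, drop the rest. The key naming scheme and values below are entirely hypothetical — the post does not describe the export mechanics.

```python
# Hypothetical unified checkpoint: shared trunk + one head per surface.
unified_ckpt = {
    "trunk.layer0.weight": [0.1, 0.2],
    "trunk.layer1.weight": [0.3],
    "head.HF.out": [0.4],
    "head.SR.out": [0.5],
}

def export_surface_checkpoint(ckpt, surface):
    """Keep shared-trunk parameters plus the given surface's head only."""
    return {k: v for k, v in ckpt.items()
            if k.startswith("trunk.") or k.startswith(f"head.{surface}.")}

hf_ckpt = export_surface_checkpoint(unified_ckpt, "HF")
# hf_ckpt carries the shared representation but only the HF head.
```

Each exported checkpoint thus deploys independently per surface while its trunk weights still come from joint training.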

Why MTL in unified ranking models

  • Iteration velocity. Adding a new surface = new task head, not a new model. Shared trunk stays stable; task-specific specialisation is additive.
  • Regularisation. Auxiliary tasks contribute gradient signal to the shared trunk — common tasks stabilise the representation for rare tasks.
  • Consistent candidate representation. All tasks score the same candidate with the same features; no representation drift across tasks.
  • Resolves unified-model rigidity. Without task-specific heads, a unified model collapses to one-size-fits-all scoring — exactly the failure mode the unification was supposed to escape. Per-task heads are the extensibility escape valve.

Relationship to MMoE

MMoE is a specific MTL architecture — shared experts routed by per-task gates. Pinterest's unified model uses MMoE in the trunk + multi-task heads on top + surface-specific tower trees + surface-specific calibration. MMoE handles expert specialisation; the task heads handle head-level specialisation; tower trees handle surface-specific downstream work.
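The defining MMoE mechanism — shared experts, per-task softmax gates — can be sketched as follows. Expert and gate parameterisations here are minimal placeholders, not Pinterest's actual trunk.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts, n_tasks = 8, 4, 2          # hypothetical sizes

expert_W = rng.normal(scale=0.1, size=(n_experts, d, d))   # shared experts
gate_W = rng.normal(scale=0.1, size=(n_tasks, d, n_experts))  # one gate per task

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe(x):
    """Every task mixes the SAME experts with its OWN soft routing."""
    expert_out = np.stack([x @ W for W in expert_W], axis=1)  # (B, E, d)
    task_reps = []
    for t in range(n_tasks):
        g = softmax(x @ gate_W[t])                 # (B, E), rows sum to 1
        task_reps.append(np.einsum("be,bed->bd", g, expert_out))
    return task_reps

x = rng.normal(size=(3, d))
task_reps = mmoe(x)   # one mixed representation per task
```

Because the gates are per-task, gradient pressure can push different experts toward different tasks while all experts remain jointly trained — the "specialise experts per task while still sharing training signal" property described above.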

Caveats

  • Pinterest doesn't disclose the task-loss weighting scheme (static weights, adaptive weighting, gradient balancing), task count, head composition, or how MMoE interacts with the surface-specific heads at training time.
  • Task interference risk. Joint training across many heads can hurt main-task accuracy if gradients conflict — the classical MTL failure mode. The Pinterest post doesn't discuss task-interference handling.
  • Not all surfaces use the same task set — the post says each surface "could adopt the most appropriate architecture", implying potentially different head compositions per surface checkpoint.
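One common way to make the task-interference caveat concrete is to check the cosine similarity between per-task gradients on the shared trunk: a negative value means the tasks are pulling the shared parameters in opposing directions. The gradient vectors below are made up for illustration; this is a diagnostic sketch, not anything the post describes.

```python
import numpy as np

def grad_cosine(g1, g2):
    """Cosine similarity between two tasks' gradients w.r.t. the
    shared trunk; negative similarity signals conflicting updates."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Hypothetical per-task gradients on the shared parameters.
g_hf = np.array([1.0, 0.5, -0.2])
g_sr = np.array([-1.0, -0.4, 0.1])

sim = grad_cosine(g_hf, g_sr)
conflict = sim < 0.0   # candidates for gradient surgery / reweighting
```

Methods like gradient projection or adaptive loss weighting act on exactly this signal; the post does not say whether Pinterest applies any of them.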
