CONCEPT
Multi-task learning¶
Definition¶
Multi-task learning (MTL) is the neural-network training paradigm where a single model with shared parameters is trained jointly on multiple related tasks, with task-specific heads layered on top of the shared representation. The hypothesis: tasks that share underlying structure benefit from shared representation learning, and auxiliary tasks regularise the main task.
Canonical MTL shapes:
- Shared trunk + task-specific heads — the default. One encoder, N heads, trained with a weighted combination of per-task losses.
- MMoE (Multi-gate Mixture of Experts) — shared experts routed by per-task gates; each task learns its own soft-routing over experts, letting the model specialise experts per task while still sharing training signal.
- PLE (Progressive Layered Extraction) — explicit task-shared + task-specific subspaces at each layer.
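The default shape above can be sketched in a few lines. A minimal NumPy illustration of a shared trunk, N task heads, and a weighted combination of per-task losses; all dimensions, task names, and the static loss weights are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared trunk: one encoder mapping input features to a shared representation.
W_trunk = rng.normal(size=(16, 8))
# Task-specific heads: one linear scorer per task, layered on the shared trunk.
heads = {t: rng.normal(size=(8, 1)) for t in ["task_a", "task_b"]}
# Static per-task loss weights (one common choice; adaptive schemes also exist).
loss_weights = {"task_a": 1.0, "task_b": 0.5}

def forward(x):
    """Shared representation, then one logit per task head."""
    h = relu(x @ W_trunk)
    return {t: h @ W for t, W in heads.items()}

def total_loss(x, labels):
    """Weighted sum of per-task binary cross-entropy losses.

    Every task's gradient flows into the shared trunk; only the
    matching head receives that task's head-level gradient.
    """
    logits = forward(x)
    loss = 0.0
    for t, y in labels.items():
        p = 1.0 / (1.0 + np.exp(-logits[t]))  # sigmoid
        loss += loss_weights[t] * np.mean(
            -(y * np.log(p) + (1 - y) * np.log(1 - p))
        )
    return loss

x = rng.normal(size=(4, 16))
labels = {"task_a": rng.integers(0, 2, size=(4, 1)),
          "task_b": rng.integers(0, 2, size=(4, 1))}
print(total_loss(x, labels))
```

Adding a task is additive here: a new entry in `heads` and `loss_weights`, with the trunk untouched.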
Canonical wiki instance — Pinterest ads engagement model¶
Pinterest introduces multi-task learning in its unified ads engagement model "to restore extensibility", because the single shared architecture across HF + SR "limited flexibility and made it harder to iterate on surface-specific features and modules" (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces):
"To restore extensibility, we introduced a multi-task learning design within the unified model and enabled surface-specific checkpoint exports. We exported separate surface checkpoints so each surface could adopt the most appropriate architecture while still benefiting from shared representation learning. This enabled more flexible, surface-specific CTR prediction and established a foundation for continued surface-specific iteration."
The shape: multi-task heads where the task axis is surfaces (HF + SR as separate tasks) — structurally similar to MMoE routing but with surface-specific checkpoint export as the deployment mechanism rather than gated expert selection.
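Mechanically, a surface-specific checkpoint export reduces to slicing the parameter store: keep the shared trunk plus only the target surface's head. A hypothetical sketch; the surface names and flat parameter layout are illustrative, not Pinterest's actual checkpoint format:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter store for a unified model: one shared trunk,
# one head per surface-task (names are made up for the example).
params = {
    "trunk": rng.normal(size=(16, 8)),
    "head/surface_hf": rng.normal(size=(8, 1)),
    "head/surface_sr": rng.normal(size=(8, 1)),
}

def export_surface_checkpoint(params, surface):
    """Keep the shared trunk plus only this surface's head, so the
    serving artifact for one surface omits the other surfaces' heads."""
    return {k: v for k, v in params.items()
            if k == "trunk" or k == f"head/{surface}"}

ckpt = export_surface_checkpoint(params, "surface_hf")
print(sorted(ckpt))  # ['head/surface_hf', 'trunk']
```

The trunk weights in every exported checkpoint are identical, which is what preserves the shared representation learning across per-surface deployments.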
Why MTL in unified ranking models¶
- Iteration velocity. Adding a new surface = new task head, not a new model. Shared trunk stays stable; task-specific specialisation is additive.
- Regularisation. Auxiliary tasks contribute gradient signal to the shared trunk — common tasks stabilise the representation for rare tasks.
- Consistent candidate representation. All tasks score the same candidate with the same features; no representation drift across tasks.
- Resolves unified-model rigidity. Without task-specific heads, a unified model collapses to one-size-fits-all scoring — exactly the failure mode the unification was supposed to escape. Per-task heads are the extensibility escape valve.
Relationship to MMoE¶
MMoE is a specific MTL architecture — shared experts routed by per-task gates. Pinterest's unified model uses MMoE in the trunk + multi-task heads on top + surface-specific tower trees + surface-specific calibration. MMoE handles expert specialisation; the task heads handle head-level specialisation; tower trees handle surface-specific downstream work.
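The per-task gating can be sketched as a toy NumPy forward pass: every task reads the same shared experts, but mixes them with its own softmax gate. Expert count, dimensions, and task names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_expert, n_experts = 16, 8, 4
tasks = ["task_hf", "task_sr"]  # illustrative task names

# Shared experts, plus a per-task gate and per-task head.
experts = [rng.normal(size=(d_in, d_expert)) for _ in range(n_experts)]
gates = {t: rng.normal(size=(d_in, n_experts)) for t in tasks}
heads = {t: rng.normal(size=(d_expert, 1)) for t in tasks}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x):
    """Each task mixes the same shared experts with its own soft routing."""
    # (batch, experts, d_expert): every expert sees every input.
    expert_out = np.stack([np.maximum(x @ W, 0.0) for W in experts], axis=1)
    out = {}
    for t in tasks:
        w = softmax(x @ gates[t])                      # (batch, experts)
        mixed = np.einsum("be,bed->bd", w, expert_out)  # task-specific mixture
        out[t] = mixed @ heads[t]                      # task head logit
    return out

x = rng.normal(size=(3, d_in))
logits = mmoe_forward(x)
```

Because the gates are input-dependent and task-specific, each task can lean on different experts per example while all experts still receive gradient from every task.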
Caveats¶
- Pinterest doesn't disclose the task-loss weighting scheme (static weights, adaptive weighting, gradient balancing), task count, head composition, or how MMoE interacts with the surface-specific heads at training time.
- Task interference risk. Joint training across many heads can hurt main-task accuracy if gradients conflict — the classical MTL failure mode. The Pinterest post doesn't discuss task-interference handling.
- Not all surfaces use the same task set — the post says each surface "could adopt the most appropriate architecture", implying potentially different head compositions per surface checkpoint.
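The interference caveat can be made concrete: check the cosine similarity between two tasks' gradients on the shared trunk, and optionally de-conflict them by projection, in the style of PCGrad (Yu et al.). This is one published remedy, not something the Pinterest post describes:

```python
import numpy as np

def grad_conflict(g1, g2):
    """Cosine similarity between two tasks' trunk gradients; a negative
    value means their updates pull the shared trunk in opposing
    directions, the classic interference symptom."""
    g1, g2 = g1.ravel(), g2.ravel()
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

def project_conflict(g1, g2):
    """PCGrad-style fix: when g1 conflicts with g2, remove the
    component of g1 that points against g2."""
    if grad_conflict(g1, g2) < 0:
        g1 = g1 - (g1.ravel() @ g2.ravel()) / (g2.ravel() @ g2.ravel()) * g2
    return g1

g_a = np.array([1.0, 0.0])   # toy trunk gradient from task A
g_b = np.array([-1.0, 1.0])  # toy trunk gradient from task B
print(grad_conflict(g_a, g_b))    # negative: the tasks conflict
print(project_conflict(g_a, g_b))
```

After projection, task A's gradient is orthogonal to task B's, so applying it no longer undoes task B's progress on the shared trunk.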
Seen in¶
- 2026-03-03 Pinterest — Unifying Ads Engagement Modeling (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — canonical: multi-task heads restore per-surface extensibility on top of a unified model, paired with surface-specific checkpoint exports.
- 2026-03-18 Meta — Friend Bubbles on Facebook Reels (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) via MTML ranking — multi-task heads for multiple engagement types (watch / like / bubble-conditioned engagement) within a single surface.
Related¶
- concepts/multi-task-multi-label-ranking — specific MTL variant for ranking with multi-label tasks.
- concepts/mixture-of-experts — MMoE as a specific MTL architecture.
- patterns/surface-specific-tower-tree — per-surface specialisation at the tower granularity.
- patterns/surface-specific-checkpoint-export — deployment mechanism for per-surface specialisation.
- systems/pinterest-ads-engagement-model
- companies/pinterest