

Multi-task learning

Definition

Multi-task learning (MTL) is a neural-network training paradigm in which a single model with shared parameters is trained jointly on multiple related tasks, with task-specific heads layered on top of the shared representation. The hypothesis: tasks that share underlying structure benefit from shared representation learning, and auxiliary tasks regularise the main task.

Canonical MTL shapes:

  • Shared trunk + task-specific heads — the default. One encoder, N heads, trained with a weighted combination of per-task losses.
  • MMoE (Multi-gate Mixture of Experts) — shared experts routed by per-task gates; each task learns its own soft-routing over experts, letting the model specialise experts per task while still sharing training signal.
  • PLE (Progressive Layered Extraction) — explicit task-shared + task-specific subspaces at each layer.
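
The default shape above — one encoder, N heads, a weighted combination of per-task losses — can be sketched in a few lines. This is a minimal illustration, not any production architecture; all dimensions, layer choices, and loss weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
d_in, d_shared, n_tasks = 8, 16, 2

# Shared trunk: a single linear layer + ReLU stands in for the encoder.
W_trunk = rng.normal(scale=0.1, size=(d_in, d_shared))
# Task-specific heads: one linear layer per task, each emitting a scalar logit.
W_heads = [rng.normal(scale=0.1, size=(d_shared, 1)) for _ in range(n_tasks)]

def forward(x):
    """One shared representation feeds every task head."""
    h = np.maximum(x @ W_trunk, 0.0)      # shared trunk
    return [h @ W for W in W_heads]       # per-task logits

def multitask_loss(logits, labels, weights):
    """Weighted combination of per-task losses (squared error here)."""
    return sum(w * np.mean((l - y) ** 2)
               for w, l, y in zip(weights, logits, labels))

x = rng.normal(size=(4, d_in))            # batch of 4 candidates
labels = [rng.integers(0, 2, size=(4, 1)).astype(float) for _ in range(n_tasks)]
logits = forward(x)
loss = multitask_loss(logits, labels, weights=[1.0, 0.5])
```

Gradients from every task's loss flow into `W_trunk`, which is exactly the shared-training-signal property the architectures above exploit.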

Canonical wiki instance — Pinterest ads engagement model

Pinterest introduces multi-task learning in its unified ads engagement model "to restore extensibility": the fully shared architecture across HF + SR "limited flexibility and made it harder to iterate on surface-specific features and modules" (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces):

"To restore extensibility, we introduced a multi-task learning design within the unified model and enabled surface-specific checkpoint exports. We exported separate surface checkpoints so each surface could adopt the most appropriate architecture while still benefiting from shared representation learning. This enabled more flexible, surface-specific CTR prediction and established a foundation for continued surface-specific iteration."

The shape: multi-task heads where the task axis is surfaces (HF + SR as separate tasks) — structurally similar to MMoE routing but with surface-specific checkpoint export as the deployment mechanism rather than gated expert selection.
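The surface-specific checkpoint export can be pictured as slicing a unified parameter dictionary: keep the shared trunk, keep one surface's head, drop the rest. The key naming scheme and values below are entirely hypothetical — the post does not describe the export mechanics.

```python
# Hypothetical unified checkpoint: shared trunk + one head per surface.
unified_ckpt = {
    "trunk.layer0.weight": [0.1, 0.2],
    "trunk.layer1.weight": [0.3],
    "head.HF.out": [0.4],
    "head.SR.out": [0.5],
}

def export_surface_checkpoint(ckpt, surface):
    """Keep shared-trunk parameters plus the given surface's head only."""
    return {k: v for k, v in ckpt.items()
            if k.startswith("trunk.") or k.startswith(f"head.{surface}.")}

hf_ckpt = export_surface_checkpoint(unified_ckpt, "HF")
# hf_ckpt carries the shared representation but only the HF head.
```

Each exported checkpoint thus deploys independently per surface while its trunk weights still come from joint training.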

Why MTL in unified ranking models

  • Iteration velocity. Adding a new surface = new task head, not a new model. Shared trunk stays stable; task-specific specialisation is additive.
  • Regularisation. Auxiliary tasks contribute gradient signal to the shared trunk — common tasks stabilise the representation for rare tasks.
  • Consistent candidate representation. All tasks score the same candidate with the same features; no representation drift across tasks.
  • Resolves unified-model rigidity. Without task-specific heads, a unified model collapses to one-size-fits-all scoring — exactly the failure mode the unification was supposed to escape. Per-task heads are the extensibility escape valve.

Relationship to MMoE

MMoE is a specific MTL architecture — shared experts routed by per-task gates. Pinterest's unified model uses MMoE in the trunk + multi-task heads on top + surface-specific tower trees + surface-specific calibration. MMoE handles expert specialisation; the task heads handle head-level specialisation; tower trees handle surface-specific downstream work.
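The defining MMoE mechanism — shared experts, per-task softmax gates — can be sketched as follows. Expert and gate parameterisations here are minimal placeholders, not Pinterest's actual trunk.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts, n_tasks = 8, 4, 2          # hypothetical sizes

expert_W = rng.normal(scale=0.1, size=(n_experts, d, d))   # shared experts
gate_W = rng.normal(scale=0.1, size=(n_tasks, d, n_experts))  # one gate per task

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe(x):
    """Every task mixes the SAME experts with its OWN soft routing."""
    expert_out = np.stack([x @ W for W in expert_W], axis=1)  # (B, E, d)
    task_reps = []
    for t in range(n_tasks):
        g = softmax(x @ gate_W[t])                 # (B, E), rows sum to 1
        task_reps.append(np.einsum("be,bed->bd", g, expert_out))
    return task_reps

x = rng.normal(size=(3, d))
task_reps = mmoe(x)   # one mixed representation per task
```

Because the gates are per-task, gradient pressure can push different experts toward different tasks while all experts remain jointly trained — the "specialise experts per task while still sharing training signal" property described above.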

Caveats

  • Pinterest doesn't disclose the task-loss weighting scheme (static weights, adaptive weighting, gradient balancing), task count, head composition, or how MMoE interacts with the surface-specific heads at training time.
  • Task interference risk. Joint training across many heads can hurt main-task accuracy if gradients conflict — the classical MTL failure mode. The Pinterest post doesn't discuss task-interference handling.
  • Not all surfaces use the same task set — the post says each surface "could adopt the most appropriate architecture", implying potentially different head compositions per surface checkpoint.
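One common way to make the task-interference caveat concrete is to check the cosine similarity between per-task gradients on the shared trunk: a negative value means the tasks are pulling the shared parameters in opposing directions. The gradient vectors below are made up for illustration; this is a diagnostic sketch, not anything the post describes.

```python
import numpy as np

def grad_cosine(g1, g2):
    """Cosine similarity between two tasks' gradients w.r.t. the
    shared trunk; negative similarity signals conflicting updates."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Hypothetical per-task gradients on the shared parameters.
g_hf = np.array([1.0, 0.5, -0.2])
g_sr = np.array([-1.0, -0.4, 0.1])

sim = grad_cosine(g_hf, g_sr)
conflict = sim < 0.0   # candidates for gradient surgery / reweighting
```

Methods like gradient projection or adaptive loss weighting act on exactly this signal; the post does not say whether Pinterest applies any of them.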
