
Multi-task multi-label (MTML) ranking

Definition

MTML ranking is the class of recommendation and search ranker architectures that share a candidate-plus-context representation across a set of tasks (distinct prediction targets such as watch, like, comment, share, follow), each with potentially multiple labels (binary, multi-class, or graded), trained jointly under a combined loss.

Structurally an MTML ranker is typically:

      (user + candidate + context features)
           [ shared encoder / trunk ]
       ┌─────────────┼─────────────┐
       ▼             ▼             ▼
   head: watch   head: like    head: bubble-
    (logprob)   (binary)       conditioned
                               engagement
                                (P(y | bubble))

The shared trunk amortises feature-extraction cost across all tasks, while the task-specific heads specialise for each prediction. Variants such as MMoE (Multi-gate Mixture of Experts), PLE (Progressive Layered Extraction), and shared-bottom vs task-specific-bottom designs differ in how the shared representation flows to each head.
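
The trunk-plus-heads shape above can be sketched in a few lines. Layer sizes, head names, and activations here are hypothetical stand-ins (the Meta post discloses none of them), with plain NumPy standing in for a real deep-learning stack:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- none of these are disclosed in the post.
D_IN, D_TRUNK = 16, 8

# Shared trunk: one dense layer with ReLU (a minimal "shared bottom").
W_trunk = rng.normal(size=(D_IN, D_TRUNK))

# Task-specific heads: one linear + sigmoid scorer per task.
heads = {t: rng.normal(size=D_TRUNK) for t in ("watch", "like", "bubble_engagement")}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(features):
    """One forward pass over the trunk serves every head."""
    h = np.maximum(features @ W_trunk, 0.0)              # shared representation
    return {task: sigmoid(h @ w) for task, w in heads.items()}

x = rng.normal(size=D_IN)                                # user + candidate + context features
print(score(x))                                          # one probability per task
```

An MMoE or PLE variant would replace the single `W_trunk` with several expert sub-networks and per-head gating, but the trunk-then-heads flow is the same.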

Why MTML instead of independent rankers

  • Cost. One forward pass over the shared encoder scores N tasks; independent rankers do N forward passes.
  • Shared regularisation. Auxiliary tasks regularise the main task; rare-task labels benefit from signals in common-task labels.
  • Consistent candidate scoring. All tasks score the same candidate with the same features — no representation drift across tasks.
  • Easy to add tasks. When a new signal class appears (e.g. bubble-conditioned engagement), it's a new head, not a new model.
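
Joint training under a combined loss usually means a weighted sum of per-task losses, all of whose gradients flow back into the shared trunk. A minimal sketch, with illustrative task names, labels, and weights (none from the source):

```python
import math

# Hypothetical per-task predictions and binary labels for one example.
preds  = {"watch": 0.8, "like": 0.3, "bubble_engagement": 0.6}
labels = {"watch": 1,   "like": 0,   "bubble_engagement": 1}
task_weights = {"watch": 1.0, "like": 0.5, "bubble_engagement": 0.5}

def bce(p, y):
    """Binary cross-entropy for one prediction."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The combined loss is a weighted sum over heads; in training, gradients
# from every task would flow back into the shared trunk.
total_loss = sum(task_weights[t] * bce(preds[t], labels[t]) for t in preds)
print(round(total_loss, 4))
```

The `task_weights` are exactly the tuning surface the Caveats section flags: they trade main-task accuracy against auxiliary-task signal.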

Canonical wiki reference

Meta Friend Bubbles (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) uses MTML at both early-stage and late-stage ranking in Facebook Reels:

"We integrated friend-bubble interaction signals as features and added new tasks into both early-stage and late-stage ranking multi-task, multi-label (MTML) models to incorporate viewer-friend relationship strength and to learn downstream engagement on videos with social bubbles."

Two architectural additions for Friend Bubbles:

  1. New features. Viewer-friend closeness scores + bubble-interaction signals become input features.
  2. New tasks. Bubble-conditioned engagement — P(video engagement | bubble impression) — becomes a new ranker head, and its output enters the augmented ranking formula via a tunable weight. See patterns/conditional-probability-ranking-objective.
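
The augmented ranking formula in item 2 can be sketched as a weighted combination in which the bubble-conditioned term carries its own tunable weight. Every name and number below is illustrative; the real formula's composition is not disclosed:

```python
# Hypothetical head outputs for one candidate video.
head_scores = {
    "watch": 0.72,
    "like": 0.10,
    "bubble_engagement": 0.55,   # P(video engagement | bubble impression)
}

# Baseline value-model weights plus the new tunable bubble weight.
weights = {"watch": 1.0, "like": 0.4}
w_bubble = 0.3

def final_score(scores, has_bubble):
    """Weighted combination; the bubble term only applies to bubbled candidates."""
    base = sum(weights[t] * scores[t] for t in weights)
    bonus = w_bubble * scores["bubble_engagement"] if has_bubble else 0.0
    return base + bonus

print(final_score(head_scores, has_bubble=True))
```

Conditioning the bubble head's output on `has_bubble` keeps the conditional probability from affecting candidates it was never defined for.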

A continuous feedback loop re-trains the MTML models on fresh bubble-interaction data, letting them keep learning "which friend-content combinations resonate with users."

Early-stage vs late-stage

The Meta post distinguishes "early-stage" and "late-stage" MTML models — a standard recommendation-ranking convention:

  • Early-stage ranking scores a larger candidate pool with a cheaper model, narrows to a smaller set.
  • Late-stage ranking scores the narrowed set with a more expensive MTML.

Both stages are MTML-shaped in Meta's Reels system, and both had the bubble features and tasks added. This matters: a signal added only at the late stage cannot recover candidates the early-stage ranker has already eliminated. Adding the signal at both stages lets it propagate through the whole funnel.
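
The funnel mechanics can be sketched as follows. The scoring functions are trivial stand-ins, but they show the narrowing step that makes late-stage-only signals unrecoverable: whatever the cheap model drops never reaches the expensive one.

```python
# Minimal two-stage funnel sketch; all field names are hypothetical.

def early_score(candidate):
    """Cheap model over the large pool. If this ignored bubble_affinity,
    bubbled candidates could be eliminated before the late stage ever ran."""
    return candidate["prior"] + 0.3 * candidate["bubble_affinity"]

def late_score(candidate):
    """Expensive MTML over the narrowed set (stand-in: a richer combination)."""
    return candidate["prior"] + 0.3 * candidate["bubble_affinity"] + 0.5 * candidate["watch_pred"]

pool = [
    {"id": i, "prior": 0.5 - 0.0001 * i, "bubble_affinity": (i % 3) * 0.2,
     "watch_pred": 0.1 * (i % 5)}
    for i in range(1000)
]

# Early stage: narrow 1,000 candidates to 100.
shortlist = sorted(pool, key=early_score, reverse=True)[:100]
# Late stage: rank only the survivors with the expensive model.
ranked = sorted(shortlist, key=late_score, reverse=True)
print(len(shortlist), ranked[0]["id"])
```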

Caveats

  • Topology is not described. The Meta post names "MTML" as a class — number of tasks, head composition, encoder depth, MMoE/PLE variant, gradient-balancing scheme are all undisclosed.
  • Task interference risk. Joint training across many heads can hurt main-task accuracy if gradients conflict — the known failure mode MMoE/PLE were designed to address. The Meta post does not discuss this.
  • Loss-weighting is a tuning surface. The w · P(engage | bubble) term sits alongside other tunable weights in the overall ranking formula; the tuning procedure (offline grid, Bayesian optimisation, adaptive weighting) is not specified.
  • Feature-engineering vs model-architecture tradeoff. Adding closeness as a feature is only part of the fix; Meta's insight is that friend content follows a different distribution that needs its own task to be scored correctly, and pure feature addition would under-fit it. This is the load-bearing architectural claim.
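
On the loss-weighting caveat: since the tuning procedure is unspecified, the simplest of the options named (an offline grid sweep over the bubble weight) might look like the sketch below. The replay metric is a hypothetical stand-in for whatever offline evaluation Meta actually uses:

```python
# Generic offline grid sweep over the tunable bubble weight w.
# The metric below is a made-up proxy, peaked at w = 0.3 purely for illustration.

def replay_metric(w_bubble):
    """Stand-in for an offline replay evaluation (e.g. recall of engaged items)."""
    return 1.0 - (w_bubble - 0.3) ** 2

grid = [i / 20 for i in range(21)]        # w in {0.00, 0.05, ..., 1.00}
best_w = max(grid, key=replay_metric)
print(best_w)
```

Bayesian optimisation or adaptive weighting would replace the grid with a smarter search, but the shape (pick w to maximise an offline proxy, then confirm online) is the same.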

Seen in

  • sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels — canonical; names MTML at both early + late stages with new tasks for bubble-conditioned engagement.
  • sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces — Pinterest ads engagement unification; MTML applied across product surfaces rather than across task semantics. Pinterest's unified ads engagement model uses multi-task heads where the task axis is the surface itself (HF / SR / RP as separate tasks). Structurally it is the same MTML shape (shared trunk plus task-specific heads) but with different task semantics: Meta uses MTML tasks for different engagement types within one surface (Reels: watch / like / bubble-conditioned engagement), while Pinterest uses MTML tasks for the same engagement prediction (CTR) across different surfaces. Pinterest's variant adds surface-specific tower trees and surface-specific checkpoint exports as deployment-level specialisation mechanisms, structurally analogous to MMoE gating but at tower/deployment granularity. This extends the MTML concept beyond single-surface multi-engagement to multi-surface single-engagement.