# Multi-task multi-label (MTML) ranking

## Definition
MTML ranking is the class of recommendation and search ranker architectures that share a candidate-plus-context representation across a set of tasks (distinct prediction targets: watch, like, comment, share, follow, …), each with potentially multiple labels (binary, multi-class, or graded), trained jointly under a combined loss.
Structurally, a typical MTML ranker looks like:

```
   (user + candidate + context features)
                  │
                  ▼
        [ shared encoder / trunk ]
                  │
    ┌─────────────┼─────────────┐
    ▼             ▼             ▼
head: watch   head: like   head: bubble-
(logprob)     (binary)     conditioned
                           engagement
                           (P(y | bubble))
```
The shared trunk amortises feature-extraction cost across all tasks; task-specific heads specialise for each prediction. Variants — MMoE (Multi-gate Mixture of Experts), PLE (Progressive Layered Extraction), shared-bottom vs task-specific-bottom — differ in how the shared representation flows to each head.
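The shared-bottom shape can be sketched in a few lines of pure Python. Everything here is an illustrative placeholder (toy weights, hand-written heads), not Meta's model; the point is only the structure: one trunk forward pass, many cheap heads.

```python
import math

def shared_trunk(features):
    # Toy shared encoder: one linear layer + ReLU over a feature vector.
    # Weights are illustrative placeholders, not learned values.
    W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Task-specific heads: each maps the shared representation to its own prediction.
HEADS = {
    "watch":  lambda h: sigmoid(0.8 * h[0] + 0.1 * h[1]),
    "like":   lambda h: sigmoid(0.2 * h[0] + 0.9 * h[1]),
    # Adding a new task (e.g. bubble-conditioned engagement) is just a new head.
    "bubble": lambda h: sigmoid(0.5 * h[0] + 0.5 * h[1]),
}

def score_candidate(features):
    h = shared_trunk(features)  # one trunk forward pass...
    return {task: head(h) for task, head in HEADS.items()}  # ...scores every task

scores = score_candidate([1.0, 0.5, -0.2])
```

In MMoE or PLE the single `shared_trunk` would be replaced by several experts with per-task gating, but the head interface stays the same.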
## Why MTML instead of independent rankers
- Cost. One forward pass over the shared encoder scores N tasks; independent rankers do N forward passes.
- Shared regularisation. Auxiliary tasks regularise the main task; rare-task labels benefit from signals in common-task labels.
- Consistent candidate scoring. All tasks score the same candidate with the same features — no representation drift across tasks.
- Easy to add tasks. When a new signal class appears (e.g. bubble-conditioned engagement), it's a new head, not a new model.
## Canonical wiki reference
Meta Friend Bubbles (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) uses MTML at both early-stage and late-stage ranking in Facebook Reels:
"We integrated friend-bubble interaction signals as features and added new tasks into both early-stage and late-stage ranking multi-task, multi-label (MTML) models to incorporate viewer-friend relationship strength and to learn downstream engagement on videos with social bubbles."
Two architectural additions for Friend Bubbles:
- New features. Viewer-friend closeness scores + bubble-interaction signals become input features.
- New tasks. Bubble-conditioned engagement, P(video engagement | bubble impression), becomes a new ranker head, and its output enters the augmented ranking formula via a tunable weight. See patterns/conditional-probability-ranking-objective.
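The "head output enters the formula via a tunable weight" mechanic is a weighted sum over per-task head outputs. A minimal sketch, assuming hypothetical head scores and weights (Meta's actual formula and weight values are not disclosed):

```python
def ranking_score(head_scores, weights):
    # Weighted combination of per-task head outputs; a common pattern for
    # folding MTML head predictions into one final ranking score.
    return sum(weights[task] * p for task, p in head_scores.items())

# Hypothetical head outputs for one candidate video.
head_scores = {"watch": 0.62, "like": 0.18, "p_engage_given_bubble": 0.40}

# The bubble-conditioned head enters via its own tunable weight.
weights = {"watch": 1.0, "like": 0.5, "p_engage_given_bubble": 0.8}

score = ranking_score(head_scores, weights)
```

Raising or lowering `weights["p_engage_given_bubble"]` is exactly the tuning surface the Caveats section flags.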
A continuous feedback loop re-trains the MTML models on fresh bubble-interaction data, letting them keep learning "which friend-content combinations resonate with users."
## Early-stage vs late-stage
The Meta post distinguishes "early-stage" and "late-stage" MTML models — a standard recommendation-ranking convention:
- Early-stage ranking scores a large candidate pool with a cheaper model and narrows it to a smaller set.
- Late-stage ranking scores the narrowed set with a more expensive MTML model.
Both stages are MTML-shaped in Meta's Reels system, and both had the bubble features and tasks added. This matters because a signal added only at the late stage cannot recover candidates the early-stage ranker has already eliminated; adding the signal at both stages lets it propagate through the whole funnel.
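The funnel logic, and why a late-stage-only signal cannot rescue a filtered candidate, can be shown with a toy two-stage ranker. The scoring functions are stand-ins invented for this sketch, not Meta's models:

```python
def early_score(c):
    # Cheap proxy score (hypothetical single precomputed feature).
    return c["cheap"]

def late_score(c):
    # Stand-in for the expensive MTML forward pass; sees an extra signal.
    return 0.5 * c["cheap"] + c["quality"]

def rank(cands, k_early, k_final):
    # Stage 1: score the large pool cheaply, keep the top k_early.
    pool = sorted(cands, key=early_score, reverse=True)[:k_early]
    # Stage 2: score only the survivors with the expensive model.
    return sorted(pool, key=late_score, reverse=True)[:k_final]

cands = [{"cheap": float(i % 50), "quality": float(i % 7)} for i in range(500)]

# A candidate the late-stage model would rank first overall, but which the
# early stage eliminates -- the late-stage signal alone cannot recover it.
sleeper = {"cheap": 0.0, "quality": 100.0}
top = rank(cands + [sleeper], k_early=100, k_final=10)
```

`sleeper` never reaches stage 2, which is why the bubble features and tasks had to be added at both stages.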
## Caveats
- Topology is not described. The Meta post names "MTML" as a class — number of tasks, head composition, encoder depth, MMoE/PLE variant, gradient-balancing scheme are all undisclosed.
- Task interference risk. Joint training across many heads can hurt main-task accuracy if gradients conflict — the known failure mode MMoE/PLE were designed to address. The Meta post does not discuss this.
- Loss-weighting is a tuning surface. The w · P(engage | bubble) term sits alongside other tunable weights in the overall ranking formula; the tuning procedure (offline grid, Bayesian optimisation, adaptive weighting) is not specified.
- Feature-engineering vs model-architecture tradeoff. Adding closeness as a feature is only part of the fix; Meta's insight is that friend content is a different distribution that needs its own task to score correctly — pure feature addition would under-fit. This is the load-bearing architectural claim.
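Weighting shows up on the training side too: the combined loss mentioned in the Definition is itself a weighted sum of per-task losses, and those weights are another tuning surface. A minimal sketch with binary cross-entropy per task and illustrative weights (the actual task set, losses, and weights are undisclosed):

```python
import math

def bce(p, y):
    # Binary cross-entropy for one prediction/label pair.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def joint_loss(preds, labels, task_weights):
    # Weighted sum of per-task losses: the joint objective an MTML model
    # trains against. Weights here are illustrative, not tuned values.
    return sum(task_weights[t] * bce(preds[t], labels[t]) for t in preds)

loss = joint_loss(
    preds={"watch": 0.7, "like": 0.2, "bubble": 0.4},
    labels={"watch": 1, "like": 0, "bubble": 1},
    task_weights={"watch": 1.0, "like": 0.5, "bubble": 2.0},
)
```

Gradient conflicts between these weighted terms are the task-interference failure mode that MMoE/PLE-style architectures target.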
## Seen in
- sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels — canonical; names MTML at both early + late stages with new tasks for bubble-conditioned engagement.
- sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces — Pinterest's unified ads engagement model applies MTML across product surfaces rather than across engagement types: the task axis is the surface itself (HF / SR / RP as separate tasks), each head predicting the same engagement target (CTR). Structurally the same shape — shared trunk plus task-specific heads — but with inverted task semantics relative to Meta, which uses tasks for different engagement types within one surface (Reels: watch / like / bubble-conditioned engagement). Pinterest additionally uses surface-specific tower trees and surface-specific checkpoint exports as deployment-level specialisation, roughly analogous to MMoE gating at tower/deployment granularity. This extends MTML beyond single-surface multi-engagement to multi-surface single-engagement.
## Related
- concepts/retrieval-ranking-funnel — MTML is the stage-2 ranker choice in recommendation instances.
- concepts/viewer-friend-closeness — the upstream feature.
- patterns/conditional-probability-ranking-objective — the mechanism for adding the new bubble task to the formula.
- patterns/closed-feedback-loop-ai-features — the training-loop pattern that keeps the MTML models fresh.
- systems/meta-friend-bubbles — canonical instance.
- concepts/llm-based-ranker — the alternative stage-2 ranker family for non-recommendation domains.