PATTERN
Auxiliary engagement task for conversion retrieval¶
Pattern¶
Train a conversion-optimised retrieval model with engagement prediction as a jointly-trained auxiliary task, sharing the encoders (towers). The engagement task contributes dense, low-variance gradient signal to the shared trunk; the conversion task contributes the primary purchase-intent supervision. Weight the task losses asymmetrically so the abundant engagement signal stabilises the shared representation without diluting the sparse conversion signal.
A concrete instance of auxiliary-task regularisation applied specifically to the conversion-retrieval regime.
Problem¶
A conversion-only retriever is impossible to train well:
- Conversion labels are sparse, noisy, offsite, advertiser-reported, delayed.
- Per-batch positive density is too low for a large-capacity model.
- Shared trunk can't learn general-purpose representations from the thin signal alone.
A conversion-only model that inherits a generic engagement retriever's towers (transfer learning) doesn't solve the representation-quality issue either — the towers keep optimising for engagement unless retrained.
Solution¶
Attach an engagement task head (or in a unified architecture, mix engagement supervision into the single head) and train jointly:
```
user, candidate → shared encoders → [ optional: task heads ]
                              │
     ┌────────────────────────┼────────────────────────┐
     ▼                        ▼                        ▼
conversion loss         engagement loss             (others)
     │                        │
     └─ weighted combination ─┘
                  │
    gradient → shared encoders
    (both tasks contribute)
```
Three design decisions make this pattern work (a minimal architecture sketch follows this list):
- Engagement task stabilises shared parameters. The dense engagement gradient regularises the shared trunk; the sparse conversion gradient specialises it without being swamped.
- Task weights balance abundance against objective mismatch. Weighted wrong, the abundant engagement gradients dominate the update direction. Pinterest: "The crucial challenge is balancing the two tasks, ensuring the high-value conversion signal is not diluted by the more frequent engagement data."
- Serving-time decision (multi-head only): use only the conversion head's output at inference. In the unified-head variant (see patterns/unified-multi-task-over-multi-head), the single embedding set inherits both tasks' signal directly.
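To make the shared-trunk / task-head split concrete, here is a minimal PyTorch-style sketch. The module names, layer sizes, MLP towers, and normalised dot-product similarity are illustrative assumptions, not Pinterest's implementation; the loss weighting is sketched separately under the canonical instance below.

```python
# Minimal sketch, assuming dense user/item features and PyTorch.
# Names and sizes (user_tower, conv_head, hidden=256, emb=64) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskTwoTower(nn.Module):
    def __init__(self, user_dim: int, item_dim: int, hidden: int = 256, emb: int = 64):
        super().__init__()
        # Shared encoders (towers): updated by BOTH task losses.
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, hidden), nn.ReLU(), nn.Linear(hidden, emb))
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, hidden), nn.ReLU(), nn.Linear(hidden, emb))
        # Optional per-task heads (multi-head variant); the unified variant
        # drops these and trains both tasks on the tower outputs directly.
        self.conv_head = nn.Linear(emb, emb)
        self.eng_head = nn.Linear(emb, emb)

    def forward(self, user_feats: torch.Tensor, item_feats: torch.Tensor):
        u = self.user_tower(user_feats)
        v = self.item_tower(item_feats)
        return {
            "conversion": (F.normalize(self.conv_head(u), dim=-1),
                           F.normalize(self.conv_head(v), dim=-1)),
            "engagement": (F.normalize(self.eng_head(u), dim=-1),
                           F.normalize(self.eng_head(v), dim=-1)),
        }

# Serving (multi-head variant): only the "conversion" embeddings are indexed
# and queried; the engagement head exists solely to shape the shared towers.
```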
Canonical instance — Pinterest shopping conversion CG¶
Pinterest (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):
- Primary task: conversion prediction (offsite events, sparse).
- Auxiliary task: engagement prediction (clicks + repins, abundant).
- Loss combination: weighted sampled-softmax per task (2023 multi-head) → weighted multi-task loss on unified head (2025 refresh); a weighting sketch follows this list.
- Framing: "Our multi-task approach uses engagement prediction as an auxiliary task to stabilize training and boost performance."
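A hedged sketch of the weighted per-task loss, reusing MultiTaskTwoTower from the sketch above. In-batch softmax stands in for a true sampled softmax, and the w_conv / w_eng values are placeholders, not Pinterest's tuned weights; choosing them is exactly the balancing problem the framing quote names.

```python
# Sketch only: in-batch softmax approximates a sampled softmax;
# task weights are illustrative.
import torch
import torch.nn.functional as F

def in_batch_softmax(user_emb: torch.Tensor, item_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    # Row i's positive is item i; every other item in the batch is a negative.
    logits = user_emb @ item_emb.T / temperature
    labels = torch.arange(user_emb.size(0), device=user_emb.device)
    return F.cross_entropy(logits, labels)

def joint_loss(model, conv_batch, eng_batch,
               w_conv: float = 1.0, w_eng: float = 0.1) -> torch.Tensor:
    # Conversion and engagement positives come from different label sources,
    # so they arrive as separate batches; each drives its own task term,
    # and both terms backpropagate into the shared towers.
    conv_u, conv_v = model(*conv_batch)["conversion"]
    eng_u, eng_v = model(*eng_batch)["engagement"]
    return (w_conv * in_batch_softmax(conv_u, conv_v)
            + w_eng * in_batch_softmax(eng_u, eng_v))
```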
The pattern is layered with:
- Dual positive signal — engagement events also enter the conversion task's positive set (with click-duration reweighting for noise control).
- Unified multi-task over multi-head — the 2025 refresh merged task-specific heads into a single unified head so the served embeddings directly benefit from multi-task supervision.
- Advertiser-level loss — a third parallel training objective for variance reduction.
Comparison to dual-positive-signal¶
The two patterns are related but structurally distinct:
| | Dual positive signal | Auxiliary engagement task |
|---|---|---|
| How auxiliary enters training | Mixed into primary task's positive set with per-example weights | Separate task head / loss term |
| Does auxiliary have its own loss? | No — shares primary's loss | Yes — independent loss, weighted |
| Typical pairing | Quality-proxy reweighting of auxiliary positives (dwell) | Balanced per-task scalar weights |
| Serving behaviour | Single model, unchanged | Multi-head at serving (use primary head) or unified (both) |
Pinterest uses both patterns simultaneously in the shopping conversion CG — they're complementary ways of bringing engagement signal into the training.
When to apply¶
- Primary task is sparse / noisy / delayed; abundant auxiliary signal is available.
- Auxiliary task is semantically correlated with the primary.
- Tuning loss weights is a tractable operational surface.
- Serving constraint tolerates a multi-task training architecture (retrieval is the natural fit because it runs on lightweight towers; ranking is also feasible).
When NOT to apply¶
- Auxiliary task is anti-correlated with the primary (engagement-positive items are systematically bad conversion-positives).
- Engagement data is not available or untrustworthy.
- Loss-weight tuning is intractable.
- Task interference demonstrably hurts primary metric in A/B.
Caveats¶
- Task-balancing is operationally fragile. Loss weights drift as the data distribution shifts and need periodic retuning.
- Auxiliary can hijack the primary silently if the weighting is wrong: the abundant signal pulls the shared representation toward itself (a monitoring sketch follows this list).
- Task interference is not handled explicitly in Pinterest's framing. Classical MTL mitigations (MMoE, PLE, gradient projection) aren't described for conversion CG.
- Engagement-trained towers drift toward engagement: over training, the shared trunk comes to embed items by engagement patterns, so check that conversion retrieval stays aligned with advertiser intent.
- Operational cost of multi-task training: the engagement head's forward/backward adds compute to training, even if only the conversion head is served (multi-head case). In the unified case, the compute is single-path but the loss has more terms.
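One hedged way to catch the hijack and drift failure modes above is to compare each task's gradient-norm contribution on the shared towers during training. The helper names, the "tower" naming convention (carried over from the architecture sketch above), and the max_ratio threshold are assumptions for illustration, not anything Pinterest describes.

```python
# Sketch only: per-task gradient norms on the shared trunk.
# Assumes tower naming from the architecture sketch; threshold is illustrative.
import torch

def shared_grad_norm(model, task_loss: torch.Tensor) -> torch.Tensor:
    # Gradient norm of one (already weighted) task loss w.r.t. shared tower params only.
    shared = [p for name, p in model.named_parameters() if "tower" in name]
    grads = torch.autograd.grad(task_loss, shared, retain_graph=True, allow_unused=True)
    total = sum((g.pow(2).sum() for g in grads if g is not None), torch.tensor(0.0))
    return total.sqrt()

def check_task_balance(model, conv_loss, eng_loss, w_conv, w_eng, max_ratio=5.0):
    # If the weighted engagement gradient dwarfs the weighted conversion gradient
    # on the shared towers, the auxiliary task is steering the representation.
    conv_norm = shared_grad_norm(model, w_conv * conv_loss)
    eng_norm = shared_grad_norm(model, w_eng * eng_loss)
    ratio = (eng_norm / (conv_norm + 1e-12)).item()
    if ratio > max_ratio:
        print(f"shared-trunk grad ratio engagement/conversion = {ratio:.1f}; "
              "consider lowering w_eng or re-tuning the task weights")
    return ratio
```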
Seen in¶
- 2026-04-27 Pinterest — From Clicks to Conversions (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation) — canonical: engagement prediction as auxiliary task paired with conversion prediction as primary task in the shopping conversion candidate generation model, across both 2023 (multi-head) and 2025 (unified multi-task) generations.
Related¶
- concepts/auxiliary-task-regularization — concept framing.
- concepts/multi-task-learning
- concepts/offsite-conversion-sparsity
- concepts/shopping-conversion-candidate-generation
- patterns/dual-positive-signal-for-sparse-labels — sibling pattern.
- patterns/unified-multi-task-over-multi-head — architectural evolution of the engagement-auxiliary setup.
- systems/pinterest-shopping-conversion-cg