
PATTERN

Auxiliary engagement task for conversion retrieval

Pattern

Train a conversion-optimised retrieval model with engagement prediction as a jointly trained auxiliary task, sharing the encoders (towers). The engagement task contributes dense, low-variance gradient signal to the shared trunk; the conversion task contributes the primary purchase-intent supervision. Weight the task losses asymmetrically so the abundant engagement signal stabilises the shared representation without diluting the sparse conversion signal.

A concrete instance of auxiliary-task regularisation applied specifically to the conversion-retrieval regime.

Problem

A conversion-only retriever is hard to train well: conversion events are sparse, noisy, and delayed, so the towers see too little supervision to learn a good representation.

A conversion-only model that inherits a generic engagement retriever's towers (transfer learning) doesn't solve the representation-quality issue either — the towers keep optimising for engagement unless retrained.

Solution

Attach an engagement task head (or in a unified architecture, mix engagement supervision into the single head) and train jointly:

 user, candidate → shared encoders → [ optional: task heads ]
                          ┌───────────────┼───────────────┐
                          ▼               ▼               ▼
                    conversion loss   engagement loss    (others)
                          │               │
                          └─── weighted combination ───┘
                          gradient → shared encoders
                                     (both tasks contribute)

Three design decisions make this pattern work:

  1. Engagement task stabilises shared parameters. The dense engagement gradient regularises the shared trunk; the sparse conversion gradient specialises without being swamped.
  2. Task weights balance abundance against objective mismatch. With the wrong weights, the abundant engagement gradients dominate the update direction. Pinterest: "The crucial challenge is balancing the two tasks, ensuring the high-value conversion signal is not diluted by the more frequent engagement data."
  3. Serving-time decision (multi-head only): use only the conversion head's output at inference. In the unified-head variant (see patterns/unified-multi-task-over-multi-head), the single embedding set inherits both tasks' signal directly.
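
A minimal numpy sketch of the three decisions above, using in-batch sampled softmax. The single-linear towers, the head matrices, and the 1.0 / 0.3 weights are all illustrative assumptions, not Pinterest's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def in_batch_softmax_loss(q, c):
    """In-batch sampled softmax: row i's positive is item i; the rest of
    the batch serves as negatives."""
    logits = q @ c.T
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

B, D, H = 8, 16, 4
user_feat = rng.normal(size=(B, D))
item_feat = rng.normal(size=(B, D))

# Shared towers (one linear layer each, purely for illustration).
W_user = rng.normal(size=(D, H))
W_item = rng.normal(size=(D, H))
u = l2_normalize(user_feat @ W_user)
v = l2_normalize(item_feat @ W_item)

# Per-task heads on top of the shared representation (multi-head variant).
W_conv = rng.normal(size=(H, H))
W_eng = rng.normal(size=(H, H))
loss_conv = in_batch_softmax_loss(u @ W_conv, v @ W_conv)  # sparse primary
loss_eng = in_batch_softmax_loss(u @ W_eng, v @ W_eng)     # dense auxiliary

# Decision 2: asymmetric weights keep the abundant engagement gradient
# from dominating the shared towers (values are illustrative).
w_conv, w_eng = 1.0, 0.3
total_loss = w_conv * loss_conv + w_eng * loss_eng

# Decision 3: at serving time, only the conversion head's embeddings
# (u @ W_conv, v @ W_conv) would be indexed and queried.
```

Backpropagating `total_loss` sends both tasks' gradients through `W_user` and `W_item`, which is the stabilising effect decision 1 describes.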

Canonical instance — Pinterest shopping conversion CG

Pinterest (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):

  • Primary task: conversion prediction (offsite events, sparse).
  • Auxiliary task: engagement prediction (clicks + repins, abundant).
  • Loss combination: weighted sampled-softmax per task (2023 multi-head) → weighted multi-task loss on unified head (2025 refresh).
  • Framing: "Our multi-task approach uses engagement prediction as an auxiliary task to stabilize training and boost performance."
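
The 2023 → 2025 evolution amounts to moving from per-task losses on separate heads to a single per-example-weighted loss on one embedding. A numpy fragment sketching the unified-head form (the task tags and weight values are hypothetical):

```python
import numpy as np

def per_example_softmax_loss(q, c):
    """In-batch softmax loss per training pair (row i's positive is item i)."""
    logits = q @ c.T
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob)                          # shape (B,)

rng = np.random.default_rng(1)
B, H = 6, 4
u = rng.normal(size=(B, H))   # user embeddings from the single shared head
v = rng.normal(size=(B, H))   # item embeddings from the single shared head

# Each (user, item) pair is tagged by the signal that produced it.
is_conversion = np.array([1, 0, 0, 1, 0, 0], dtype=bool)
w_conv, w_eng = 1.0, 0.3                 # illustrative task weights
weights = np.where(is_conversion, w_conv, w_eng)

losses = per_example_softmax_loss(u, v)
unified_loss = (weights * losses).sum() / weights.sum()
```

One embedding set absorbs both signals here, so there is no head to choose at serving time.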

The pattern is layered with:

Comparison to dual-positive-signal

The two patterns are related but structurally distinct:

|                                   | Dual positive signal                                            | Auxiliary engagement task                                  |
|-----------------------------------|-----------------------------------------------------------------|------------------------------------------------------------|
| How auxiliary enters training     | Mixed into primary task's positive set with per-example weights | Separate task head / loss term                             |
| Does auxiliary have its own loss? | No — shares primary's loss                                      | Yes — independent loss, weighted                           |
| Typical pairing                   | Quality-proxy reweighting of auxiliary positives (dwell)        | Balanced per-task scalar weights                           |
| Serving behaviour                 | Single model, unchanged                                         | Multi-head at serving (use primary head) or unified (both) |

Pinterest uses both patterns simultaneously in the shopping conversion CG — they're complementary ways of bringing engagement signal into training.

When to apply

  • Primary task is sparse / noisy / delayed; abundant auxiliary signal is available.
  • Auxiliary task is semantically correlated with the primary.
  • Tuning loss weights is a tractable operational surface.
  • Serving constraint tolerates multi-task training architecture (retrieval is the natural fit because retrieval runs on lightweight towers; ranking is also feasible).

When NOT to apply

  • Auxiliary task is anti-correlated with the primary (engagement-positive items are systematically bad conversion-positives).
  • Engagement data is not available or untrustworthy.
  • Loss-weight tuning is intractable.
  • Task interference demonstrably hurts primary metric in A/B.

Caveats

  • Task balancing is operationally fragile. Loss weights drift as the data distribution shifts and need periodic retuning.
  • Auxiliary can hijack the primary silently if weighting is wrong — the abundant signal shifts the shared representation toward itself.
  • Task interference is not handled explicitly in Pinterest's framing. Classical MTL mitigations (MMoE, PLE, gradient projection) aren't described for the conversion CG.
  • Engagement-trained towers drift toward engagement: over training, the shared trunk comes to embed items by engagement patterns, so conversion retrieval must be checked for continued alignment with advertiser intent.
  • Operational cost of multi-task training: the engagement head's forward/backward adds compute to training, even if only the conversion head is served (multi-head case). In the unified case, the compute is single-path but the loss has more terms.

Seen in
