
PATTERN

Auxiliary engagement task for conversion retrieval

Pattern

Train a conversion-optimised retrieval model with engagement prediction as a jointly trained auxiliary task, sharing the encoders (towers). The engagement task contributes dense, low-variance gradient signal to the shared trunk; the conversion task contributes the primary purchase-intent supervision. Weight the task losses asymmetrically so the abundant engagement signal stabilises the shared representation without diluting the sparse conversion signal.

A concrete instance of auxiliary-task regularisation applied specifically to the conversion-retrieval regime.

Problem

A conversion-only retriever is hard to train well: conversion events are sparse, noisy, and delayed, so the towers see too little supervision to learn a good representation.

A conversion-only model that inherits a generic engagement retriever's towers (transfer learning) doesn't solve the representation-quality issue either — the towers keep optimising for engagement unless retrained.

Solution

Attach an engagement task head (or in a unified architecture, mix engagement supervision into the single head) and train jointly:

 user, candidate → shared encoders → [ optional: task heads ]
                          ┌───────────────┼───────────────┐
                          ▼               ▼               ▼
                    conversion loss   engagement loss    (others)
                          │               │
                          └─── weighted combination ───┘
                          gradient → shared encoders
                                     (both tasks contribute)

Three design decisions make this pattern work:

  1. Engagement task stabilises shared parameters. The dense engagement gradient regularises the shared trunk; the sparse conversion gradient specialises without being swamped.
  2. Task weights balance abundance against objective mismatch. With the wrong weights, the abundant engagement gradients dominate the update direction. Pinterest: "The crucial challenge is balancing the two tasks, ensuring the high-value conversion signal is not diluted by the more frequent engagement data."
  3. Serving-time decision (multi-head only): use only the conversion head's output at inference. In the unified-head variant (see patterns/unified-multi-task-over-multi-head), the single embedding set inherits both tasks' signal directly.
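
A minimal numpy sketch of the three decisions above, using in-batch sampled softmax. The single-linear towers, the head matrices, and the 1.0 / 0.3 weights are all illustrative assumptions, not Pinterest's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def in_batch_softmax_loss(q, c):
    """In-batch sampled softmax: row i's positive is item i; the rest of
    the batch serves as negatives."""
    logits = q @ c.T
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

B, D, H = 8, 16, 4
user_feat = rng.normal(size=(B, D))
item_feat = rng.normal(size=(B, D))

# Shared towers (one linear layer each, purely for illustration).
W_user = rng.normal(size=(D, H))
W_item = rng.normal(size=(D, H))
u = l2_normalize(user_feat @ W_user)
v = l2_normalize(item_feat @ W_item)

# Per-task heads on top of the shared representation (multi-head variant).
W_conv = rng.normal(size=(H, H))
W_eng = rng.normal(size=(H, H))
loss_conv = in_batch_softmax_loss(u @ W_conv, v @ W_conv)  # sparse primary
loss_eng = in_batch_softmax_loss(u @ W_eng, v @ W_eng)     # dense auxiliary

# Decision 2: asymmetric weights keep the abundant engagement gradient
# from dominating the shared towers (values are illustrative).
w_conv, w_eng = 1.0, 0.3
total_loss = w_conv * loss_conv + w_eng * loss_eng

# Decision 3: at serving time, only the conversion head's embeddings
# (u @ W_conv, v @ W_conv) would be indexed and queried.
```

Backpropagating `total_loss` sends both tasks' gradients through `W_user` and `W_item`, which is the stabilising effect decision 1 describes.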

Canonical instance — Pinterest shopping conversion CG

Pinterest (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):

  • Primary task: conversion prediction (offsite events, sparse).
  • Auxiliary task: engagement prediction (clicks + repins, abundant).
  • Loss combination: weighted sampled-softmax per task (2023 multi-head) → weighted multi-task loss on unified head (2025 refresh).
  • Framing: "Our multi-task approach uses engagement prediction as an auxiliary task to stabilize training and boost performance."
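
The 2023 → 2025 evolution amounts to moving from per-task losses on separate heads to a single per-example-weighted loss on one embedding. A numpy fragment sketching the unified-head form (the task tags and weight values are hypothetical):

```python
import numpy as np

def per_example_softmax_loss(q, c):
    """In-batch softmax loss per training pair (row i's positive is item i)."""
    logits = q @ c.T
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob)                          # shape (B,)

rng = np.random.default_rng(1)
B, H = 6, 4
u = rng.normal(size=(B, H))   # user embeddings from the single shared head
v = rng.normal(size=(B, H))   # item embeddings from the single shared head

# Each (user, item) pair is tagged by the signal that produced it.
is_conversion = np.array([1, 0, 0, 1, 0, 0], dtype=bool)
w_conv, w_eng = 1.0, 0.3                 # illustrative task weights
weights = np.where(is_conversion, w_conv, w_eng)

losses = per_example_softmax_loss(u, v)
unified_loss = (weights * losses).sum() / weights.sum()
```

One embedding set absorbs both signals here, so there is no head to choose at serving time.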

The pattern is layered with:

Comparison to dual-positive-signal

The two patterns are related but structurally distinct:

|                                   | Dual positive signal                                            | Auxiliary engagement task                                  |
|-----------------------------------|-----------------------------------------------------------------|------------------------------------------------------------|
| How auxiliary enters training     | Mixed into primary task's positive set with per-example weights | Separate task head / loss term                             |
| Does auxiliary have its own loss? | No — shares primary's loss                                      | Yes — independent loss, weighted                           |
| Typical pairing                   | Quality-proxy reweighting of auxiliary positives (dwell)        | Balanced per-task scalar weights                           |
| Serving behaviour                 | Single model, unchanged                                         | Multi-head at serving (use primary head) or unified (both) |

Pinterest uses both patterns simultaneously in the shopping conversion CG — they're complementary ways of bringing engagement signal into training.

When to apply

  • Primary task is sparse / noisy / delayed; abundant auxiliary signal is available.
  • Auxiliary task is semantically correlated with the primary.
  • Tuning loss weights is a tractable operational surface.
  • Serving constraint tolerates multi-task training architecture (retrieval is the natural fit because retrieval runs on lightweight towers; ranking is also feasible).

When NOT to apply

  • Auxiliary task is anti-correlated with the primary (engagement-positive items are systematically bad conversion-positives).
  • Engagement data is not available or untrustworthy.
  • Loss-weight tuning is intractable.
  • Task interference demonstrably hurts primary metric in A/B.

Caveats

  • Task balancing is operationally fragile. Loss weights drift as the data distribution shifts and need periodic retuning.
  • Auxiliary can hijack the primary silently if weighting is wrong — the abundant signal shifts the shared representation toward itself.
  • Task interference is not handled explicitly in Pinterest's framing. Classical MTL mitigations (MMoE, PLE, gradient projection) aren't described for the conversion CG.
  • Engagement-trained towers drift toward engagement: over training, the shared trunk comes to embed items by engagement patterns, so conversion retrieval must be checked for continued alignment with advertiser intent.
  • Operational cost of multi-task training: the engagement head's forward/backward adds compute to training, even if only the conversion head is served (multi-head case). In the unified case, the compute is single-path but the loss has more terms.

Seen in
