Dual positive signal for sparse labels¶
Pattern¶
When the primary training signal is sparse, noisy, or delayed, supplement the positive set with a second, denser engagement signal — but discount the denser positives' contribution to prevent them from overwhelming the primary gradient. The denser positives are not treated as equivalents of the primary label; they carry lower per-example weight, often further adjusted by a quality proxy (dwell time, session depth, session recency).
Contrast with auxiliary-task regularisation: that pattern trains a separate task head on the auxiliary signal. Dual positive signal mixes the auxiliary positives directly into the primary task's positive set — same loss function, different per-example weights.
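A minimal sketch of the mixing, assuming a PyTorch setup with a binary loss; the 0.3 discount and the toy batch are illustrative values, not Pinterest's:

```python
import torch
import torch.nn.functional as F

# Toy batch of six positives scored by one model. Only the per-example
# weight distinguishes primary from auxiliary positives; the loss is shared.
logits = torch.randn(6)                       # model scores for six positives
targets = torch.ones(6)                       # every example is a positive
weights = torch.tensor([1.0, 1.0,             # primary: offsite conversions
                        0.3, 0.3, 0.3, 0.3])  # auxiliary: clicks / repins, discounted

loss = F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```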
Problem¶
Training purely on sparse labels (conversions, chargebacks, long-horizon outcomes) means:
- Per-batch positive density is too low to fit a large-capacity retrieval / ranking model.
- Many candidate items have zero positive examples in the training window.
- Shared representation fails to learn general-purpose features because gradient signal is too thin.
Training purely on the dense engagement signal:
- Optimises for the wrong target. A model that ranks on clicks can be confidently wrong about conversions.
The goal: use both simultaneously, with the dense signal broadening coverage without hijacking the primary objective.
Solution¶
Form the positive set as:

P = P_primary ∪ { (x, w(x)) : x ∈ P_engagement }

where w is a per-example weight that reflects the quality of each engagement positive.
Canonical Pinterest shape for shopping conversion CG (Source: sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):
- Primary positives: offsite conversions (checkout, add-to-cart).
- Engagement positives: clicks, repins.
- Weight on engagement positives (for clicks):
w = f(log(1 + t / t_max)), where t is the click dwell time and t_max is a tunable cap — a click-duration reweighting.
Pinterest's framing:
"We supplement primary conversion signals with onsite engagement data (clicks, repins). This broadens data coverage, improving model generalization and ad funnel survival rates. To mitigate click data noise and decrease false positive clicks, we apply a log-based re-weighting function w based on the click duration [...] where t is the non-negative click duration in seconds and t_max is a tunable constant used to cap the re-weighting function."
Comparison to sibling patterns¶
| Pattern | Where the auxiliary signal enters |
|---|---|
| Auxiliary engagement task | Separate task head; auxiliary has its own loss |
| Dual positive signal (this page) | Same positive set; auxiliary has its own per-example weight |
| Knowledge distillation | Soft-label regression of teacher outputs, not engagement-event merging |
| Minimum-dwell filter | Keep only high-quality clicks as positives; drop others — simpler, information-losing variant |
Pinterest's shopping conversion CG uses both the dual-positive-signal pattern (mixing clicks/repins with conversions as positives) and the auxiliary-engagement-task pattern (engagement as a separate task head). The two are complementary.
When to apply¶
- Primary signal is sparse, noisy, or delayed.
- A dense semantically-adjacent signal exists on the same platform that correlates with the primary.
- A quality proxy is observable to reweight the dense signal's per-example weight.
- Infrastructure can handle per-example weighted losses.
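For the last requirement, a sketch of a per-example-weighted in-batch softmax loss for a two-tower retriever; tensor names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def weighted_in_batch_softmax(query_emb: torch.Tensor,
                              item_emb: torch.Tensor,
                              example_weights: torch.Tensor) -> torch.Tensor:
    """query_emb, item_emb: [B, D]; example_weights: [B], with 1.0 for
    conversion positives and w < 1.0 for engagement positives."""
    logits = query_emb @ item_emb.T                               # [B, B] similarities
    labels = torch.arange(logits.size(0), device=logits.device)   # row i's positive is column i
    per_example = F.cross_entropy(logits, labels, reduction="none")  # [B] losses
    return (example_weights * per_example).sum() / example_weights.sum()
```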
When NOT to apply¶
- Dense signal is anti-correlated with the primary (engagement-positive items are systematically bad conversion-positives) — mixing makes things worse.
- No quality proxy is observable — blanket weighting of the dense signal either over- or under-weights it.
- Primary labels are abundant enough to train alone.
Design decisions when applying¶
- How to weight the auxiliary positives. Constant scalar is cheapest; quality-proxy-dependent weights (dwell time, session depth) add signal but require the proxy to be observable and trustworthy.
- Whether to cap the weight. Pinterest caps the reweighting at t_max — prevents extreme dwells from dominating.
- Whether to combine with an auxiliary-task head. Pinterest does both; some production systems only do one.
- Whether to filter instead of reweight. Alternative: hard-drop clicks below a dwell threshold. Simpler, information-losing; Pinterest chose reweighting over filtering.
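To make the last trade-off concrete, a sketch of both variants; the 5-second threshold and 30-second cap are hypothetical values:

```python
import math
from dataclasses import dataclass

@dataclass
class Click:
    item_id: str
    dwell: float  # seconds

MIN_DWELL, T_MAX = 5.0, 30.0  # hypothetical values, not from the source

def filter_positives(clicks: list[Click]) -> list[tuple[Click, float]]:
    """Minimum-dwell filter: binary keep/drop; sub-threshold clicks are lost."""
    return [(c, 1.0) for c in clicks if c.dwell >= MIN_DWELL]

def reweight_positives(clicks: list[Click]) -> list[tuple[Click, float]]:
    """Dual-positive reweighting: every click kept, its contribution scaled."""
    return [(c, math.log1p(min(c.dwell, T_MAX) / T_MAX)) for c in clicks]
```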
Caveats¶
- Weight tuning is ongoing work. t_max and the overall dense-signal weight shift with the data distribution; they need periodic re-tuning.
- Quality proxy can be gameable. Adversarial advertisers can inflate dwell time (intentional friction, hanging loads). Dwell-time-based weighting is only as robust as the underlying measurement.
- Privacy and consent. Per-click dwell measurement requires instrumentation; some regulatory regimes restrict this.
- Interaction with negative sampling. If the auxiliary positives are mixed into the positive pool, they also need to be covered by the identity-mask / same-anchor-mask for in-batch negative handling (see concepts/in-batch-negative-false-negative; a masking sketch follows this list).
- Pinterest doesn't disclose: the relative weight of conversions vs reweighted clicks in the loss, the exact functional form f, the value of t_max, or the conversion-vs-engagement example ratio in training batches.
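On the negative-sampling caveat, a minimal sketch of a same-item mask that covers auxiliary positives too; names are illustrative:

```python
import torch

def false_negative_mask(item_ids: torch.Tensor) -> torch.Tensor:
    """item_ids: [B] id of the item behind each in-batch positive, whether it
    came from a conversion, a click, or a repin. Off-diagonal pairs sharing an
    item id are false negatives regardless of which signal produced them."""
    same_item = item_ids.unsqueeze(0) == item_ids.unsqueeze(1)  # [B, B]
    eye = torch.eye(item_ids.size(0), dtype=torch.bool)
    return same_item & ~eye  # True = drop this pair from the negatives

# Usage: logits.masked_fill(false_negative_mask(item_ids), float("-inf"))
```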
Seen in¶
- 2026-04-27 Pinterest — From Clicks to Conversions (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation) — canonical: conversions + reweighted clicks + repins as dual positive signal in the shopping conversion candidate generation training set.
Related¶
- concepts/offsite-conversion-sparsity — the primary-signal sparsity that motivates this pattern.
- concepts/click-duration-reweighting — the quality-proxy reweighting applied to the dense signal.
- concepts/shopping-conversion-candidate-generation
- concepts/auxiliary-task-regularization
- patterns/auxiliary-engagement-task-for-conversion-retrieval — sibling MTL approach.
- systems/pinterest-shopping-conversion-cg