Advertiser-level loss¶
Definition¶
Advertiser-level loss is an auxiliary training objective that groups conversion (or other sparse-signal) examples at the advertiser granularity — all Pins/products from the same advertiser — and trains the model to match conversion signals at that coarser granularity, in addition to the standard per-item loss. The intuition: per-item conversion labels are sparse and high-variance; per-advertiser conversion labels aggregate across an advertiser's inventory and are substantially lower-variance while still preserving directional signal about purchase intent.
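To make the grouping concrete, here is a toy sketch of the label aggregation; the numbers and field names are hypothetical, not from the source:

```python
from collections import defaultdict

# Hypothetical conversion log: (advertiser_id, item_id, conversions, impressions).
events = [
    ("adv_a", "pin_1", 0, 120), ("adv_a", "pin_2", 1, 95), ("adv_a", "pin_3", 0, 40),
    ("adv_b", "pin_4", 3, 150), ("adv_b", "pin_5", 2, 110),
]

# Per-item labels: mostly zero, and noisy at small denominators.
item_cvr = {item: c / i for _, item, c, i in events}

# Advertiser-level labels: pool conversions and impressions across the
# advertiser's whole portfolio before taking the rate.
conv, imp = defaultdict(int), defaultdict(int)
for adv, _, c, i in events:
    conv[adv] += c
    imp[adv] += i
adv_cvr = {adv: conv[adv] / imp[adv] for adv in conv}  # adv_a: 1/255 ≈ 0.004
```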
Motivation¶
Per-Pin / per-item conversion supervision in ads ML suffers from:
- Sparsity — many items have few or no observed conversions over the training window.
- High variance — the conversion rate of an individual item can swing wildly with small sample sizes.
- Sample-efficiency floor — there's no way to get more per-item gradient signal without more impressions, and impression budget is finite.
Grouping by a coarser entity that has structural correlation with per-item quality — the advertiser (equivalently: seller, brand) — produces labels that are:
- Denser — an advertiser with 1000 items accumulates conversion signal across the portfolio.
- Lower variance — aggregated per-advertiser conversion rates are smooth in a way per-item rates are not.
- Directionally meaningful — advertisers with well-performing landing pages, well-priced products, and trusted brands have advertiser-level conversion signal that reflects item-level propensity.
The Pinterest shape¶
Pinterest introduced advertiser-level loss in the 2025 refresh of the shopping conversion candidate generation model (Source: sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):
"We also observed that conversion data at the Pin level exhibit high variance, making it challenging to reliably model purchase intent from Pin-level supervision alone. To address this, we introduce an advertiser-level loss function as an additional training objective, enabling the model to better capture conversion signals at a more stable and consistent granularity."
Combined with other refresh levers (unified single-head multi-task architecture, parallel DCNv2+MLP cross layers), Pinterest reports +42% recall@100 for conversion tasks vs the 2023 model. Advertiser-level loss is one contributor to that delta; the post doesn't isolate its standalone effect.
How it composes with per-item loss¶
A typical implementation superimposes the advertiser-level objective on top of the standard per-item contrastive / sampled-softmax loss:
L_total = λ_item · L_per_item + λ_adv · L_advertiser
where:
- L_per_item is the standard two-tower retrieval loss (sampled softmax, InfoNCE, etc.) over (user, item) pairs.
- L_advertiser aggregates at the advertiser level, e.g. the user's affinity for the advertiser entity, or a per-advertiser conversion-rate calibration.
- λ_item, λ_adv are task weights, subject to the same balancing considerations as other auxiliary-task regularisations.
Pinterest doesn't describe the exact form of L_advertiser — whether it's a softmax over advertiser IDs, a regression on advertiser-level conversion rate, or something else. The wiki canonicalises the shape (group by advertiser → parallel loss at that granularity) without committing to a specific form.
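Since the source names the principle but not the form, the sketch below shows one plausible instantiation under explicit assumptions: L_per_item as an in-batch sampled softmax, and L_advertiser as a softmax over advertiser embeddings obtained by mean-pooling the batch's item embeddings per advertiser. The pooling choice, the function name, and the λ defaults are all illustrative, not Pinterest's.

```python
import torch
import torch.nn.functional as F

def combined_loss(user_emb, item_emb, advertiser_ids,
                  lambda_item=1.0, lambda_adv=0.3):
    """One plausible shape of L_total = λ_item·L_per_item + λ_adv·L_advertiser.

    user_emb, item_emb: (B, d) tower outputs for B positive (user, item) pairs.
    advertiser_ids: (B,) long tensor mapping each item to its advertiser.
    """
    B = user_emb.size(0)
    labels = torch.arange(B, device=user_emb.device)

    # Per-item loss: in-batch sampled softmax over (user, item) similarity.
    item_logits = user_emb @ item_emb.T                      # (B, B)
    l_item = F.cross_entropy(item_logits, labels)

    # Advertiser-level loss: mean-pool the batch's items per advertiser,
    # then train the user tower against the coarser advertiser entity.
    uniq, inverse = advertiser_ids.unique(return_inverse=True)
    A, d = uniq.numel(), item_emb.size(1)
    adv_emb = torch.zeros(A, d, device=item_emb.device).index_add_(
        0, inverse, item_emb)
    counts = torch.zeros(A, device=item_emb.device).index_add_(
        0, inverse, torch.ones(B, device=item_emb.device))
    adv_emb = adv_emb / counts.unsqueeze(1)                  # (A, d) mean pool
    adv_logits = user_emb @ adv_emb.T                        # (B, A)
    l_adv = F.cross_entropy(adv_logits, inverse)             # positive = item's advertiser

    return lambda_item * l_item + lambda_adv * l_adv

# Shapes-only smoke test: 32 pairs, 64-dim embeddings, 8 advertisers.
loss = combined_loss(torch.randn(32, 64), torch.randn(32, 64),
                     torch.randint(0, 8, (32,)))
```

In a production system the advertiser embedding would more likely come from its own tower or embedding table rather than batch pooling; batch pooling just keeps the sketch self-contained.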
Why advertiser granularity (and not some other)¶
Advertiser is a natural intermediate granularity for ads retrieval because:
- Item → advertiser is many-to-one: each item belongs to exactly one advertiser, so the grouping is clean and stable.
- Advertiser-level signals are economically meaningful — advertisers optimise campaigns, set budgets, choose creative. Per-advertiser quality is a real construct.
- Advertiser count is intermediate — larger than a few broad categories (where signal is too diluted), smaller than millions of items (where signal is too sparse).
Alternative groupings (category, price-band, country) carry different trade-offs: coarser groups lose more signal; finer groups approach per-item's sparsity.
Related granularity-shift patterns¶
- Seller-level loss / brand-level loss — structurally identical in e-commerce contexts with different entity vocabulary.
- Session-level loss — group by user session to reduce per-impression variance in click models.
- User-level calibration — the opposite direction: group by user rather than item; used for personalised offsets.
- Population stratified sampling — related but different idea: group to balance sampling, not to add a loss term.
Caveats¶
- Exact loss form undisclosed. Pinterest names the principle, not the mathematical shape.
- Task-weight balancing is implicit. "As an additional training objective" — the relative weight vs per-item loss is not described.
- Asymmetry risk. If λ_adv is too high, the model underweights per-item differences within an advertiser; too low, and the advertiser regularisation has no effect. This is the classical MTL balancing problem.
- Serving unchanged. The loss is a training-time addition; inference still scores per-Pin. The advertiser signal improves the shared-representation quality, not the serving-time API.
- Fraud / bot / low-quality advertiser risk. Advertisers with low-quality landing pages have legitimately low conversion rates, and the model should learn to downrank them. Advertiser-level loss cuts both ways: it can help identify consistently poor advertisers, but if weighted carelessly it can also entrench incumbent advertisers.
- Standalone effect not isolated. Pinterest reports a +42% recall@100 delta for the combined 2025 refresh (unified MTL + parallel DCN + advertiser loss); the post doesn't break down how much is attributable to advertiser loss alone.
Seen in¶
- 2026-04-27 Pinterest — From Clicks to Conversions (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation) — canonical: advertiser-level loss introduced as part of the 2025 unified-multi-task refresh, named as addressing "high variance" of per-Pin conversion supervision.
Related¶
- concepts/offsite-conversion-sparsity — the primary-task sparsity that motivates coarser-granularity regularisation.
- concepts/auxiliary-task-regularization — parent framework for adding non-primary objectives.
- concepts/multi-task-learning — the training paradigm.
- concepts/shopping-conversion-candidate-generation
- systems/pinterest-shopping-conversion-cg