
Advertiser-level loss

Definition

Advertiser-level loss is an auxiliary training objective that groups conversion (or other sparse-signal) examples at the advertiser granularity (all Pins/products from the same advertiser) and trains the model to match conversion signals at that coarser granularity, in addition to the standard per-item loss. The intuition: per-item conversion labels are sparse and high-variance; per-advertiser conversion labels aggregate across an advertiser's inventory and are substantially lower-variance while still preserving directional signal about purchase intent.

Motivation

Per-Pin / per-item conversion supervision in ads ML suffers from:

  • Sparsity — many items have few or no observed conversions over the training window.
  • High variance — the conversion rate of an individual item can swing wildly with small sample sizes.
  • Sample-efficiency floor — there's no way to get more per-item gradient signal without more impressions, and impression budget is finite.

Grouping by a coarser entity that has structural correlation with per-item quality — the advertiser (equivalently: seller, brand) — produces labels that are:

  • Denser — an advertiser with 1000 items accumulates conversion signal across the portfolio.
  • Lower variance — aggregated per-advertiser conversion rates are smooth in a way per-item rates are not.
  • Directionally meaningful — advertisers with well-performing landing pages, well-priced products, and trusted brands have advertiser-level conversion signal that reflects item-level propensity.
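The variance gap is easy to see in a toy simulation (made-up numbers for illustration, not Pinterest data): items sharing one true conversion rate but observed over few impressions each yield noisy, mostly-zero per-item estimates, while the pooled advertiser-level rate is stable:

```python
import random
import statistics

random.seed(0)

TRUE_CVR = 0.02          # shared true conversion rate for this advertiser
N_ITEMS = 1000           # items in the advertiser's portfolio
IMPR_PER_ITEM = 50       # few impressions per item -> sparse labels

# Observed conversions per item (binomial draws).
conversions = [sum(random.random() < TRUE_CVR for _ in range(IMPR_PER_ITEM))
               for _ in range(N_ITEMS)]

# Per-item conversion-rate estimates: high variance, many exact zeros.
item_cvrs = [c / IMPR_PER_ITEM for c in conversions]
print(f"per-item CVR std:            {statistics.pstdev(item_cvrs):.4f}")
print(f"items with zero conversions: {sum(c == 0 for c in conversions)}")

# Advertiser-level estimate: pooled across the portfolio, low variance.
adv_cvr = sum(conversions) / (N_ITEMS * IMPR_PER_ITEM)
print(f"advertiser CVR estimate:     {adv_cvr:.4f}")
```

With these numbers roughly a third of the items show zero conversions even though every item has the same underlying rate; the pooled advertiser estimate sits close to the true 2%.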

The Pinterest shape

Pinterest introduced advertiser-level loss in the 2025 refresh of the shopping conversion candidate generation model (Source: sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):

"We also observed that conversion data at the Pin level exhibit high variance, making it challenging to reliably model purchase intent from Pin-level supervision alone. To address this, we introduce an advertiser-level loss function as an additional training objective, enabling the model to better capture conversion signals at a more stable and consistent granularity."

Combined with other refresh levers (unified single-head multi-task architecture, parallel DCNv2+MLP cross layers), Pinterest reports +42% recall@100 for conversion tasks vs the 2023 model. Advertiser-level loss is one contributor to that delta; the post doesn't isolate its standalone effect.

How it composes with per-item loss

A typical implementation superimposes the advertiser-level objective on top of the standard per-item contrastive / sampled-softmax loss:

L_total = λ_item · L_per_item(x, y) + λ_adv · L_advertiser(x, advertiser(y))
  • L_per_item is the standard two-tower retrieval loss (sampled softmax, InfoNCE, etc.) over (user, item) pairs.
  • L_advertiser aggregates at the advertiser level — e.g. the user's affinity for the advertiser entity, or per-advertiser conversion-rate calibration.
  • λ_item, λ_adv are task weights, subject to the same balancing considerations as other auxiliary-task regularisations.

Pinterest doesn't describe the exact form of L_advertiser — whether it's a softmax over advertiser IDs, a regression on advertiser-level conversion rate, or something else. The wiki canonicalises the shape (group by advertiser → parallel loss at that granularity) without committing to a specific form.
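One plausible instantiation of that shape (an assumption for illustration; Pinterest discloses neither the loss form nor the weights) is an in-batch softmax per-item loss plus a parallel softmax over advertiser embeddings, where each advertiser embedding is mean-pooled from its items' embeddings and the λ values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                  # embedding dimension

def info_nce(query, positives, candidates):
    """In-batch softmax loss: each query's positive scored against all candidates."""
    logits = query @ candidates.T                        # (B, N)
    pos = np.einsum("bd,bd->b", query, positives)        # (B,)
    return np.mean(np.log(np.exp(logits).sum(axis=1)) - pos)

# Toy batch: 4 users, their positive items, each item owned by one advertiser.
user_emb = rng.normal(size=(4, D))
item_emb = rng.normal(size=(4, D))
adv_of_item = np.array([0, 0, 1, 2])                     # item -> advertiser id

# Advertiser embeddings: mean-pool each advertiser's items (one possible
# choice; a learned advertiser-ID embedding table would also work).
adv_emb = np.stack([item_emb[adv_of_item == a].mean(axis=0) for a in range(3)])

lam_item, lam_adv = 1.0, 0.3                             # assumed task weights
loss_item = info_nce(user_emb, item_emb, item_emb)
loss_adv = info_nce(user_emb, adv_emb[adv_of_item], adv_emb)
loss_total = lam_item * loss_item + lam_adv * loss_adv
print(f"L_item={loss_item:.3f}  L_adv={loss_adv:.3f}  L_total={loss_total:.3f}")
```

Note the serving path is untouched: the advertiser head only shapes the shared embeddings during training, matching the "training-time addition" framing below.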

Why advertiser granularity (and not some other)

Advertiser is a natural intermediate granularity for ads retrieval because:

  1. Item = advertiser-owned — the many-to-one relationship is clean and stable. Items belong to exactly one advertiser.
  2. Advertiser-level signals are economically meaningful — advertisers optimise campaigns, set budgets, choose creative. Per-advertiser quality is a real construct.
  3. Advertiser count is intermediate — larger than a few broad categories (where signal is too diluted), smaller than millions of items (where signal is too sparse).

Alternative groupings (category, price-band, country) carry different trade-offs: coarser groups lose more signal; finer groups approach per-item's sparsity.

Related formulations

  • Seller-level loss / brand-level loss — structurally identical in e-commerce contexts with different entity vocabulary.
  • Session-level loss — group by user session to reduce per-impression variance in click models.
  • User-level calibration — the opposite direction: group by user rather than item; used for personalised offsets.
  • Population stratified sampling — related but different idea: group to balance sampling, not to add a loss term.
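All of these variants share one grouping primitive: aggregate sparse binary labels over an entity key. A minimal sketch (hypothetical helper, not taken from any of the systems above), where only the key changes between advertiser-, session-, or user-level aggregation:

```python
from collections import defaultdict

def grouped_rate(labels, keys):
    """Aggregate binary labels by an arbitrary grouping key (advertiser,
    session, user, ...); returns {key: (conversions, impressions, rate)}."""
    agg = defaultdict(lambda: [0, 0])
    for label, key in zip(labels, keys):
        agg[key][0] += label
        agg[key][1] += 1
    return {k: (c, n, c / n) for k, (c, n) in agg.items()}

# Same primitive, different entity vocabulary:
labels = [0, 1, 0, 0, 1, 0]
print(grouped_rate(labels, keys=["advA", "advA", "advA", "advB", "advB", "advB"]))
print(grouped_rate(labels, keys=["sess1", "sess1", "sess2", "sess2", "sess3", "sess3"]))
```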

Caveats

  • Exact loss form undisclosed. Pinterest names the principle, not the mathematical shape.
  • Task-weight balancing is implicit. "As an additional training objective" — the relative weight vs per-item loss is not described.
  • Asymmetry risk — if λ_adv is too high, the model underweights per-item differences within an advertiser; too low, and the advertiser regularisation has no effect. This is the classical MTL balancing problem.
  • Serving unchanged — the loss is a training-time addition; inference still scores per-Pin. The advertiser signal improves the shared-representation quality, not the serving-time API.
  • Fraud / bot / low-quality advertiser risk — advertisers with low-quality landing pages have legitimately low conversion rates, and the model should learn to downrank them. Advertiser-level loss cuts both ways: it can help identify consistently poor advertisers, but if weighted too heavily it can entrench incumbent advertisers at the expense of strong items from newer ones.
  • Standalone effect not isolated. Pinterest reports a +42% recall@100 delta for the combined 2025 refresh (unified MTL + parallel DCN + advertiser loss); the post doesn't break down how much is attributable to advertiser loss alone.
