

Ad impression as hard negative

Definition

Ad impressions with no user engagement serve as hard negatives in ads-retrieval contrastive training — served-but-not-engaged items that reflect the real distribution of ads the production retriever actually showed, not just random items from the catalog.

Hard negatives, as distinct from in-batch / random negatives, are candidates that are semantically close to the anchor's positive — items the model might plausibly rank high but shouldn't. Training against them teaches the model to discriminate the decision boundary of the inventory it will see at serving time, not just to separate trivial positives from trivial random negatives.
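Concretely, the stacking amounts to an InfoNCE-style softmax in which the anchor's positive competes against both pools of negatives at once. A minimal numpy sketch; the function name, temperature, and shapes are illustrative, not Pinterest's implementation:

```python
import numpy as np

def info_nce_loss(query, positive, in_batch_negs, hard_negs, temperature=0.07):
    """InfoNCE loss for one anchor. The positive competes against both
    in-batch negatives and explicitly appended hard negatives.
    All vectors are assumed L2-normalised; sharper (lower) temperature
    magnifies the penalty from close-but-wrong candidates."""
    # Candidate 0 is the positive; the rest are negatives.
    candidates = np.vstack([positive[None, :], in_batch_negs, hard_negs])
    logits = candidates @ query / temperature
    # Numerically stable -log softmax(positive | candidates).
    m = logits.max()
    return -(logits[0] - m - np.log(np.exp(logits - m).sum()))
```

Because every appended negative adds to the softmax denominator, a hard negative that sits close to the positive in embedding space raises the loss far more than a random one, which is exactly the gradient signal the decision boundary needs.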

Why served-but-not-engaged matters

Three properties make served-but-not-engaged ad impressions high-value negatives:

  1. They reflect the actual served distribution. Random-catalog negatives come from a uniform distribution over ad inventory. Served-but-not-engaged impressions come from whatever ads the current production retriever + ranker + auction chose to show — the served distribution the next model will inherit.

  2. They mark the decision boundary. These ads were ranked high enough by the existing funnel to be shown, yet the user didn't engage. They are the close-but-wrong candidates the new model most needs to learn to downrank.

  3. They're cheap. Every served ad impression is logged regardless of engagement outcome; the hard-negative pool is a natural byproduct of the serving stack. No active-learning / online-hard-negative-mining machinery required.
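Point 3 can be made concrete: the hard-negative pool falls out of a single filter over the impression log, with no mining step. A hedged sketch; the `Impression` schema below is hypothetical, and real serving logs carry many more fields:

```python
from dataclasses import dataclass

@dataclass
class Impression:      # hypothetical log-record schema, for illustration only
    user_id: str
    ad_id: str
    engaged: bool      # any click / save / conversion on this impression

def split_training_pairs(impressions):
    """Engaged impressions become positives; served-but-not-engaged
    impressions become (user, ad) hard negatives. The pool is a free
    byproduct of normal serving logs."""
    positives = [(i.user_id, i.ad_id) for i in impressions if i.engaged]
    hard_negatives = [(i.user_id, i.ad_id) for i in impressions if not i.engaged]
    return positives, hard_negatives
```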

Pinterest's framing (Source: sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation):

"On top of the existing in-batch negatives, we use ad impressions with no engagement as 'harder negatives.' These samples can reflect the real distribution of served ads, exposing the model to a more representative inventory and promoting robust contrastive learning."

Relationship to other negative-sampling strategies

  • In-batch negatives. Cheap and abundant, but often too easy: random co-batch items are typically trivially different from the anchor's positive, so they cover only the bulk-separation regime. Pinterest stacks ad-impression hard negatives on top of in-batch negatives, not instead of them.
  • Hard negatives from nearest-neighbour mining. Find items close to the anchor in embedding space and use them as negatives. Requires iterating over the corpus, so it is expensive; more precisely targeted than served impressions, but it misses the production-distribution-reflection property.
  • Random-from-catalog negatives. Cheap but even easier than in-batch negatives; rarely used as the sole strategy at scale.
  • In-batch negative false negatives (a sibling concept): in-batch negatives can be actual positives when batches are correlated (same user, same session, request-sorted). Served-but-not-engaged ad impressions are explicit non-positives for the specific (user, ad) pair, so they don't carry the same false-negative risk: the user saw the ad and chose not to engage, rather than "we simply lack engagement data."
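A common mitigation for the in-batch false-negative risk described in the last bullet (not something Pinterest describes) is to mask co-batch items that share the anchor's user before treating them as negatives. A small illustrative sketch:

```python
import numpy as np

def in_batch_negative_mask(user_ids):
    """For a batch of (user, item) pairs, return a (B, B) boolean matrix
    where True marks a usable in-batch negative for that row's anchor.
    Items from the same user are masked out, since they may be unlogged
    positives; this also masks the diagonal (each anchor's own positive)."""
    u = np.asarray(user_ids)
    same_user = u[:, None] == u[None, :]
    return ~same_user
```

Served-but-not-engaged hard negatives need no such mask: the non-engagement is observed for that exact (user, ad) pair.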

When to apply

  • Ads / sponsored-content retrieval where impression logs are abundant and the serving pipeline exists.
  • Conversion-objective retrievers specifically — since conversion signals are sparse, the hard-negatives pool is especially valuable for defining the decision boundary.
  • Two-tower contrastive training where the negative-sampling strategy is the central design axis.
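For the last bullet, note that the negative-sampling choice plugs into the loss, not the architecture: the towers themselves are unchanged. A toy two-tower scorer, with all names and dimensions illustrative rather than Pinterest's:

```python
import numpy as np

rng = np.random.default_rng(0)

class Tower:
    """Minimal linear tower: project raw features to an L2-normalised
    embedding, so tower outputs are comparable by dot product."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))

    def __call__(self, x):
        z = x @ self.W
        return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Separate towers for users and ads; only the embedding dim is shared.
user_tower, ad_tower = Tower(16, 8), Tower(12, 8)
users = rng.normal(size=(4, 16))   # 4 anchor users
ads = rng.normal(size=(6, 12))     # positives + in-batch + hard negatives
scores = user_tower(users) @ ad_tower(ads).T   # (4, 6) similarity matrix
```

Whether a column of `scores` holds an in-batch negative or a served-but-not-engaged hard negative is decided entirely by how the `ads` batch was assembled.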

When NOT to apply

  • Cold-start deployments where no serving pipeline exists yet to generate impressions.
  • Novel-item-heavy corpora where the served distribution systematically excludes new items — the hard-negatives pool will reinforce that exclusion.
  • Privacy-sensitive contexts where impression-level user-item logs can't be retained or used for training.

Caveats

  • Exposure bias. Using served ads as hard negatives bakes in the existing retriever's biases: the new model learns "don't retrieve things the old model served but users ignored," so if the old model systematically under-served a class of items, the new model learns to under-serve them too. Pinterest doesn't address this directly; it's a known hazard of maximum-likelihood training on logged data.
  • Selection bias on "no engagement". Users skip ads for many reasons other than low relevance — ad blindness, distraction, slow load. Treating all no-engagement impressions equally flattens these distinctions.
  • Scale ratio undisclosed. Pinterest doesn't name the ratio of in-batch negatives vs ad-impression hard negatives per anchor, nor the sampling strategy within the hard-negatives pool.
  • Position effects. An ad at position 10 on Home Feed has a different no-engagement likelihood than one at position 1; treating them equally introduces noise. Position-bias-aware training is not mentioned.
  • Interaction with click-duration reweighting. The ad was served with no engagement — but what if the user hovered for 500 ms before scrolling past? Pinterest's click-duration reweighting applies to clicks, not to pre-click dwell; the "engagement" vs "no engagement" split here is strictly binary.
