
In-batch negative false-negative

Definition

In contrastive two-tower retrieval training, the in-batch negative sampling trick treats the other candidates in the same training batch as negative examples for each anchor user-positive pair, which avoids the cost of sampling random negatives from a huge catalog.

A false negative is an "in-batch negative" item that is actually a positive for the anchor user (the user did engage with it, it's just listed as a different row's positive). Training the model to push the anchor user's embedding away from that item actively degrades retrieval quality because the model learns to avoid items the user actually engaged with.

With IID-sampled batches, the false-negative rate is near zero: users engage with a tiny fraction of the total item corpus, so the probability that a random in-batch item is also a positive for the anchor is negligible.

With request-sorted batches (see concepts/iid-disruption-from-request-sorted-data), batches concentrate around fewer users and "each user may have dozens or hundreds of engagements grouped together." The false-negative rate jumps from ~0% to ~30% on Pinterest workloads, depending on the number of unique users per batch (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).

The mechanism

Standard InfoNCE loss with logit correction (Yi et al., 2019) uses the similarity function s(x, y) (dot product between user embedding x and item embedding y):

L_i = -log[ exp(s(x_i, y_i) - log p_y_i)
           / sum_k exp(s(x_i, y_k) - log p_y_k) ]

where (x_i, y_i) is the anchor user-positive pair and {y_k} are candidates in batch B.
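The corrected loss above can be sketched numerically. This is a minimal numpy illustration, not Pinterest's implementation; the function name and shapes are assumptions:

```python
import numpy as np

def infonce_loss(user_emb, item_emb, log_item_prob):
    """In-batch InfoNCE with logit correction (sketch).

    user_emb:      (B, d) anchor user embeddings x_i
    item_emb:      (B, d) item embeddings y_k; row i is anchor i's positive
    log_item_prob: (B,)   log p_y_k, the sampling probability of each item
    """
    # s(x_i, y_k) for every anchor/candidate pair, corrected by -log p_y_k
    logits = user_emb @ item_emb.T - log_item_prob[None, :]          # (B, B)
    # log-softmax over candidates; the diagonal holds each anchor's positive
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Every in-batch row serves as a candidate y_k for every anchor x_i; the diagonal entries are the positives and everything off-diagonal is (implicitly) a negative.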

Under IID:

batch = [ (user_A, item_1 engaged),
          (user_B, item_2 engaged),
          (user_C, item_3 engaged),
          (user_D, item_4 engaged) ]

For anchor (user_A, item_1):
  positive = item_1
  negatives = { item_2, item_3, item_4 }  ← all items user_A did NOT engage with
  false-negative rate: ~0%

Under request-sorted:

batch = [ (user_A, item_1),
          (user_A, item_2),   ← user_A DID engage with item_2
          (user_A, item_3),   ← user_A DID engage with item_3
          (user_B, item_4) ]

For anchor (user_A, item_1):
  positive = item_1
  negatives = { item_2, item_3, item_4 }
              └────┬────┘
              FALSE NEGATIVES: user_A engaged with these
  false-negative rate: 2/3 ≈ 67% for this anchor
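The two toy batches above can be checked with a few lines of Python (a sketch; the helper name is made up for illustration):

```python
def false_negative_rate(users, anchor_idx):
    """Fraction of in-batch negatives that share the anchor's user.

    users: list of user ids, one per (user, item) row in the batch.
    Negatives for anchor i are all rows k != i (standard identity masking).
    """
    anchor = users[anchor_idx]
    negatives = [u for k, u in enumerate(users) if k != anchor_idx]
    return sum(u == anchor for u in negatives) / len(negatives)

# IID batch: four distinct users
false_negative_rate(["A", "B", "C", "D"], 0)   # -> 0.0

# Request-sorted batch: user_A appears three times
false_negative_rate(["A", "A", "A", "B"], 0)   # -> 2/3
```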

Pinterest's measured result: "The false negative rate jumps from ~0% with IID sampling to as high as ~30% with request-sorted data."

The fix — user-level masking

Extend identity masking (already used to exclude the anchor's own positive y_i from negatives) to exclude any candidate whose user equals the anchor's user:

L_i = -log[ exp(s(x_i, y_i) - log p_y_i)
           / sum_{k : x_k ≠ x_i} exp(s(x_i, y_k) - log p_y_k) ]

The x_k ≠ x_i constraint means: only candidates from different users count as valid negatives. Canonical pattern: patterns/user-level-negative-masking-infonce.
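In code, the masked denominator amounts to setting same-user (off-diagonal) logits to -inf before the softmax. A minimal numpy sketch, assuming the same shapes as before (illustrative names, not a reference implementation):

```python
import numpy as np

def masked_infonce_loss(user_emb, item_emb, user_ids, log_item_prob):
    """InfoNCE with user-level negative masking (sketch).

    Candidates whose user equals the anchor's user are excluded from the
    denominator, except the anchor's own positive on the diagonal.
    """
    B = user_emb.shape[0]
    logits = user_emb @ item_emb.T - log_item_prob[None, :]          # (B, B)
    same_user = user_ids[:, None] == user_ids[None, :]               # (B, B)
    # keep the diagonal (each anchor's own positive); mask other same-user rows
    mask = same_user & ~np.eye(B, dtype=bool)
    logits = np.where(mask, -np.inf, logits)                # exp(-inf) == 0
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because exp(-inf) is 0, masked candidates contribute nothing to the denominator, which is exactly the sum over {k : x_k ≠ x_i} in the corrected loss.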

Why the wiki canonicalises it

The false-negative problem is an under-appreciated cost of dataset reorderings (sorting, locality-grouping) done for non-training reasons. Naming it makes the diagnostic conversation concrete:

  • Diagnosis: "What's the false-negative rate in your request-sorted training batches?"
  • Fix surface: identity masking in InfoNCE → extend to same-user / same-session / same-context masking.
  • Trade-off: each masked candidate shrinks the effective negative pool — may need larger batches to compensate.

Generalisations

  • Same-query in search: if batches group by query, other candidates for the same query are false negatives for the anchor.
  • Same-session: session-sorted batches suffer the same pathology.
  • Same-context (time window, geography): any grouping that aligns with the positive-signal structure.
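All three generalisations reduce to masking on an arbitrary grouping key instead of the user id. A hedged sketch of that mask builder (names are illustrative):

```python
import numpy as np

def negative_mask(keys):
    """Boolean mask of INVALID in-batch negatives (True = exclude).

    keys: (B,) array of group ids -- user, query, session, or any context
    that aligns with the positive-signal structure. The diagonal (each
    anchor's own positive) stays unmasked: it is the positive, not a negative.
    """
    keys = np.asarray(keys)
    same_group = keys[:, None] == keys[None, :]              # (B, B)
    return same_group & ~np.eye(len(keys), dtype=bool)
```

Swapping the key column (user id, query id, session id) is the only change needed per generalisation; the loss code stays identical.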

Caveats

  • Rate is workload-dependent: ~30% is Pinterest-specific; depends on unique-users-per-batch, user activity distribution, item corpus size.
  • Pool shrink: aggressive masking reduces valid negatives per anchor; may degrade gradient quality if batches are small.
  • Not a BatchNorm problem: this is the retrieval-specific half of IID disruption — ranking models have a different failure mode (BatchNorm statistics, fixed by SyncBatchNorm).
