CONCEPT
In-batch negative false-negative¶
Definition¶
In contrastive two-tower retrieval training, the in-batch negative sampling trick reuses the other rows' positive items in the same training batch as negative examples for each anchor user-positive pair, which avoids the cost of sampling random negatives from a huge item catalog.
A false negative is an "in-batch negative" item that is actually a positive for the anchor user (the user did engage with it, it's just listed as a different row's positive). Training the model to push the anchor user's embedding away from that item actively degrades retrieval quality because the model learns to avoid items the user actually engaged with.
With IID-sampled batches, the false-negative rate is near zero: users engage with a tiny fraction of the total item corpus, so the probability that a random in-batch item is also a positive for the anchor is negligible.
With request-sorted batches (see concepts/iid-disruption-from-request-sorted-data), batches concentrate around fewer users and "each user may have dozens or hundreds of engagements grouped together." The false-negative rate jumps from ~0% to ~30% on Pinterest workloads, depending on the number of unique users per batch (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).
The mechanism¶
Standard InfoNCE loss with logit correction (Yi et al., 2019) uses the similarity function s(x, y) (dot product between user embedding x and item embedding y):

L(x_i, y_i) = -log [ exp(s(x_i, y_i) - log q(y_i)) / Σ_{y_k ∈ B} exp(s(x_i, y_k) - log q(y_k)) ]

where (x_i, y_i) is the anchor user-positive pair, {y_k} are the candidates in batch B, and q(y) is the item's in-batch sampling probability (subtracting log q(y) is the logit correction).
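The corrected loss can be sketched in a few lines of NumPy. This is illustrative only (names, shapes, and the assumption that log q is given are mine, not Pinterest's implementation):

```python
import numpy as np

def corrected_infonce(user_emb, item_emb, log_q):
    """Sampling-bias-corrected in-batch InfoNCE (after Yi et al., 2019).

    user_emb: (B, d) anchor user embeddings x_i
    item_emb: (B, d) item embeddings, row i being anchor i's positive y_i
    log_q:    (B,)   log sampling probability of each item
    """
    # s(x_i, y_k) - log q(y_k) for every anchor/candidate pair in the batch
    logits = user_emb @ item_emb.T - log_q[None, :]
    # row-wise log-softmax; the diagonal holds each positive pair's log-probability
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Every off-diagonal item in a row acts as a negative for that row's anchor, which is exactly where the false negatives sneak in.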
Under IID:
batch = [ (user_A, item_1 engaged),
(user_B, item_2 engaged),
(user_C, item_3 engaged),
(user_D, item_4 engaged) ]
For anchor (user_A, item_1):
positive = item_1
negatives = { item_2, item_3, item_4 } ← all items user_A did NOT engage with
false-negative rate: ~0%
Under request-sorted:
batch = [ (user_A, item_1),
(user_A, item_2), ← user_A DID engage with item_2
(user_A, item_3), ← user_A DID engage with item_3
(user_B, item_4) ]
For anchor (user_A, item_1):
positive = item_1
negatives = { item_2, item_3, item_4 }
              └──────┬─────┘
                FALSE NEGATIVES: user_A engaged with these
false-negative rate: 2/3 ≈ 67% for this anchor
Pinterest's measured figure: "The false negative rate jumps from ~0% with IID sampling to as high as ~30% with request-sorted data."
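The batch-level rate is easy to measure directly from the batch's user ids. A tiny sketch (my own helper, not Pinterest's code): since each row is one engagement, any same-user candidate is by construction an item the anchor user engaged with.

```python
def false_negative_rate(user_ids):
    """Fraction of in-batch negatives that share the anchor's user.

    user_ids: one user id per (user, positive-item) row in the batch.
    Each anchor treats the other B-1 rows as negatives; a negative whose
    user matches the anchor's is a false negative.
    """
    B = len(user_ids)
    false_neg = sum(
        1
        for i in range(B)
        for k in range(B)
        if k != i and user_ids[k] == user_ids[i]
    )
    return false_neg / (B * (B - 1))

false_negative_rate(["A", "B", "C", "D"])  # IID-like batch: 0.0
false_negative_rate(["A", "A", "A", "B"])  # request-sorted-like batch: 0.5
```

For the single anchor (user_A, item_1) in the example above, the per-anchor rate is 2/3, matching the diagram.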
The fix — user-level masking¶
Extend identity masking (already used to exclude the anchor's own positive y_i from the negatives) to exclude any candidate whose user equals the anchor's user:

L(x_i, y_i) = -log [ exp(s(x_i, y_i) - log q(y_i)) / Σ_{k : k = i or x_k ≠ x_i} exp(s(x_i, y_k) - log q(y_k)) ]

The x_k ≠ x_i constraint means: only candidates from different users count as valid negatives. Canonical pattern: patterns/user-level-negative-masking-infonce.
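One way to implement the mask, sketched in NumPy (an illustrative sketch under my own naming, not the canonical pattern's exact code): set the logits of same-user off-diagonal candidates to -inf so they contribute zero to the softmax denominator.

```python
import numpy as np

def masked_infonce(user_emb, item_emb, user_ids, log_q):
    """In-batch InfoNCE with user-level negative masking (sketch).

    Candidate k contributes to anchor i's softmax only if k == i (the
    anchor's own positive) or user_ids[k] != user_ids[i].
    """
    B = user_emb.shape[0]
    logits = user_emb @ item_emb.T - log_q[None, :]
    ids = np.asarray(user_ids)
    same_user = ids[:, None] == ids[None, :]      # x_k == x_i
    mask = same_user & ~np.eye(B, dtype=bool)     # same user, excluding k == i
    logits = np.where(mask, -np.inf, logits)      # drop false negatives
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))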
Why the wiki canonicalises it¶
The false-negative problem is an under-appreciated cost of dataset reorderings (sorting, locality-grouping) done for non-training reasons. Naming it makes the diagnostic conversation concrete:
- Diagnosis: "What's the false-negative rate in your request-sorted training batches?"
- Fix surface: identity masking in InfoNCE → extend to same-user / same-session / same-context masking.
- Trade-off: each masked candidate shrinks the effective negative pool — may need larger batches to compensate.
Generalisations¶
- Same-query in search: if batches group by query, other candidates for the same query are false negatives for the anchor.
- Same-session: session-sorted batches suffer the same pathology.
- Same-context (time window, geography): any grouping that aligns with the positive-signal structure.
Caveats¶
- Rate is workload-dependent: ~30% is Pinterest-specific; depends on unique-users-per-batch, user activity distribution, item corpus size.
- Pool shrink: aggressive masking reduces valid negatives per anchor; may degrade gradient quality if batches are small.
- Not a BatchNorm problem: this is the retrieval-specific half of IID disruption — ranking models have a different failure mode (BatchNorm statistics, fixed by SyncBatchNorm).
Seen in¶
- 2026-04-13 Pinterest — Scaling Recommendation Systems with Request-Level Deduplication (sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication) — canonical wiki instance: ~0% → ~30% false-negative rate jump on request-sorted batches; user-level masking extension of the InfoNCE identity mask; parity-with-IID quality restored.
Related¶
- concepts/iid-disruption-from-request-sorted-data — the parent failure mode this is one instantiation of.
- concepts/request-level-deduplication — the optimisation programme.
- patterns/user-level-negative-masking-infonce — the fix.
- concepts/two-tower-architecture — the architecture pattern where this arises.