# User-level negative masking for InfoNCE
## Problem
In two-tower retrieval training, in-batch negatives are the standard cheap negative-sampling trick: each row's positive item serves as a negative for every other row's anchor. With IID-sampled batches (each row is a different user), the probability that an "in-batch negative" is actually a positive for the anchor user is negligible — a random user engages with a tiny fraction of the full item catalog.
Under request-sorted batching (adopted for columnar compression, bucket joins, and backfill locality), each batch concentrates on far fewer users, each contributing dozens to hundreds of adjacent engagements. Many in-batch "negatives" are now items the anchor user actually engaged with: false negatives. Training the model to push apart items the user did engage with actively degrades retrieval quality.
Pinterest measured: "The false negative rate jumps from ~0% with IID sampling to as high as ~30% with request-sorted data, depending on the number of unique users per batch." (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication)
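To make the dependence on unique-users-per-batch concrete, here is a toy estimate of the in-batch false-negative rate. It is a sketch under a simplifying assumption (every same-user off-diagonal pair counts as a false negative, so it is an upper bound), and the function name and batch shapes are illustrative, not Pinterest's:

```python
import torch

def in_batch_false_negative_rate(user_ids: torch.Tensor) -> float:
    """Share of in-batch negative pairs whose candidate shares the anchor's user."""
    n = len(user_ids)
    same_user = user_ids.unsqueeze(1) == user_ids.unsqueeze(0)  # [N, N] outer equality
    collisions = same_user.sum().item() - n                     # drop the diagonal
    return collisions / (n * (n - 1))                           # over off-diagonal pairs

# IID sampling: 1,024 rows drawn from a huge user base, so collisions are negligible.
iid_batch = torch.randint(0, 10_000_000, (1024,))
# Request-sorted: the same 1,024 rows concentrated on 8 users, 128 rows each.
sorted_batch = torch.arange(8).repeat_interleave(128)

print(in_batch_false_negative_rate(iid_batch))     # ~0.0
print(in_batch_false_negative_rate(sorted_batch))  # 127/1023, ~0.124
```

The skew in the second batch is mild next to Pinterest's worst case, yet under this counting it already puts over 12% of each anchor's negative pool at risk.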
## Pattern
Extend InfoNCE's identity mask to exclude all candidates that share the anchor's user. Standard InfoNCE with logit (logQ) correction (Yi et al., 2019), where $s(q_i, x_k)$ is the two-tower similarity between anchor $i$ and candidate $k$, and $p_k$ is candidate $k$'s sampling probability:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\,s(q_i, x_i) - \log p_i}}{\sum_{k=1}^{N} e^{\,s(q_i, x_k) - \log p_k}}$$

becomes:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\,s(q_i, x_i) - \log p_i}}{e^{\,s(q_i, x_i) - \log p_i} + \sum_{k \,:\, u_k \neq u_i} e^{\,s(q_i, x_k) - \log p_k}}$$

The $u_k \neq u_i$ constraint, where $u_k$ is the user associated with candidate $x_k$, means only candidates whose associated user differs from the anchor's user count as valid negatives; the anchor's own positive stays in both numerator and denominator. Pinterest's framing:
"To address this, we extended our existing identity masking to also exclude negatives that belong to the same user as the anchor."
Implementation-wise, this is a mask tensor constructed per batch from the (anchor_user, candidate_user) outer equality: one additional N × N boolean matrix per batch (tiny at typical batch sizes), applied to the logits before the softmax.
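A minimal PyTorch sketch under stated assumptions: the function name, temperature, and logQ plumbing are illustrative, not Pinterest's code, and the key is passed as an opaque id vector (user ids for this pattern):

```python
import torch
import torch.nn.functional as F

def masked_infonce_loss(anchor_emb: torch.Tensor,
                        cand_emb: torch.Tensor,
                        key_ids: torch.Tensor,
                        log_q: torch.Tensor | None = None,
                        temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with same-key negative masking.

    anchor_emb: [N, D] anchor-tower embeddings
    cand_emb:   [N, D] candidate-tower embeddings (row i is anchor i's positive)
    key_ids:    [N] correlation key per row (user id for this pattern)
    log_q:      [N] optional logQ sampling-bias correction per candidate
    """
    # [N, N] similarity matrix: every candidate scored against every anchor.
    logits = anchor_emb @ cand_emb.T / temperature
    if log_q is not None:
        logits = logits - log_q.unsqueeze(0)  # Yi et al. (2019) logit correction

    # Outer equality of keys: entry [i, k] is True iff rows i and k share a key.
    same_key = key_ids.unsqueeze(1) == key_ids.unsqueeze(0)
    diag = torch.eye(len(key_ids), dtype=torch.bool, device=logits.device)

    # Exclude same-key candidates from the negative pool, but keep the
    # diagonal: each anchor's own positive must stay in the softmax.
    logits = logits.masked_fill(same_key & ~diag, float("-inf"))

    # The positive for anchor i sits at column i.
    targets = torch.arange(len(key_ids), device=logits.device)
    return F.cross_entropy(logits, targets)
```

Keeping the diagonal unmasked is the load-bearing detail: the anchor's own positive shares its user by definition, so a naive same-user mask would delete the positive along with the false negatives.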
## Effect
"This simple masking change allowed us to successfully adopt request-sorted data for retrieval model training while preserving model quality." (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication)
Pinterest reports that retrieval quality recovers to IID parity once masking is applied, though no specific post-fix metric delta is disclosed.
## When to apply
Apply same-key masking when all of these hold:
- Training uses contrastive loss with in-batch negatives (InfoNCE, NCE, triplet-with-hard-negatives).
- Training data is grouped by a key that correlates with positive engagement (user, session, query, context).
- The grouping produces mini-batches with multiple rows sharing the same key (not one row per key).
## Generalisations: same-key masking
The pattern generalises to any correlated-row structure:
- Same-query masking in search retrieval training — other candidates for the same query are false negatives for the anchor.
- Same-session masking in conversation-aware retrieval.
- Same-context masking (time window, geography, device class) where the grouping correlates with engagement probability.
In each case, the identity mask (exclude the anchor's own positive) extends to excluding all rows that share the anchor's correlation key.
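Because the sketch above treats the key as an opaque id vector, it generalises without modification; a hypothetical usage (query_ids and session_ids are assumed names):

```python
# Same-query masking in search retrieval: the key is the query id.
loss = masked_infonce_loss(anchor_emb, cand_emb, key_ids=query_ids)

# Same-session masking in conversation-aware retrieval.
loss = masked_infonce_loss(anchor_emb, cand_emb, key_ids=session_ids)

# Composite context keys (e.g. time window x geography) can be reduced to a
# single integer id per row first, e.g. by hashing the tuple of parts.
```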
## Trade-off: smaller effective negative pool
Every masked row reduces the effective negative-pool size for that anchor. If batches are small or if users have many positives per batch, the remaining negative pool may be too small for the contrastive loss to produce strong gradients. Mitigations:
- Larger batch sizes to compensate for the shrinkage (Pinterest does not explicitly disclose batch-size tuning for this).
- Mix in sampled negatives from outside the batch (random / hard-negative-mined); see the sketch after this list.
- Higher unique-user count per batch — trades some locality benefit against more negatives.
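A sketch of the second mitigation, building on the earlier helper (same imports; how extra_neg_emb is sampled is out of scope, and the function and parameter names are assumptions):

```python
def masked_infonce_with_extra_negatives(anchor_emb, cand_emb, key_ids,
                                        extra_neg_emb, temperature=0.05):
    """Pads the masked in-batch pool with negatives sampled outside the batch.

    extra_neg_emb: [M, D] out-of-batch negatives (random or hard-mined),
    assumed not to collide with any anchor's key.
    """
    in_batch = anchor_emb @ cand_emb.T / temperature
    same_key = key_ids.unsqueeze(1) == key_ids.unsqueeze(0)
    diag = torch.eye(len(key_ids), dtype=torch.bool, device=in_batch.device)
    in_batch = in_batch.masked_fill(same_key & ~diag, float("-inf"))

    # Out-of-batch negatives are never masked, so every anchor keeps at
    # least M valid negatives regardless of how skewed the batch's keys are.
    extra = anchor_emb @ extra_neg_emb.T / temperature        # [N, M]
    logits = torch.cat([in_batch, extra], dim=1)              # [N, N + M]

    targets = torch.arange(len(key_ids), device=logits.device)
    return F.cross_entropy(logits, targets)
```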
## Pairs with SyncBatchNorm
User-level masking is the retrieval-side correctness fix for IID disruption. The ranking-side correctness fix is SyncBatchNorm. Most recsys shops need both — ranking models have BatchNorm, retrieval models have in-batch negatives, and both fail under request-sorted data for different reasons.
## Caveats
- Rate is workload-dependent. The ~30% false-negative rate is Pinterest-specific; it depends on unique-users-per-batch, the user-activity distribution, and item-catalog size.
- Not a fix for all retrieval-training pathologies. Hard-negative mining, embedding-version skew, and tower-capacity mismatch are orthogonal and need separate treatment.
- Pinterest doesn't disclose whether batch size was tuned up to compensate for masking-induced negative-pool shrinkage.
- Mask construction cost. The N × N user-equality matrix is cheap at typical batch sizes (hundreds to low thousands) but grows quadratically: at N = 1,024 the boolean mask is ~1M entries (~1 MB at one byte per bool); at N = 16,384 it is ~268M entries (~256 MB).
- Applies only at training time. Serving-side two-tower retrieval is already deduplicated by construction; the masking only exists in the training-loss computation.
## Seen in
- 2026-04-13 Pinterest — Scaling Recommendation Systems with Request-Level Deduplication (sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication) — canonical wiki pattern instance: ~0% → ~30% false-negative-rate jump on request-sorted batches; user-level masking as the InfoNCE-logit-correction extension that recovers IID-baseline retrieval quality.
## Related
- concepts/in-batch-negative-false-negative — the failure mode this pattern corrects.
- concepts/iid-disruption-from-request-sorted-data — the parent failure-mode class.
- concepts/two-tower-architecture — the architecture family this fixes.
- patterns/syncbatchnorm-for-correlated-batches — the ranking-side companion correctness fix.
- patterns/sort-by-request-id-for-columnar-compression — the storage optimisation that triggers the failure mode.
- concepts/request-level-deduplication — the overarching discipline.
- companies/pinterest