PATTERN Cited by 1 source
Synthetic pseudo-context from label¶
Pattern¶
When a model needs to consume a request-time-only feature that is not present in logged training data, synthesise a training-time pseudo-version of the feature derived from the positive label (or other training-time-available artefacts), with the same input shape as the real serving-time feature. Pair with regularisation against over-reliance on the consuming layer to mitigate label leakage.
Pinterest's canonical formulation in the Contextual Sequential CG (sources/2026-05-08-pinterest-enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models):
"During model training, we artificially inject pseudo-context information derived from the positive label (the conversion event) into the input sequence. For example, by projecting the interest category features from the positive item, we encourage the model to retrieve items that are semantically related to the context associated with that user session."
Pinterest pairs this with high dropout on the context layer to prevent the model from shortcutting to the leaked signal.
Why use it¶
A model that wants to use real-time context (the page the user is on, the query they just typed, the song they're playing) at serving time has a structural training-data problem: logged events typically don't have an attached context signal — and even when they do, the join to the relevant request-time state is sparse.
Pinterest's framing of the rejected alternative:
"We opted to use synthetic augmented data over real context data due to two main challenges: (1) Merging onsite data with offsite data presents significant technical difficulties. (2) We cannot guarantee that a user has viewed ad impressions on Related Pins between two sequential offsite events."
That leaves three options, including the one Pinterest rejected:
- Don't use the real-time feature (lose the intent signal).
- Build the real-context training pipeline (expensive, sparse data).
- Synthesise pseudo-context from training-time-available artefacts.
This pattern is option 3.
Mechanism¶
Training time:
positive label (the conversion item)
│
▼
feature projection (e.g., interest categories) → pseudo-context features (shape S)
│
▼
context-consuming layer (high dropout)
│
└─── concatenated with historical encoder output ───► final head
Serving time:
real-time context (the subject Pin)
│
▼
feature projection (e.g., interest categories × confidence) → real context features (shape S)
│
▼
context-consuming layer (no dropout at inference)
│
└─── concatenated with cached encoder output ───► final head
The shared projection — interest-category embeddings — is what makes pseudo-context substitutable. The context layer doesn't "know" whether its input came from a label projection (training) or a subject-Pin observation (serving); it just sees a vector of the same shape.
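As a minimal sketch of the shared-shape idea, the snippet below runs one projection function over both the label item (training) and the subject Pin (serving). The category vocabulary, the dictionary field names, and the use of a weight of 1.0 for label categories are all assumptions made for illustration; the source does not specify Pinterest's actual feature schema.

```python
from typing import Dict, List

# Hypothetical interest-category vocabulary; the real taxonomy is not given.
CATEGORIES: List[str] = ["home_decor", "fashion", "recipes", "travel"]


def project_categories(category_weights: Dict[str, float]) -> List[float]:
    """Map {category: weight} to a fixed-length vector (the shared shape S)."""
    return [float(category_weights.get(c, 0.0)) for c in CATEGORIES]


def pseudo_context_from_label(positive_item: Dict) -> List[float]:
    """Training time: derive context from the positive (conversion) item.

    Assumption: label categories carry no confidence, so each gets weight 1.0.
    """
    return project_categories({c: 1.0 for c in positive_item["interest_categories"]})


def real_context_from_subject_pin(subject_pin: Dict) -> List[float]:
    """Serving time: derive context from the Pin the user is currently viewing."""
    return project_categories(subject_pin["category_confidence"])


train_ctx = pseudo_context_from_label({"interest_categories": ["fashion", "travel"]})
serve_ctx = real_context_from_subject_pin(
    {"category_confidence": {"fashion": 0.8, "recipes": 0.1}}
)

# Both vectors have the same length, so the same context layer can consume either.
print(len(train_ctx), len(serve_ctx))
```

Because both paths end in `project_categories`, any downstream layer sees one input shape regardless of where the context came from.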
When to use¶
- Request-time-only feature with no analogue in training data. The classic forcing function.
- A plausible projection from training-time-available data exists. The label, related items, or other artefacts in the training example must produce a feature of compatible shape.
- The model can be regularised against label leakage. patterns/high-dropout-on-augmented-feature-layer is Pinterest's choice; alternatives include input perturbation, label-feature-conditional masking, or explicit label-leakage tests during evaluation.
- Real-context training data is unavailable, sparse, or expensive to build. If you can build the real-context pipeline cost-effectively, do that — it has no leakage hazard.
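The regularisation options mentioned in the list above can be sketched in a few lines. The first function is standard inverted dropout applied to the context vector; the second zeroes the whole context for a fraction of training examples, which is one possible form of masking and not something the source describes. The rate of 0.8 in the usage lines is an arbitrary placeholder, since Pinterest does not quantify "high dropout".

```python
import random
from typing import List, Optional


def context_dropout(context: List[float], rate: float, training: bool,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each element with probability `rate` in training.

    Survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    At inference the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(context)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in context]


def mask_whole_context(context: List[float], mask_prob: float, training: bool,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical alternative: hide the entire context for some examples."""
    rng = rng or random.Random()
    if training and rng.random() < mask_prob:
        return [0.0] * len(context)
    return list(context)


rng = random.Random(0)
pseudo_ctx = [0.0, 1.0, 0.0, 1.0]
train_input = context_dropout(pseudo_ctx, rate=0.8, training=True, rng=rng)
serve_input = context_dropout(pseudo_ctx, rate=0.8, training=False)
```

Either variant forces the rest of the model to make useful predictions when the leaked signal is missing, which is the behaviour the pattern relies on.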
When not to use¶
- The serving-time feature has no plausible projection from training-time data. Nothing to synthesise — you'd need a different approach (separate model, dual-train, or accept the missing feature).
- You can build the real-context training pipeline cheaply. Real context is always preferable to synthetic; pseudo-context is the workaround when real context is too hard to assemble.
- The model is too small to absorb the regularisation cost. High dropout on the context layer means the model is partially blind to context during training; if your model has limited capacity, the regularisation might cost more than the pseudo-context gives.
Companion patterns¶
This pattern is closely coupled with:
- patterns/high-dropout-on-augmented-feature-layer: without it, the model shortcuts to the leaked label projection and degrades at serving time, when only real (non-leaked) context is available. In the one documented instance, the two patterns are shipped together.
- patterns/hybrid-offline-online-user-tower-inference — pseudo-context augmentation makes sense in models that have a real-time-context-consuming online layer; that almost always implies a hybrid offline/online architecture for the rest of the user tower.
Hazards¶
Label leakage is the structural risk¶
The pseudo-context is, by construction, derived from the label, so the model can learn to use it as a label predictor. High-dropout regularisation reduces this risk but does not eliminate it. The empirical question is whether performance with real context (not pseudo-context) is strong enough; Pinterest's reported results (3x–10x recall@K, 1.08% relevance, 0.7% ROAS) suggest it is.
Distribution-shift between pseudo and real¶
The pseudo-context (a label projection) and the real context (a request-time observation) might have different distributions. Pinterest's choice, interest-category features from items the user converted on, plausibly overlaps with those of subject Pins (items the user is currently engaging with). The pattern is more reliable when the projection is genuinely close to the real signal's distribution.
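One way to put a number on this hazard is to compare the category distribution of pseudo-context against that of logged real context. The sketch below uses Jensen-Shannon divergence, which is a choice made here for illustration and not a check described by Pinterest; the sample category lists are invented.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def category_distribution(examples: Iterable[List[str]]) -> Dict[str, float]:
    """Normalised frequency of each category across a set of examples."""
    counts = Counter(c for cats in examples for c in cats)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no categories observed")
    return {c: n / total for c, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a: Dict[str, float]) -> float:
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Invented samples: categories of converted items vs. categories of subject Pins.
pseudo = category_distribution([["fashion"], ["fashion", "travel"], ["recipes"]])
real = category_distribution([["fashion"], ["home_decor"], ["travel", "recipes"]])
shift = js_divergence(pseudo, real)
print(f"JS divergence between pseudo and real context: {shift:.3f} bits")
```

A value near 0 suggests the synthetic and real signals look alike at the category level; a value near 1 suggests the model is being trained on a context distribution it will rarely see at serving time.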
Pseudo-context generation function is a hyperparameter¶
Pinterest projects interest-category features but doesn't name the projection function (sum, mean, attention-weighted, learned). This is a tuning surface; the choice can affect whether the model learns useful context-aware patterns or just memorises the leakage.
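To make the tuning surface concrete, the sketch below implements three simple candidates (sum, mean, and confidence-weighted mean) over category embeddings. These are illustrative options only; the source does not say which function Pinterest uses, and the two-dimensional embeddings and the function name `pool_categories` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Toy 2-d category embeddings; values are illustrative only.
EMBEDDINGS: Dict[str, List[float]] = {
    "fashion": [1.0, 0.0],
    "travel": [0.0, 1.0],
}


def pool_categories(categories: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    mode: str = "mean",
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Pool category embeddings into a single fixed-size context vector."""
    if not categories:
        raise ValueError("need at least one category")
    if mode == "weighted":
        if weights is None or len(weights) != len(categories):
            raise ValueError("weighted mode needs one weight per category")
        w = [float(x) for x in weights]
    elif mode in ("sum", "mean"):
        w = [1.0] * len(categories)
    else:
        raise ValueError(f"unknown mode: {mode}")

    dim = len(embeddings[categories[0]])
    pooled = [0.0] * dim
    for cat, wi in zip(categories, w):
        for i, v in enumerate(embeddings[cat]):
            pooled[i] += wi * v

    if mode == "sum":
        return pooled
    norm = sum(w)
    if norm == 0.0:
        raise ValueError("weights sum to zero")
    return [v / norm for v in pooled]


cats = ["fashion", "travel"]
summed = pool_categories(cats, EMBEDDINGS, mode="sum")
averaged = pool_categories(cats, EMBEDDINGS, mode="mean")
weighted = pool_categories(cats, EMBEDDINGS, mode="weighted", weights=[3.0, 1.0])
```

Whatever function is chosen, using the same one for pseudo-context and real context keeps the two inputs comparable.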
Evaluation must use real, not pseudo, context¶
Offline evaluation should never feed pseudo-context to the trained model: that only measures the model's ability to read the leaked label. Use held-out real-context data (or a surrogate) for evaluation. Pinterest reports recall@K from "logged features from real traffic ad data on Related Pins", i.e. real context, not pseudo-context.
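A minimal sketch of this evaluation rule: compute recall@K on retrievals produced with real context, and treat the same metric under pseudo-context only as a leakage diagnostic. The retrieved lists and item IDs below are invented, and the "leakage gap" is a hypothetical diagnostic, not a metric Pinterest reports.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[Sequence[str]],
                positives: Sequence[str], k: int) -> float:
    """Fraction of examples whose positive item appears in the top-k list."""
    if len(retrieved) != len(positives) or not positives:
        raise ValueError("need one non-empty retrieved list per positive")
    hits = sum(1 for items, pos in zip(retrieved, positives) if pos in items[:k])
    return hits / len(positives)


positives = ["ad_b", "ad_x"]

# Retrievals when the model is fed real, logged request-time context.
retrieved_real = [["ad_a", "ad_b"], ["ad_c", "ad_d"]]
# Retrievals when the model is fed label-derived pseudo-context (leaky by design).
retrieved_pseudo = [["ad_b", "ad_a"], ["ad_x", "ad_c"]]

real_recall = recall_at_k(retrieved_real, positives, k=2)
pseudo_recall = recall_at_k(retrieved_pseudo, positives, k=2)

# A large gap hints that the model leans on the leaked signal.
leakage_gap = pseudo_recall - real_recall
print(real_recall, pseudo_recall, leakage_gap)
```

Only `real_recall` is a fair estimate of serving-time quality; `pseudo_recall` is inflated by construction.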
Caveats¶
- Single named instance on the wiki. Pinterest is the only documented case. Similar techniques likely exist in other large-scale recsys / ads-ML systems but the named pattern is not standard nomenclature.
- Projection function unspecified. Pinterest's "projecting the interest category features" doesn't name the function.
- Dropout rate unspecified. "High dropout" is qualitative; Pinterest doesn't quantify.
- Generalisation beyond conversion events is plausible but undocumented. In principle, the label can be any positive signal whose features are analogous to the real-time context.
Seen in¶
- 2026-05-08 Pinterest — Enhancing Ad Relevance (sources/2026-05-08-pinterest-enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models) — canonical wiki instance. Pseudo-context derived from the positive conversion item's interest-category features, injected at training time, paired with high-dropout regularisation on the context layer.