Pinterest — Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models¶
Summary¶
Pinterest Ads Vertical Modeling (Huiqin Xin, Lakshmi Manoharan, Karthik Jayasurya, Ziwei Guo, Alina Liviniuk) describe the Contextual Sequential Two-Tower Model, an evolution of Pinterest's prior Transformer-based candidate generator (CG) over offsite conversion history that adds real-time on-Pinterest context (the subject Pin a user is currently viewing) into the user tower at request time. The motivation is structural: the prior sequential CG produced user embeddings purely offline from offsite history with no awareness of the current session, leaving its candidates uncompetitive on contextual surfaces like Related Pins and Search. The post discloses that "less than 1% of impressions on Related Pins were attributed to this CG" before the change. The fix has three integrated pieces: (1) a context layer added to the user tower that consumes subject-Pin features (interest-category embeddings weighted by confidence) plus user demographics; (2) a synthetic context augmentation training scheme, in which pseudo-context derived from the positive label is injected at training time so the model learns to use context, paired with high dropout on the context layer to prevent over-reliance and preserve the historical-sequence signal; and (3) hybrid offline/online user tower inference, in which the cost-heavy Transformer encoder for offsite history runs offline (last hidden state cached daily in the feature store) while the context layer and final MLP head run online at ad-request time, fusing the cached offline state with the real-time subject-Pin features. Reported results: 3x–10x Recall@K improvement vs the production model in offline evaluation; ~275–300% lift in median candidate relevance; +1.08% ads relevance metric on Related Pins overall; 2x more retrieved candidates surviving to impression; ~0.7% ROAS lift, rising to ~1.4% in top countries that account for the majority of revenue.
Key takeaways¶
- Offline-only user embeddings are uncompetitive on contextual surfaces. Pinterest's prior Transformer-based offsite-history CG inferred user embeddings "offline purely from historical offsite behavior, meaning that at the moment an ad was served, the model had no knowledge of what the user was currently browsing on Pinterest." On Related Pins (where the "subject Pin", the Pin the user is currently viewing, is a strong intent signal) and Search (where the query is a strong intent signal) this missing signal manifested as a survival-rate collapse: "less than 1% of impressions on Related Pins were attributed to this CG." Candidates were retrieved but did not survive ranking + auction. The diagnostic is survival rate, not recall: the CG was retrieving candidates that the downstream funnel kept dropping. This frames candidate survival rate as the load-bearing CG metric on contextual surfaces.
- Add a context layer to the user tower; concatenate its output with the Transformer's. "The model now concatenates the output of the original Transformer encoder (which represents historical sequence information) with the output of the new context layer. This combined representation is then fed into the final Multi-Layer Perceptron (MLP) to derive the final user embedding." The context layer is a tower-internal architectural primitive. It is a sibling in shape to Pinterest's parallel DCNv2+MLP cross layers from the shopping conversion CG, in that both compose a real-time / structural component alongside a sequence-or-history component within the user tower, but it is distinct in purpose: parallel DCN+MLP is about feature crossing, while the context layer is about fusing real-time intent signal with cached historical state. The context layer's input on Related Pins: "the aggregated embedding representations of the top interest categories of the subject Pin, weighted by their confidence scores." User demographics (age, country, gender) are also added to the user representation layer for personalisation.
- Synthetic context augmentation: inject pseudo-context derived from the positive label at training time. Real-time context exists only at serving time: there is no on-Pinterest "current Pin" attached to a logged offsite conversion event. Pinterest's solution: "during model training, we artificially inject pseudo-context information derived from the positive label (the conversion event) into the input sequence. For example, by projecting the interest category features from the positive item, we encourage the model to retrieve items that are semantically related to the context associated with that user session." This is a training-serving-parity hack: at training time the "context" is a leakage-style projection from the label; at serving time the "context" is a real subject Pin. The shared input shape lets the same model architecture work in both phases. Pinterest considered using real on-Pinterest context data merged with offsite history but rejected it for two reasons: "(1) Merging onsite data with offsite data presents significant technical difficulties. (2) We cannot guarantee that a user has viewed ad impressions on Related Pins between two sequential offsite events." In other words, real-context training data does not exist densely enough.
- High dropout on the context layer prevents over-reliance on the synthetic signal. "A high dropout rate is used in the context layer during training to ensure the model still relies on the user's historical event sequence (the Transformer output)." This is the structural mitigation that makes synthetic-context augmentation safe. Without it, the model could shortcut to the (label-leaked) pseudo-context and ignore the history signal entirely, collapsing performance on users with thin context but rich history. High dropout on the augmented feature layer is the named pattern: when you inject a synthetic feature derived from labels, throttle the layer that consumes it so the main signal still drives optimisation.
- Hybrid offline/online user tower inference splits the user tower at the boundary between cached historical state and real-time context. "Given that the context features (e.g., subject Pin features) are only known at the ad request time (online), we adopted a hybrid model inference approach. (1) Offline Inference: The majority of the user tower (the Transformer encoder) is inferred offline, and the last hidden state of the transformer (the encoded representations of the event sequence) is stored in the feature store. This is refreshed on a daily basis for users with new offsite activity. (2) Online Inference: The remaining part of the user tower — the context layer and the final MLP head — is computed online at serving time, taking the real-time context features and the pre-computed offline user signal as inputs." The split is architectural, not deployment-mechanical: the Transformer is too expensive to run online and the context layer cannot run offline (no context features are available), so the model is designed around that boundary. Hybrid offline/online user tower inference is the named pattern; it generalises the two-tower-with-ANN-index serving pattern (where the item tower is precomputed) by precomputing part of the user tower as well.
- Daily refresh cadence for the cached offline user signal. "This is refreshed on a daily basis for users with new offsite activity." The implicit assumption is that a user's Transformer last-hidden-state output is stable enough day-to-day for daily refresh to be acceptable. This sets a freshness ceiling for the historical signal: anything happening within the last ~24 hours of offsite history is invisible to the served embedding until the next batch. The contextual and demographic signals partly compensate for the staleness of the historical signal.
- Survival rate, not recall, is the headline production win. Pinterest's offline evaluation reports 3x–10x Recall@K improvement, but the more revealing online metrics are downstream-funnel: "the median relevance of the candidates went up by ~275–300%. On the Related Pins surface overall, the ads relevance metric improved by 1.08%. Furthermore, we observed a significant increase in candidate delivery, with 2x more ads candidates retrieved being delivered to impression." The 2x candidate-survival improvement is the structural validation: "less than 1% of impressions" before the change, 2x candidate delivery after. The CG went from a bystander on Related Pins to a meaningful contributor. Topline: ~0.7% ROAS lift, ~1.4% in top revenue countries.
- Future work names two extensions. (a) Context surface expansion to Search: "A key next step is to extend the context-enhanced candidate generator to other high-stakes contextual surfaces, notably Search. This is particularly crucial for Search because maintaining high relevance between the presented ad candidates and the user's search queries is paramount." The context layer's input changes (search query embedding instead of subject-Pin interest-category embeddings) but the architectural pattern transfers. (b) Cross-attention fusion over concatenation: "We propose using cross-attention-based fusion, where the context layer embedding acts as the query and the sequence of encoded transformer outputs serves as the key/value. This approach will allow the final user-tower embedding to dynamically capture the importance of each history event based on the real-time context." This would replace the static concatenation with a context-dependent re-weighting of the history sequence: the context tells the model which historical events matter most for the current request.
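The proposed cross-attention fusion in (b) can be illustrated with a minimal sketch. It is not Pinterest's implementation (the post describes this as future work and gives no details); it assumes single-head scaled dot-product attention with no learned query/key/value projections, and the names `cross_attention_fuse`, `history`, `fused`, and `neutral` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(context_query: Vector, history_states: List[Vector]) -> Vector:
    """Re-weight encoded history events by their similarity to the context.

    Single-head scaled dot-product attention: the context embedding is the
    query, the encoded history states act as both keys and values.
    Assumption: learned query/key/value projections are omitted for brevity.
    """
    dim = len(context_query)
    scores = [
        sum(q * k for q, k in zip(context_query, state)) / math.sqrt(dim)
        for state in history_states
    ]
    weights = softmax(scores)
    return [
        sum(w * state[i] for w, state in zip(weights, history_states))
        for i in range(dim)
    ]


# Two encoded history events; the first context vector points at event one.
history = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention_fuse([10.0, 0.0], history)
# An all-zero context carries no preference, so both events weigh equally.
neutral = cross_attention_fuse([0.0, 0.0], history)
```

With concatenation, every history event contributes to the user embedding in the same way regardless of context; in this sketch the fused vector shifts towards whichever history events align with the context query.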
Architecture¶
Two-tower with context-extended user tower¶
                             ┌──── Transformer encoder ────┐   ← cached offline daily
User offsite history ──────► │  (offsite conversion seq)   │
                             └─────────────┬───────────────┘
                                           │ last hidden state
                                           │ (stored in feature store)
                                           │
                                           ▼  (online at ad-request time)
                              ┌────── concatenate ─────┐
                              │                        │
Subject-Pin interest ────► [ context layer ]           │
categories (weighted)         │ (high dropout in       │
+ user demographics           │  training)             │
                              └────────────┬───────────┘
                                           │
                                           ▼
                                  [ final MLP head ]
                                           │
                                    user embedding
                                           │
                                           └──────────────┐
                                                          ▼
                                                 dot product → score
                                                          ▲
                                                          │
                                          pin embedding ◄─┘
                                    (via ANN index, item tower)
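The offline/online split in the diagram above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the production system: the Transformer encoder is replaced by a mean pool, the feature store by a plain dict, and both the context layer and the MLP head by single tanh layers. All names (`FEATURE_STORE`, `daily_batch_job`, `user_embedding_online`, `params`) and weight values are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]

# Stand-in for the feature store: user_id -> cached encoder last hidden state.
FEATURE_STORE: Dict[str, Vector] = {}


def encode_history_offline(event_sequence: List[Vector]) -> Vector:
    """Placeholder for the expensive Transformer encoder (here: a mean pool)."""
    dim = len(event_sequence[0])
    count = len(event_sequence)
    return [sum(event[i] for event in event_sequence) / count for i in range(dim)]


def daily_batch_job(histories: Dict[str, List[Vector]]) -> None:
    """Offline path: refresh cached states for users with new offsite activity."""
    for user_id, events in histories.items():
        FEATURE_STORE[user_id] = encode_history_offline(events)


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One fully connected layer with a tanh activation."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def user_embedding_online(user_id: str, context: Vector, params: dict) -> Vector:
    """Online path: context layer, concatenation, then the final MLP head."""
    cached = FEATURE_STORE[user_id]  # pre-computed offline, read at request time
    ctx_out = dense(context, params["ctx_w"], params["ctx_b"])  # context layer
    fused = cached + ctx_out  # list concatenation, not element-wise addition
    return dense(fused, params["mlp_w"], params["mlp_b"])  # final MLP head


def score(user_emb: Vector, item_emb: Vector) -> float:
    """Two-tower score: dot product of user and item embeddings."""
    return sum(u * p for u, p in zip(user_emb, item_emb))


daily_batch_job({"u1": [[1.0, 0.0], [0.0, 1.0]]})
params = {
    "ctx_w": [[1.0, 0.0], [0.0, 1.0]],
    "ctx_b": [0.0, 0.0],
    "mlp_w": [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]],
    "mlp_b": [0.0, 0.0],
}
emb = user_embedding_online("u1", context=[1.0, 0.0], params=params)
```

Only `user_embedding_online` and `score` sit on the request path in this sketch; the encoder cost is paid once per refresh in `daily_batch_job`, which is the point of the split.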
Training with synthetic augmented data¶
At training time only:
┌──────────────────┐
│ positive label   │ ── project interest categories ──► pseudo-context features
│ (conversion item)│                                               │
└──────────────────┘                                               │
                                                                   ▼
                                                           [ context layer ]
                                                      (high-dropout regularised)
At serving time:
┌──────────────────┐
│ subject Pin      │ ── interest categories × confidence ──► real context features
│ (current Pin)    │                                                   │
└──────────────────┘                                                   ▼
                                                               [ context layer ]
                                                           (no dropout at inference)
The shared shape (interest-category embeddings weighted by confidence) is what makes pseudo-context substitutable for real context at training time. The model sees the same input format in both phases; only the source of the features differs.
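The training-time substitution and the dropout regularisation described above can be sketched as follows. Several pieces are assumptions, since the post does not specify them: the "projection" is modelled as an identity copy of the positive item's category features, dropout is applied element-wise to the context vector using the standard inverted-dropout scaling, and the rate of 0.8 is a placeholder (the post only says "high").

```python
import random
from typing import List, Optional

Vector = List[float]


def pseudo_context_from_label(positive_item_categories: Vector) -> Vector:
    """Training-time stand-in for real context.

    Assumption: the projection is an identity copy of the positive item's
    interest-category features; the post does not describe the function.
    """
    return list(positive_item_categories)


def context_dropout(
    context: Vector,
    rate: float,
    training: bool,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Inverted dropout on the context features; a no-op at inference."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(context)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Kept values are scaled by 1/keep so the expected activation is unchanged.
    return [x / keep if rng.random() < keep else 0.0 for x in context]


HYPOTHETICAL_RATE = 0.8  # placeholder: the post gives no dropout value

# Training: context comes from the label and is heavily dropped out.
label_categories = [0.7, 0.2, 0.1]
train_ctx = context_dropout(
    pseudo_context_from_label(label_categories),
    rate=HYPOTHETICAL_RATE,
    training=True,
    rng=random.Random(0),
)

# Serving: context comes from the subject Pin and passes through untouched.
serve_ctx = context_dropout([0.5, 0.4, 0.1], rate=HYPOTHETICAL_RATE, training=False)
```

Because most context values are zeroed on any given training step, the label-derived signal is frequently unavailable and the model has to keep using the cached history representation.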
Feature engineering¶
User-side:
- Historical offsite event sequence — cached Transformer encoder last-hidden-state, daily-refreshed.
- Real-time context (Related Pins): subject-Pin's interest-category embeddings, weighted by confidence scores.
- User demographics: age, country, gender.
Pin-side: not detailed in this post (item tower is inherited from prior CG architecture).
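For the real-time context feature listed above, one plausible reading of "aggregated ... weighted by their confidence scores" is a confidence-weighted mean of per-category embeddings. The sketch below assumes that reading; the embedding table, its values, and the names `CATEGORY_EMBEDDINGS` and `aggregate_context` are illustrative and do not come from the post.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional interest-category embeddings (illustrative values).
CATEGORY_EMBEDDINGS: Dict[str, List[float]] = {
    "home_decor": [1.0, 0.0, 0.0],
    "furniture": [0.0, 1.0, 0.0],
    "gardening": [0.0, 0.0, 1.0],
}


def aggregate_context(
    top_categories: List[Tuple[str, float]],
    table: Dict[str, List[float]] = CATEGORY_EMBEDDINGS,
) -> List[float]:
    """Confidence-weighted mean of the subject Pin's top category embeddings.

    Assumption: weights are normalised to sum to 1. The post does not say
    whether the aggregation is a weighted sum or a weighted mean.
    """
    dim = len(next(iter(table.values())))
    total = sum(conf for _, conf in top_categories)
    if total <= 0.0:
        return [0.0] * dim  # no usable context: fall back to a zero vector
    out = [0.0] * dim
    for name, conf in top_categories:
        embedding = table[name]
        for i in range(dim):
            out[i] += (conf / total) * embedding[i]
    return out


# A subject Pin tagged mostly as home decor, with some furniture signal.
context_vec = aggregate_context([("home_decor", 0.6), ("furniture", 0.2)])
```

The zero-vector fallback is a design choice of this sketch for requests with no subject Pin categories; how the production model handles missing context is not described.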
Production results¶
Figures the post attributes to the "production system" and to "real traffic ad data on Related Pins" (Pinterest Internal Data, Related Pins surface):
Offline evaluation¶
- 3x–10x increase in Recall@K vs production model. Recall@K = "the proportion of positive items found in the top-K retrieved items" where positives are "the candidates that survived the ranking funnel and delivered to the users."
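A minimal sketch of the metric as quoted, assuming the conventional reading of Recall@K (hits in the top K divided by the total number of positives). The function name and the item identifiers are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], positives: Set[str], k: int) -> float:
    """Proportion of positive items found in the top-K retrieved items.

    Per the post, positives are candidates that survived the ranking funnel
    and were delivered to users.
    """
    if not positives:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & positives) / len(positives)


retrieved_ads = ["ad_a", "ad_b", "ad_c", "ad_d"]  # CG output, best first
delivered_ads = {"ad_b", "ad_d", "ad_z"}          # survived to impression

recall_2 = recall_at_k(retrieved_ads, delivered_ads, k=2)  # one hit out of three
recall_4 = recall_at_k(retrieved_ads, delivered_ads, k=4)  # two hits out of three
```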
Survival rate & relevance¶
- ~275–300% lift in median candidate relevance.
- +1.08% ads relevance metric on Related Pins overall.
- 2x more retrieved candidates delivered to impression.
Topline business metrics¶
- ~0.7% measurable lift in ROAS (Return on Ad Spend).
- ~1.4% ROAS lift in top countries (the majority of total revenue).
Caveats¶
- No architecture diagrams. Pinterest published two named figures (the Contextual Sequential two-tower model architecture, and the model-training-with-synthetic-augmented-data pipeline) that are not in the ingested markdown — referred to as Figure 1 and Figure 2 in the prose only.
- Hyperparameters undisclosed. No Transformer topology (layers, heads, hidden dim, sequence length), no context-layer dimensions, no MLP head dimensions, no dropout rate value, no batch size, no embedding dimensions, no offline → online split granularity in terms of compute / latency.
- Daily-refresh staleness window not quantified. Offsite activity from the last day is not yet reflected in a user's served embedding; Pinterest doesn't disclose the impact of stale history on candidate quality, nor whether high-velocity users get more frequent refreshes.
- Pseudo-context generation function unspecified. "Projecting the interest category features from the positive item" — the projection function (sum, mean, attention-weighted, learned) is not described.
- Latency / compute envelope undisclosed. No p50/p99 of the online portion of the user tower, no comparison of the offline-cached-state size to the prior fully-offline embedding size, no per-request compute footprint of the context layer + final MLP.
- Future cross-attention fusion not yet shipped as of this post — referenced as proposed work.
- Topline ROAS attribution. Pinterest reports "~0.7% measurable lift in conversion-related business metrics ROAS" but doesn't break down whether the lift is from incremental conversions, lower CPC due to better targeting, or both.
- Survival rate is not the same as quality. A 2x increase in delivered candidates doesn't directly imply 2x revenue: the reported lifts are much smaller (1.08% relevance, 0.7% ROAS), suggesting that this CG's contribution started from a low base and that the survival improvement is partially absorbed by competing CGs.
- Prior post not separately ingested. The post links to "Ads Candidate Generation using Behavioral Sequence Modeling" (the baseline sequential CG) as a previous post; that post is not in the wiki corpus.
Source¶
- Original: https://medium.com/pinterest-engineering/enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models-bc3a2f9b682e?source=rss----4c5a5f6279b6---4
- Raw markdown: raw/pinterest/2026-05-08-enhancing-ad-relevance-integrating-real-time-context-into-se-489c17ac.md
Related¶
- companies/pinterest
- systems/pinterest-contextual-sequential-cg — the system this post documents.
- systems/pinterest-sequential-cg — the baseline (prior post: Ads Candidate Generation using Behavioral Sequence Modeling).
- systems/pinterest-related-pins — primary serving surface.
- systems/pinterest-search — future surface expansion target.
- systems/pinterest-shopping-conversion-cg — sibling Pinterest ads CG (different lineage; shopping ads, parallel DCN+MLP, multi-task).
- systems/pinterest-ads-engagement-model — Pinterest's downstream ranking model the CG candidates feed into.
- systems/transformer — encoder for offsite history sequence; cached offline for hybrid inference.
- concepts/two-tower-architecture — the retrieval-stage substrate.
- concepts/context-layer-in-two-tower — tower-internal context-fusion primitive.
- concepts/subject-pin — Pinterest concept: the Pin the user is currently viewing.
- concepts/hybrid-tower-inference-split — split between offline-cached and online-computed parts of the user tower.
- concepts/pseudo-context-augmentation — synthetic data approach for serving-time-only features.
- concepts/candidate-survival-rate — diagnostic metric capturing CG → impression funnel.
- concepts/real-time-context-feature
- concepts/offsite-conversion-sparsity
- patterns/hybrid-offline-online-user-tower-inference
- patterns/synthetic-pseudo-context-from-label
- patterns/high-dropout-on-augmented-feature-layer