Pinterest Shopping Conversion Candidate Generation¶
Definition¶
Pinterest Shopping Conversion Candidate Generation is Pinterest's dedicated retrieval-stage two-tower model for shopping ads, optimised for offsite conversions (checkout, add-to-cart) rather than the onsite-engagement optimisation of Pinterest's inherited shopping-ads retrieval pipeline. It is deployed across the three Pinterest shopping surfaces (Home Feed, Related Pins, Search) and serves 600+ million MAUs.
First launched in 2023 as a multi-head architecture, the system was refactored in 2025 into a unified single-head multi-task architecture with a parallel DCNv2 + MLP cross-layer design and an advertiser-level loss function (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation).
Problem framing¶
Ads retrieval had historically run on engagement-based (click / repin) models — good for onsite interaction, not optimised for offsite lower-funnel conversions. Because conversion events happen offsite and are advertiser-reported, they are "significantly sparser and noisier than onsite engagement signals". Treating conversion as a subcase of engagement under-weights its signal; a dedicated retrieval pipeline is the architectural response.
This is the canonical shopping-conversion CG decomposition: separate retrieval model whose loss and positives are conversion-centric, running in parallel with the engagement CG pipeline, contributing a distinct candidate pool to the downstream ranking funnel.
Architecture¶
Two-tower shape with parallel cross layers¶
User features ──► [ User tower: parallel DCNv2 + 3-layer MLP ]
│
▼
user embedding
│
└────────────┐
▼
dot product → score
▲
│
pin embedding ◄────┘
▲
│
Pin features ──► [ Pin tower: parallel DCNv2 + 3-layer MLP ]
- Retrieval stage only — no explicit user-Pin interaction features ("as there are no explicit user-Pin interaction features at this retrieval stage").
- Parallel cross + deep cross layer architecture applied to both towers: the original input is consumed by DCNv2 (explicit bounded-degree cross features) and a 3-layer MLP (implicit abstract patterns) in parallel; their outputs are combined and fed to the head MLP.
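The parallel cross-and-deep tower can be sketched as follows. This is a minimal NumPy illustration, not Pinterest's implementation: all dimensions, the two-layer cross stack, and the random untrained weights are assumptions, since the post discloses none of them.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def linear(d_in, d_out):
    # small random init; weights are illustrative and untrained
    return rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out)

class ParallelCrossDeepTower:
    """One tower: DCNv2 cross layers and a 3-layer MLP consume the same
    input in parallel; their outputs are concatenated and projected by a
    head MLP into the shared embedding space."""

    def __init__(self, d_in, d_hidden=64, d_out=32, n_cross=2):
        self.cross = [linear(d_in, d_in) for _ in range(n_cross)]
        self.mlp = [linear(d_in, d_hidden),
                    linear(d_hidden, d_hidden),
                    linear(d_hidden, d_hidden)]
        self.head = linear(d_in + d_hidden, d_out)

    def __call__(self, x):
        x0, xc = x, x
        for W, b in self.cross:       # DCNv2: x_{l+1} = x0 * (W x_l + b) + x_l
            xc = x0 * (xc @ W + b) + xc
        xm = x
        for W, b in self.mlp:         # implicit feature interactions
            xm = relu(xm @ W + b)
        h = np.concatenate([xc, xm])  # combine the parallel branches
        W, b = self.head
        e = h @ W + b
        return e / np.linalg.norm(e)  # unit-norm embedding for dot-product retrieval

user_tower = ParallelCrossDeepTower(d_in=16)
pin_tower = ParallelCrossDeepTower(d_in=24)
score = float(user_tower(rng.normal(size=16)) @ pin_tower(rng.normal(size=24)))
```

The towers share no parameters and meet only at the dot product, which is what allows the Pin-side embeddings to be pre-computed and ANN-indexed.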
Feature engineering¶
User-side:
- Context features — real-time intent signals for Related Pins + Search. Examples: subject Pin's visual embedding, GraphSage Pin-graph embedding.
- Preference + historical features — long-term personalisation. Examples: demographics, aggregated historical actions, sequential user-action data encoded by a Transformer into a user-history embedding.
Pin-side:
- ID features.
- Multi-modal / content features for semantic understanding.
- Performance features tracking engagement.
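The sequential user-action encoding can be illustrated with a single self-attention layer plus mean pooling over the action sequence. Pinterest uses a full Transformer; the layer count, sequence length, and dimensions here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode_user_history(actions, d_k=16):
    """One self-attention layer + mean pooling over the action sequence,
    producing a single fixed-size user-history embedding."""
    n, d = actions.shape
    Wq, Wk, Wv = (rng.normal(0, 0.1, (d, d_k)) for _ in range(3))
    Q, K, V = actions @ Wq, actions @ Wk, actions @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (n, n) attention over actions
    return (attn @ V).mean(axis=0)           # pool to one d_k-dim embedding

history = rng.normal(size=(50, 32))  # 50 recent actions, 32-dim features each
emb = encode_user_history(history)   # feeds the user tower as one feature
```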
Training data design (conversion sparsity mitigation)¶
Three design decisions address conversion sparsity:
- Multi-surface model — "We train a single model across all shopping surfaces (Homefeed, Related Pins, Search) to avoid fragmenting sparse conversion labels." Surface-specific features encode contextual differences within the shared model.
- Dual positive signals — supplement sparse conversion positives with onsite engagement data (clicks, repins) to broaden coverage. Click positives are log-reweighted by click duration:
`w = f(log(1 + t / t_max))`, where `t` is click dwell time in seconds and `t_max` caps the reweight. Bounce clicks are down-weighted; dwell-time-confirmed engagement is emphasised.
- Ad impressions as hard negatives — in-batch negatives (cheap, abundant) are supplemented with "served-but-not-engaged ad impressions" as hard negatives. The hard-negatives pool "reflects the real distribution of served ads, exposing the model to a more representative inventory and promoting robust contrastive learning."
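The click-duration reweighting and the hard-negative contrastive loss can be sketched together. Where the post is silent, this sketch assumes `f` is the identity, `t_max` is 300 seconds, and the loss is a weighted in-batch sampled softmax with the hard negatives appended to the candidate set.

```python
import numpy as np

rng = np.random.default_rng(2)

def click_weight(t, t_max=300.0):
    # w = f(log(1 + t/t_max)) with t capped at t_max; f = identity and
    # t_max = 300 s are assumptions (Pinterest's values are undisclosed)
    t = np.minimum(np.asarray(t, dtype=float), t_max)
    return np.log1p(t / t_max)

def sampled_softmax_loss(u, p_pos, p_hard, w):
    """Weighted contrastive loss: each user's positive is its own engaged
    Pin (the diagonal); negatives are the other in-batch Pins plus
    served-but-not-engaged ad impressions (hard negatives)."""
    B = u.shape[0]
    logits = np.concatenate([u @ p_pos.T, u @ p_hard.T], axis=1)  # (B, B+H)
    logits -= logits.max(axis=1, keepdims=True)                   # stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_prob[np.arange(B), np.arange(B)]
    return float((w * nll).sum() / w.sum())

def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

u = unit(rng.normal(size=(8, 32)))       # user embeddings
p = unit(rng.normal(size=(8, 32)))       # positive (engaged) Pin embeddings
hard = unit(rng.normal(size=(16, 32)))   # impressed-but-not-engaged Pins
w = click_weight([2, 10, 30, 60, 120, 300, 600, 45])  # dwell times in seconds
loss = sampled_softmax_loss(u, p, hard, w)
```

Note how the cap makes a 600-second dwell count no more than a 300-second one, while a bounce click near zero dwell contributes almost nothing.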
Loss design evolution¶
2023 — multi-head architecture:
shared encoders
│
┌───────┴───────┐
▼ ▼
engagement conversion
head head
│ │
sampled softmax sampled softmax
│ │
└── weighted loss combination ──┘
│
(task weights tuned to prevent
engagement signal from diluting
conversion signal)
At serving: only conversion head's Pin + query embeddings used.
- Engagement head stabilises shared parameters (abundant data); conversion head preserves purchase-intent signal (sparse data).
- Auxiliary-task regularisation with the abundant signal (engagement) regularising the sparse signal's (conversion) shared representation.
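The 2023 weighted-loss combination can be sketched as follows; the head sizes and the task weight `w_conv` are illustrative (Pinterest's values are undisclosed), as is the in-batch sampled softmax used per head.

```python
import numpy as np

rng = np.random.default_rng(3)

def in_batch_softmax(u, p):
    """Sampled softmax over the batch: the positive Pin sits on the diagonal."""
    logits = u @ p.T
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

# Per-task head projections on top of the shared encoder output.
W_eng = rng.normal(0, 0.1, (32, 16))
W_conv = rng.normal(0, 0.1, (32, 16))

def multi_head_loss(u_shared, p_shared, w_conv=4.0):
    # Task weights keep abundant engagement data from diluting the sparse
    # conversion signal; w_conv = 4.0 is an illustrative value.
    l_eng = in_batch_softmax(u_shared @ W_eng, p_shared @ W_eng)
    l_conv = in_batch_softmax(u_shared @ W_conv, p_shared @ W_conv)
    return l_eng + w_conv * l_conv

def serve_embeddings(u_shared, p_shared):
    # At serving, only the conversion head's embeddings are used.
    return u_shared @ W_conv, p_shared @ W_conv

u_shared = rng.normal(size=(8, 32))
p_shared = rng.normal(size=(8, 32))
loss = multi_head_loss(u_shared, p_shared)
```

The engagement head only ever influences serving indirectly, through the gradients it pushes into the shared encoders.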
2025 — unified single-head multi-task architecture:
shared encoders + parallel DCNv2+MLP
│
single unified head
│
multi-task optimisation (conversion + engagement)
+ advertiser-level loss as additional objective
│
served embeddings
(directly benefit from multi-task optimisation)
- Unification rationale: per-head conversion embeddings were unstable in "regions of low conversion coverage"; merging the heads lets the single embedding set inherit multi-task signal directly.
- Advertiser-level loss added as parallel training objective: "conversion data at the Pin level exhibit high variance, making it challenging to reliably model purchase intent from Pin-level supervision alone. To address this, we introduce an advertiser-level loss function as an additional training objective."
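A minimal sketch of an advertiser-level auxiliary loss, assuming mean pooling of Pin embeddings per advertiser and a softmax over advertiser centroids; the post does not specify the aggregation or the objective's exact form.

```python
import numpy as np

def advertiser_level_loss(u, p, advertiser_ids):
    """Auxiliary objective: pool Pin embeddings per advertiser and score
    each user against the advertiser-level centroids. Supervision at this
    coarser level has lower variance than noisy Pin-level conversion labels.
    Mean pooling is an assumption."""
    inv = np.unique(np.asarray(advertiser_ids), return_inverse=True)[1]
    centroids = np.stack([p[inv == k].mean(axis=0) for k in range(inv.max() + 1)])
    logits = u @ centroids.T                      # (B, n_advertisers)
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # each user's positive advertiser is the advertiser of its positive Pin
    return float(-log_prob[np.arange(len(u)), inv].mean())

rng = np.random.default_rng(4)
u = rng.normal(size=(6, 8))
p = rng.normal(size=(6, 8))
aux = advertiser_level_loss(u, p, [0, 0, 1, 1, 2, 2])
# total objective would combine the two levels, e.g. (lambda illustrative):
# loss = pin_level_loss + 0.5 * aux
```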
Production results¶
(All figures: Pinterest Internal Data, US, 2023–2025 — citation "⁴" in source post.)
- 2023 launch (baseline shopping conversion CG):
- +2.3% shopping conversion volume
- +2.7% shopping impression-to-conversion rate
- +1.5% CTR (byproduct — higher-intent ranking improves click-through too)
- +2.2% long-click (>30 s) CTR (byproduct — better dwell-aligned targeting)
- 2025 refresh (unified MTL + parallel DCN + advertiser loss):
- +42% recall@100 for conversion tasks vs 2023 model
- +3.1% RoAS for US shopping campaigns
- Parallel DCNv2 + MLP vs sequential (standalone architecture validation):
- +11% offline recall@1000 on conversion task.
- Generalisation win: adopted by all Pinterest production engagement retrieval models after the shopping-CG validation, making it a retrieval-stage architectural primitive rather than a shopping-specific trick.
Relationship to Pinterest's other ads-ranking infrastructure¶
- Pinterest Ads Engagement Model — the ranking-stage unified multi-surface model. Shopping Conversion CG is the retrieval-stage conversion-optimised sibling. Both:
- Run across HF + SR + RP surfaces.
- Use shared trunk + surface-specific features (though the engagement model uses surface-specific tower trees + surface-specific calibration + surface-specific checkpoint exports, whereas the conversion CG uses a single multi-surface model with surface-specific features in the input).
- Use parallel DCNv2 cross layers (the engagement model uses DCNv2 as a projection layer; the conversion CG uses parallel DCNv2 + MLP inside both towers).
- Leverage Pinterest's long-sequence Transformer user-history encoder.
- Pinterest L1 Ranking — Pinterest's L1 CVR two-tower ad-ranking model. Shopping Conversion CG sits one stage earlier in the funnel: retrieval feeds L1, L1 narrows the pool to the handful of candidates the L2 ranker sees, and L2 feeds the auction. Both are two-tower models with ANN-indexed Pin-side embeddings; the conversion CG is conversion-optimised retrieval, whereas L1 CVR is a downstream CVR prediction model.
Caveats¶
- No architecture diagrams — Pinterest published three figures (click-duration reweighting formula, sequential vs parallel cross architecture, multi-head vs unified multi-task) that are not in the ingested markdown.
- Hyperparameters undisclosed. No DCNv2 cross-layer count, no MLP hidden dims, no embedding dimension, no `t_max`, no task-loss weighting, no advertiser-loss weighting, no batch size, no ANN-index choice.
- No latency / infra-cost data. Production wins are quality metrics only — no p50/p99, no per-request compute, no cost envelope.
- Interaction with engagement-CG pipeline undocumented. Pinterest runs both the conversion CG and the engagement-based shopping retrieval in parallel; the post doesn't describe how their candidate pools merge/dedupe, nor how L1 + L2 consume the two sources.
- Scale details undisclosed. Impression volume, conversion volume, training data window, online-learning cadence — all undocumented.
- Multi-head → unified transition risks unnamed. Pinterest doesn't describe what they had to solve during the transition (did conversion quality dip during the refactor? were there calibration surprises? rollout staging?).
- 2023-era post (Mudgal et al. 2024) referenced but not separately ingested.
Seen in¶
- 2026-04-27 Pinterest — From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation at Pinterest (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation) — canonical wiki instance. Full lifecycle from 2023 multi-head launch through 2025 unified-MTL refresh; names the parallel DCNv2+MLP cross architecture, advertiser-level loss, dual positive signal with click-duration reweighting, ad-impression hard negatives, multi-surface single-model design.
Related¶
- companies/pinterest
- systems/pinterest-ads-engagement-model · systems/pinterest-l1-ranking
- systems/pinterest-home-feed · systems/pinterest-search · systems/pinterest-related-pins
- systems/dcnv2 · systems/graphsage · systems/transformer
- concepts/two-tower-architecture · concepts/multi-task-learning · concepts/auxiliary-task-regularization · concepts/offsite-conversion-sparsity · concepts/click-duration-reweighting · concepts/advertiser-level-loss · concepts/parallel-cross-and-deep-network · concepts/ad-impression-as-hard-negative · concepts/shopping-conversion-candidate-generation · concepts/retrieval-ranking-funnel
- patterns/parallel-dcn-mlp-cross-layers · patterns/dual-positive-signal-for-sparse-labels · patterns/unified-multi-task-over-multi-head · patterns/auxiliary-engagement-task-for-conversion-retrieval