
Pinterest Shopping Conversion Candidate Generation

Definition

Pinterest Shopping Conversion Candidate Generation is Pinterest's dedicated retrieval-stage two-tower model for shopping ads, optimised for offsite conversions (checkout, add-to-cart) rather than the onsite engagement that the shopping-ads retrieval pipeline it inherited from was optimised for. It is deployed across the three Pinterest shopping surfaces (Home Feed, Related Pins, Search) and serves 600+ million MAUs.

First launched in 2023 as a multi-head architecture, the system was refactored in 2025 into a unified single-head multi-task architecture with a parallel DCNv2 + MLP cross-layer design and an advertiser-level loss function (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation).

Problem framing

Ads retrieval had historically run on engagement-based (click / repin) models — good for onsite interaction, not optimised for offsite lower-funnel conversions. Because conversion events happen offsite and are advertiser-reported, they are "significantly sparser and noisier than onsite engagement signals". Treating conversion as a subcase of engagement under-weights its signal; a dedicated retrieval pipeline is the architectural response.

This is the canonical shopping-conversion CG decomposition: separate retrieval model whose loss and positives are conversion-centric, running in parallel with the engagement CG pipeline, contributing a distinct candidate pool to the downstream ranking funnel.

Architecture

Two-tower shape with parallel cross layers

User features ──► [ User tower: parallel DCNv2 + 3-layer MLP ] ──► user embedding
                                                                         │
                                                                   dot product → score
                                                                         │
Pin features  ──► [ Pin tower:  parallel DCNv2 + 3-layer MLP ] ──► pin embedding
  • Retrieval stage only — the towers interact solely through the dot product, "as there are no explicit user-Pin interaction features at this retrieval stage".
  • Parallel cross + deep cross layer architecture applied to both towers: the original input is consumed by DCNv2 (explicit bounded-degree cross features) and a 3-layer MLP (implicit abstract patterns) in parallel; their outputs are combined and fed to the head MLP.
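A minimal numpy sketch of the parallel cross-layer tower. Dimensions, layer counts for DCNv2, and weights are all hypothetical stand-ins (Pinterest discloses none of them); the point is the topology: both branches consume the raw tower input, and their outputs are combined before the head.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical feature dimension; Pinterest's dims are undisclosed

def dcnv2_cross(x0, n_layers=2):
    """Explicit bounded-degree feature crosses: x_{l+1} = x0 * (W_l x_l) + x_l."""
    x = x0
    for _ in range(n_layers):
        W = rng.normal(scale=0.1, size=(DIM, DIM))
        x = x0 * (x @ W.T) + x
    return x

def deep_mlp(x, n_layers=3):
    """Implicit abstract patterns: 3-layer ReLU MLP."""
    for _ in range(n_layers):
        W = rng.normal(scale=0.1, size=(DIM, DIM))
        x = np.maximum(x @ W.T, 0.0)
    return x

def tower(features):
    """Parallel design: DCNv2 and the MLP both consume the raw tower input;
    outputs are concatenated and projected by a head layer into the
    retrieval embedding (L2-normalised for dot-product scoring)."""
    combined = np.concatenate([dcnv2_cross(features), deep_mlp(features)], axis=-1)
    W_head = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
    emb = combined @ W_head.T
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

user_emb = tower(rng.normal(size=(4, DIM)))   # batch of 4 users
pin_emb = tower(rng.normal(size=(4, DIM)))    # batch of 4 pins
scores = np.sum(user_emb * pin_emb, axis=-1)  # dot-product retrieval scores
```

In the sequential variant the MLP consumes the DCNv2 output instead of the raw input; the parallel layout is what the +11% offline recall ablation below validates.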

Feature engineering

User-side:

  • Context features — real-time intent signals for Related Pins + Search. Examples: subject Pin's visual embedding, GraphSage Pin-graph embedding.
  • Preference + historical features — long-term personalisation. Examples: demographics, aggregated historical actions, sequential user-action data encoded by a Transformer into a user-history embedding.
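The post does not describe the user-history encoder's internals; as a toy illustration of "sequential user-action data encoded by a Transformer into a user-history embedding", here is a single self-attention layer with mean pooling, with all sizes and weights invented:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # hypothetical action-embedding dimension

def encode_history(history):
    """Toy single-head self-attention + mean pool over the action sequence,
    a stand-in for Pinterest's (undisclosed) Transformer history encoder."""
    Wq, Wk, Wv = (rng.normal(scale=0.3, size=(D, D)) for _ in range(3))
    Q, K, V = history @ Wq, history @ Wk, history @ Wv
    logits = Q @ K.T / np.sqrt(D)               # scaled dot-product attention
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ V).mean(axis=0)              # (D,) user-history embedding

actions = rng.normal(size=(12, D))      # 12 past user actions
user_hist_emb = encode_history(actions)
```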

Pin-side:

  • ID features.
  • Multi-modal / content features for semantic understanding.
  • Performance features tracking engagement.

Training data design (conversion sparsity mitigation)

Three design decisions address conversion sparsity:

  1. Multi-surface model — "We train a single model across all shopping surfaces (Homefeed, Related Pins, Search) to avoid fragmenting sparse conversion labels." Surface-specific features encode contextual differences within the shared model.
  2. Dual positive signals — supplement sparse conversion positives with onsite engagement data (clicks, repins) to broaden coverage. Click positives are log-reweighted by click duration: w = f(log(1 + t / t_max)) where t is click dwell time in seconds and t_max caps the reweight. Bounce clicks are down-weighted; dwell-time-confirmed engagement is emphasised.
  3. Ad impressions as hard negatives — in-batch negatives (cheap, abundant) are supplemented with "served-but-not-engaged ad impressions" as hard negatives. The hard-negatives pool "reflects the real distribution of served ads, exposing the model to a more representative inventory and promoting robust contrastive learning."
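Decisions 2 and 3 can be sketched together: a dwell-reweighted click positive scored against in-batch plus impression hard negatives. The cap `T_MAX`, the exact form of `f`, and all embeddings are hypothetical (the post gives only `w = f(log(1 + t / t_max))`):

```python
import numpy as np

rng = np.random.default_rng(2)
T_MAX = 60.0  # hypothetical dwell cap in seconds; Pinterest's t_max is undisclosed

def click_weight(dwell_s, t_max=T_MAX):
    """Log-reweight click positives by dwell: w = log(1 + min(t, t_max) / t_max).
    Bounce clicks (short dwell) get near-zero weight; long dwells saturate."""
    return np.log1p(np.minimum(dwell_s, t_max) / t_max)

def contrastive_loss(user, pos_pin, inbatch_negs, impression_negs, weight=1.0):
    """Sampled softmax over the positive, the cheap in-batch negatives, and
    the served-but-not-engaged impression hard negatives."""
    cands = np.vstack([pos_pin[None, :], inbatch_negs, impression_negs])
    logits = cands @ user
    log_z = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return -weight * (logits[0] - log_z)  # weighted -log p(positive)

D = 8
user, pos = rng.normal(size=D), rng.normal(size=D)
inbatch, impressions = rng.normal(size=(4, D)), rng.normal(size=(4, D))
loss_bounce = contrastive_loss(user, pos, inbatch, impressions, weight=click_weight(2.0))
loss_dwell = contrastive_loss(user, pos, inbatch, impressions, weight=click_weight(60.0))
```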

Loss design evolution

2023 — multi-head architecture:

           shared encoders
       ┌───────┴───────┐
       ▼               ▼
  engagement      conversion
     head             head
       │               │
  sampled softmax  sampled softmax
       │               │
       └── weighted loss combination ──┘
          (task weights tuned to prevent
          engagement signal from diluting
          conversion signal)

At serving: only conversion head's Pin + query embeddings used.
  • Engagement head stabilises shared parameters (abundant data); conversion head preserves purchase-intent signal (sparse data).
  • Auxiliary-task regularisation: the abundant engagement signal regularises the shared representation that the sparse conversion task depends on.
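The weighted combination amounts to a scalar mixing of per-head losses; the weights below are invented placeholders, since Pinterest does not disclose its tuned values:

```python
# Hypothetical task weights; Pinterest's tuned values are undisclosed.
W_ENGAGEMENT, W_CONVERSION = 0.3, 1.0

def multihead_loss(l_engagement, l_conversion,
                   w_eng=W_ENGAGEMENT, w_conv=W_CONVERSION):
    """2023 design: each head's sampled-softmax loss is combined with tuned
    weights so that the abundant engagement signal stabilises the shared
    encoders without diluting the sparse conversion signal."""
    return w_eng * l_engagement + w_conv * l_conversion

total_loss = multihead_loss(2.0, 1.5)  # 0.3 * 2.0 + 1.0 * 1.5
```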

2025 — unified single-head multi-task architecture:

          shared encoders + parallel DCNv2+MLP
               single unified head
    multi-task optimisation (conversion + engagement)
    + advertiser-level loss as additional objective
             served embeddings
     (directly benefit from multi-task optimisation)
  • Unification rationale: per-head conversion embeddings were unstable in "regions of low conversion coverage"; merging the heads lets the single embedding set inherit multi-task signal directly.
  • Advertiser-level loss added as parallel training objective: "conversion data at the Pin level exhibit high variance, making it challenging to reliably model purchase intent from Pin-level supervision alone. To address this, we introduce an advertiser-level loss function as an additional training objective."
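Pinterest does not give the advertiser-level loss formulation. One plausible reading, sketched here purely as an assumption, is to pool Pin-level supervision up to the advertiser before computing a loss, so high-variance Pin labels are smoothed by aggregation:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8  # hypothetical embedding dimension

def advertiser_loss(user, pin_embs, advertiser_ids, conv_labels):
    """Hypothetical sketch: mean-pool Pin embeddings per advertiser and apply
    a logistic loss against advertiser-level conversion labels, reducing the
    variance of Pin-level supervision. Pinterest's actual formulation is
    undisclosed."""
    loss, ids = 0.0, np.unique(advertiser_ids)
    for a in ids:
        mask = advertiser_ids == a
        centroid = pin_embs[mask].mean(axis=0)          # advertiser centroid
        p = 1.0 / (1.0 + np.exp(-(user @ centroid)))    # sigmoid score
        y = float(conv_labels[mask].any())              # any Pin converted?
        loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss / len(ids)

user = rng.normal(size=D)
pins = rng.normal(size=(6, D))
adv = np.array([0, 0, 1, 1, 1, 2])        # Pin → advertiser mapping
labels = np.array([0, 1, 0, 0, 0, 1])     # Pin-level conversion labels
l_adv = advertiser_loss(user, pins, adv, labels)
```

In production this would run as a parallel objective alongside the Pin-level multi-task loss, not as a replacement for it.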

Production results

(All figures: Pinterest Internal Data, US, 2023–2025 — citation "⁴" in source post.)

  • 2023 launch (baseline shopping conversion CG):
      • +2.3% shopping conversion volume
      • +2.7% shopping impression-to-conversion rate
      • +1.5% CTR (byproduct — higher-intent ranking improves click-through too)
      • +2.2% 30-second CTR (byproduct — better dwell-aligned targeting)
  • 2025 refresh (unified MTL + parallel DCN + advertiser loss):
      • +42% recall@100 for conversion tasks vs the 2023 model
      • +3.1% RoAS for US shopping campaigns
  • Parallel DCNv2 + MLP vs sequential (standalone architecture validation):
      • +11% offline recall@1000 on the conversion task
      • Generalisation win: adopted by all Pinterest production engagement retrieval models after the shopping-CG validation — a retrieval-stage architectural primitive now, not a shopping-specific trick.

Relationship to Pinterest's other ads-ranking infrastructure

  • Pinterest Ads Engagement Model — the ranking-stage unified multi-surface model. Shopping Conversion CG is its retrieval-stage conversion-optimised sibling. Both:
      • Run across HF + SR + RP surfaces.
      • Use a shared trunk + surface-specific features (though the engagement model uses surface-specific tower trees, calibration, and checkpoint exports, whereas the conversion CG is a single multi-surface model with surface-specific features in its input).
      • Use parallel DCNv2 cross layers (the engagement model uses DCNv2 as a projection layer; the conversion CG uses parallel DCNv2 + MLP inside both towers).
      • Leverage Pinterest's long-sequence Transformer user-history encoder.
  • Pinterest L1 Ranking — Pinterest's L1 CVR two-tower ad-ranking model. Shopping Conversion CG sits one stage earlier in the funnel: retrieval feeds L1, L1 narrows to the handful the L2 ranker sees, and L2 feeds the auction. Both are two-tower with ANN-indexed Pin-side embeddings; the conversion CG is conversion-optimised, whereas L1 CVR is a downstream CVR-prediction model.

Caveats

  • No architecture diagrams — Pinterest published three figures (click-duration reweighting formula, sequential vs parallel cross architecture, multi-head vs unified multi-task) that are not in the ingested markdown.
  • Hyperparameters undisclosed. No DCNv2 cross-layer count, no MLP hidden dims, no embedding dimension, no t_max, no task-loss weighting, no advertiser-loss weighting, no batch size, no ANN-index choice.
  • No latency / infra-cost data. Production wins are quality metrics only — no p50/p99, no per-request compute, no cost envelope.
  • Interaction with engagement-CG pipeline undocumented. Pinterest runs both the conversion CG and the engagement-based shopping retrieval in parallel; the post doesn't describe how their candidate pools merge/dedupe, nor how L1 + L2 consume the two sources.
  • Scale details undisclosed. Impression volume, conversion volume, training data window, online-learning cadence — all undocumented.
  • Multi-head → unified transition risks unnamed. Pinterest doesn't describe what they had to solve during the transition (did conversion quality dip during the refactor? were there calibration surprises? rollout staging?).
  • 2023-era post (Mudgal et al. 2024) referenced but not separately ingested.
