
Pinterest Shopping Conversion Candidate Generation

Definition

Pinterest Shopping Conversion Candidate Generation is Pinterest's dedicated retrieval-stage two-tower model for shopping ads, optimised for offsite conversions (checkout, add-to-cart) rather than the onsite engagement that the shopping-ads retrieval pipeline it inherited from was optimised for. It is deployed across the three Pinterest shopping surfaces (Home Feed, Related Pins, Search) and serves 600+ million MAUs.

First launched in 2023 as a multi-head architecture, the system was refactored in 2025 into a unified single-head multi-task architecture with a parallel DCNv2 + MLP cross-layer design and an advertiser-level loss function (sources/2026-04-27-pinterest-from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation).

Problem framing

Ads retrieval had historically run on engagement-based (click / repin) models — good for onsite interaction, not optimised for offsite lower-funnel conversions. Because conversion events happen offsite and are advertiser-reported, they are "significantly sparser and noisier than onsite engagement signals". Treating conversion as a subcase of engagement under-weights its signal; a dedicated retrieval pipeline is the architectural response.

This is the canonical shopping-conversion CG decomposition: separate retrieval model whose loss and positives are conversion-centric, running in parallel with the engagement CG pipeline, contributing a distinct candidate pool to the downstream ranking funnel.

Architecture

Two-tower shape with parallel cross layers

User features ──► [ User tower: parallel DCNv2 + 3-layer MLP ] ──► user embedding
                                                                         │
                                                                   dot product → score
                                                                         │
Pin features  ──► [ Pin tower:  parallel DCNv2 + 3-layer MLP ] ──► pin embedding
  • Retrieval stage only — the towers interact solely through the dot product, "as there are no explicit user-Pin interaction features at this retrieval stage".
  • Parallel cross + deep cross layer architecture applied to both towers: the original input is consumed by DCNv2 (explicit bounded-degree cross features) and a 3-layer MLP (implicit abstract patterns) in parallel; their outputs are combined and fed to the head MLP.
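A minimal numpy sketch of the parallel cross-layer tower. Dimensions, layer counts for DCNv2, and weights are all hypothetical stand-ins (Pinterest discloses none of them); the point is the topology: both branches consume the raw tower input, and their outputs are combined before the head.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical feature dimension; Pinterest's dims are undisclosed

def dcnv2_cross(x0, n_layers=2):
    """Explicit bounded-degree feature crosses: x_{l+1} = x0 * (W_l x_l) + x_l."""
    x = x0
    for _ in range(n_layers):
        W = rng.normal(scale=0.1, size=(DIM, DIM))
        x = x0 * (x @ W.T) + x
    return x

def deep_mlp(x, n_layers=3):
    """Implicit abstract patterns: 3-layer ReLU MLP."""
    for _ in range(n_layers):
        W = rng.normal(scale=0.1, size=(DIM, DIM))
        x = np.maximum(x @ W.T, 0.0)
    return x

def tower(features):
    """Parallel design: DCNv2 and the MLP both consume the raw tower input;
    outputs are concatenated and projected by a head layer into the
    retrieval embedding (L2-normalised for dot-product scoring)."""
    combined = np.concatenate([dcnv2_cross(features), deep_mlp(features)], axis=-1)
    W_head = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
    emb = combined @ W_head.T
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

user_emb = tower(rng.normal(size=(4, DIM)))   # batch of 4 users
pin_emb = tower(rng.normal(size=(4, DIM)))    # batch of 4 pins
scores = np.sum(user_emb * pin_emb, axis=-1)  # dot-product retrieval scores
```

In the sequential variant the MLP consumes the DCNv2 output instead of the raw input; the parallel layout is what the +11% offline recall ablation below validates.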

Feature engineering

User-side:

  • Context features — real-time intent signals for Related Pins + Search. Examples: subject Pin's visual embedding, GraphSage Pin-graph embedding.
  • Preference + historical features — long-term personalisation. Examples: demographics, aggregated historical actions, sequential user-action data encoded by a Transformer into a user-history embedding.
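The post does not describe the user-history encoder's internals; as a toy illustration of "sequential user-action data encoded by a Transformer into a user-history embedding", here is a single self-attention layer with mean pooling, with all sizes and weights invented:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # hypothetical action-embedding dimension

def encode_history(history):
    """Toy single-head self-attention + mean pool over the action sequence,
    a stand-in for Pinterest's (undisclosed) Transformer history encoder."""
    Wq, Wk, Wv = (rng.normal(scale=0.3, size=(D, D)) for _ in range(3))
    Q, K, V = history @ Wq, history @ Wk, history @ Wv
    logits = Q @ K.T / np.sqrt(D)               # scaled dot-product attention
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ V).mean(axis=0)              # (D,) user-history embedding

actions = rng.normal(size=(12, D))      # 12 past user actions
user_hist_emb = encode_history(actions)
```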

Pin-side:

  • ID features.
  • Multi-modal / content features for semantic understanding.
  • Performance features tracking engagement.

Training data design (conversion sparsity mitigation)

Three design decisions address conversion sparsity:

  1. Multi-surface model — "We train a single model across all shopping surfaces (Homefeed, Related Pins, Search) to avoid fragmenting sparse conversion labels." Surface-specific features encode contextual differences within the shared model.
  2. Dual positive signals — supplement sparse conversion positives with onsite engagement data (clicks, repins) to broaden coverage. Click positives are log-reweighted by click duration: w = f(log(1 + t / t_max)) where t is click dwell time in seconds and t_max caps the reweight. Bounce clicks are down-weighted; dwell-time-confirmed engagement is emphasised.
  3. Ad impressions as hard negatives — in-batch negatives (cheap, abundant) are supplemented with "served-but-not-engaged ad impressions" as hard negatives. The hard-negatives pool "reflects the real distribution of served ads, exposing the model to a more representative inventory and promoting robust contrastive learning."
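Decisions 2 and 3 can be sketched together: a dwell-reweighted click positive scored against in-batch plus impression hard negatives. The cap `T_MAX`, the exact form of `f`, and all embeddings are hypothetical (the post gives only `w = f(log(1 + t / t_max))`):

```python
import numpy as np

rng = np.random.default_rng(2)
T_MAX = 60.0  # hypothetical dwell cap in seconds; Pinterest's t_max is undisclosed

def click_weight(dwell_s, t_max=T_MAX):
    """Log-reweight click positives by dwell: w = log(1 + min(t, t_max) / t_max).
    Bounce clicks (short dwell) get near-zero weight; long dwells saturate."""
    return np.log1p(np.minimum(dwell_s, t_max) / t_max)

def contrastive_loss(user, pos_pin, inbatch_negs, impression_negs, weight=1.0):
    """Sampled softmax over the positive, the cheap in-batch negatives, and
    the served-but-not-engaged impression hard negatives."""
    cands = np.vstack([pos_pin[None, :], inbatch_negs, impression_negs])
    logits = cands @ user
    log_z = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return -weight * (logits[0] - log_z)  # weighted -log p(positive)

D = 8
user, pos = rng.normal(size=D), rng.normal(size=D)
inbatch, impressions = rng.normal(size=(4, D)), rng.normal(size=(4, D))
loss_bounce = contrastive_loss(user, pos, inbatch, impressions, weight=click_weight(2.0))
loss_dwell = contrastive_loss(user, pos, inbatch, impressions, weight=click_weight(60.0))
```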

Loss design evolution

2023 — multi-head architecture:

           shared encoders
       ┌───────┴───────┐
       ▼               ▼
  engagement      conversion
     head             head
       │               │
  sampled softmax  sampled softmax
       │               │
       └── weighted loss combination ──┘
          (task weights tuned to prevent
          engagement signal from diluting
          conversion signal)

At serving: only conversion head's Pin + query embeddings used.
  • Engagement head stabilises shared parameters (abundant data); conversion head preserves purchase-intent signal (sparse data).
  • Auxiliary-task regularisation: the abundant engagement signal regularises the shared representation that the sparse conversion task depends on.
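The weighted combination amounts to a scalar mixing of per-head losses; the weights below are invented placeholders, since Pinterest does not disclose its tuned values:

```python
# Hypothetical task weights; Pinterest's tuned values are undisclosed.
W_ENGAGEMENT, W_CONVERSION = 0.3, 1.0

def multihead_loss(l_engagement, l_conversion,
                   w_eng=W_ENGAGEMENT, w_conv=W_CONVERSION):
    """2023 design: each head's sampled-softmax loss is combined with tuned
    weights so that the abundant engagement signal stabilises the shared
    encoders without diluting the sparse conversion signal."""
    return w_eng * l_engagement + w_conv * l_conversion

total_loss = multihead_loss(2.0, 1.5)  # 0.3 * 2.0 + 1.0 * 1.5
```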

2025 — unified single-head multi-task architecture:

          shared encoders + parallel DCNv2+MLP
               single unified head
    multi-task optimisation (conversion + engagement)
    + advertiser-level loss as additional objective
             served embeddings
     (directly benefit from multi-task optimisation)
  • Unification rationale: per-head conversion embeddings were unstable in "regions of low conversion coverage"; merging the heads lets the single embedding set inherit multi-task signal directly.
  • Advertiser-level loss added as parallel training objective: "conversion data at the Pin level exhibit high variance, making it challenging to reliably model purchase intent from Pin-level supervision alone. To address this, we introduce an advertiser-level loss function as an additional training objective."
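Pinterest does not give the advertiser-level loss formulation. One plausible reading, sketched here purely as an assumption, is to pool Pin-level supervision up to the advertiser before computing a loss, so high-variance Pin labels are smoothed by aggregation:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8  # hypothetical embedding dimension

def advertiser_loss(user, pin_embs, advertiser_ids, conv_labels):
    """Hypothetical sketch: mean-pool Pin embeddings per advertiser and apply
    a logistic loss against advertiser-level conversion labels, reducing the
    variance of Pin-level supervision. Pinterest's actual formulation is
    undisclosed."""
    loss, ids = 0.0, np.unique(advertiser_ids)
    for a in ids:
        mask = advertiser_ids == a
        centroid = pin_embs[mask].mean(axis=0)          # advertiser centroid
        p = 1.0 / (1.0 + np.exp(-(user @ centroid)))    # sigmoid score
        y = float(conv_labels[mask].any())              # any Pin converted?
        loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss / len(ids)

user = rng.normal(size=D)
pins = rng.normal(size=(6, D))
adv = np.array([0, 0, 1, 1, 1, 2])        # Pin → advertiser mapping
labels = np.array([0, 1, 0, 0, 0, 1])     # Pin-level conversion labels
l_adv = advertiser_loss(user, pins, adv, labels)
```

In production this would run as a parallel objective alongside the Pin-level multi-task loss, not as a replacement for it.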

Production results

(All figures: Pinterest Internal Data, US, 2023–2025 — citation "⁴" in source post.)

  • 2023 launch (baseline shopping conversion CG):
      • +2.3% shopping conversion volume
      • +2.7% shopping impression-to-conversion rate
      • +1.5% CTR (byproduct — higher-intent ranking improves click-through too)
      • +2.2% 30-second CTR (byproduct — better dwell-aligned targeting)
  • 2025 refresh (unified MTL + parallel DCN + advertiser loss):
      • +42% recall@100 for conversion tasks vs the 2023 model
      • +3.1% RoAS for US shopping campaigns
  • Parallel DCNv2 + MLP vs sequential (standalone architecture validation):
      • +11% offline recall@1000 on the conversion task
      • Generalisation win: adopted by all Pinterest production engagement retrieval models after the shopping-CG validation — a retrieval-stage architectural primitive now, not a shopping-specific trick.

Relationship to Pinterest's other ads-ranking infrastructure

  • Pinterest Ads Engagement Model — the ranking-stage unified multi-surface model. Shopping Conversion CG is its retrieval-stage conversion-optimised sibling. Both:
      • Run across HF + SR + RP surfaces.
      • Use a shared trunk + surface-specific features (though the engagement model uses surface-specific tower trees, calibration, and checkpoint exports, whereas the conversion CG is a single multi-surface model with surface-specific features in its input).
      • Use parallel DCNv2 cross layers (the engagement model uses DCNv2 as a projection layer; the conversion CG uses parallel DCNv2 + MLP inside both towers).
      • Leverage Pinterest's long-sequence Transformer user-history encoder.
  • Pinterest L1 Ranking — Pinterest's L1 CVR two-tower ad-ranking model. Shopping Conversion CG sits one stage earlier in the funnel: retrieval feeds L1, L1 narrows to the handful the L2 ranker sees, and L2 feeds the auction. Both are two-tower with ANN-indexed Pin-side embeddings; the conversion CG is conversion-optimised, whereas L1 CVR is a downstream CVR-prediction model.

Caveats

  • No architecture diagrams — Pinterest published three figures (click-duration reweighting formula, sequential vs parallel cross architecture, multi-head vs unified multi-task) that are not in the ingested markdown.
  • Hyperparameters undisclosed. No DCNv2 cross-layer count, no MLP hidden dims, no embedding dimension, no t_max, no task-loss weighting, no advertiser-loss weighting, no batch size, no ANN-index choice.
  • No latency / infra-cost data. Production wins are quality metrics only — no p50/p99, no per-request compute, no cost envelope.
  • Interaction with engagement-CG pipeline undocumented. Pinterest runs both the conversion CG and the engagement-based shopping retrieval in parallel; the post doesn't describe how their candidate pools merge/dedupe, nor how L1 + L2 consume the two sources.
  • Scale details undisclosed. Impression volume, conversion volume, training data window, online-learning cadence — all undocumented.
  • Multi-head → unified transition risks unnamed. Pinterest doesn't describe what they had to solve during the transition (did conversion quality dip during the refactor? were there calibration surprises? rollout staging?).
  • 2023-era post (Mudgal et al. 2024) referenced but not separately ingested.
