
Pinterest Ads Engagement Model

Definition

Pinterest Ads Engagement Model is Pinterest's unified ads CTR-prediction model. It consolidates three previously independent per-surface production models (Home Feed, Search, Related Pins) into a single architecture with surface-specific tower trees, surface-specific calibration, multi-task heads, and surface-specific checkpoint exports. The model predicts "how likely a user is to engage with an ad" across multiple Pinterest surfaces.

Introduced in the 2026-03-03 Pinterest Engineering post Unifying Ads Engagement Modeling Across Pinterest Surfaces (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces).

Architecture

         (user + candidate ad + context features)
                          │
                          ▼
      [ shared trunk — MMoE + long user sequence Transformer ]
                          │
                          ▼
      [ DCNv2 projection layer — latency-shrink output ]
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     [ HF tower tree ] [ SR tower tree ] [ RP tower tree (future) ]
          │               │
     HF calibration  SR calibration  (view-type-specific)
          │               │
          ▼               ▼
        HF CTR          SR CTR

Key architectural elements:

  • Shared trunk. MMoE (Multi-gate Mixture of Experts) + long user sequence Transformer. Neither component produced consistent gains as a standalone per-surface addition; composition into a unified model trained on combined HF+SR features delivered gains that cleared the cost bar.
  • DCNv2 projection layer. A deep cross network inserted between the Transformer output and downstream crossing + tower tree layers — projects Transformer outputs into a smaller representation, reducing serving latency while preserving signal.
  • Surface-specific tower trees. Each surface has its own tower tree + late-fusion + surface-specific modules. At serving time, each surface-specific tower tree handles only its surface's traffic — "avoiding unnecessary compute cost from modules that don't benefit other surfaces." HF + SR tower trees present at time of post; RP tower tree is future work.
  • View-type-specific calibration. HF and SR CTR are calibrated separately by a view-type-specific calibration layer — shared global calibration was suboptimal because it "implicitly mixes traffic distributions across surfaces."
  • Multi-task heads + surface-specific checkpoint exports. Each surface exports its own checkpoint from the unified training run so it can adopt the most appropriate architecture "while still benefiting from shared representation learning."
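
The trunk → projection → tower → calibration flow above can be sketched in numpy. Everything here is an illustrative assumption — expert count, dimensions, random weights, and the Platt-style per-surface calibration parameters are invented for the sketch; the post does not disclose Pinterest's actual topology:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Invented sizes: input dim, expert count, expert output dim, projected dim.
D, N_EXPERTS, D_EXPERT, D_PROJ = 32, 4, 16, 8
W_experts = rng.normal(size=(N_EXPERTS, D, D_EXPERT)) * 0.1
W_gate = {s: rng.normal(size=(D, N_EXPERTS)) * 0.1 for s in ("HF", "SR")}
W_cross = rng.normal(size=(D_EXPERT, D_EXPERT)) * 0.1   # DCNv2-style cross weight
b_cross = np.zeros(D_EXPERT)
W_proj = rng.normal(size=(D_EXPERT, D_PROJ)) * 0.1      # latency-shrink projection
towers = {s: rng.normal(size=(D_PROJ, 1)) * 0.1 for s in ("HF", "SR")}
calib = {"HF": (1.2, -0.1), "SR": (0.9, 0.05)}          # hypothetical Platt (a, b)

def predict_ctr(x, surface):
    # Shared trunk (MMoE): experts are shared; each surface mixes them
    # through its own gate.
    expert_out = np.stack(
        [relu(x @ W_experts[k]) for k in range(N_EXPERTS)], axis=1)  # (B, E, D_EXPERT)
    gates = softmax(x @ W_gate[surface])                             # (B, E)
    h = (gates[:, :, None] * expert_out).sum(axis=1)                 # (B, D_EXPERT)
    # DCNv2-style cross interaction, then projection to a smaller representation.
    h = h * (h @ W_cross + b_cross) + h
    z = relu(h @ W_proj)                                             # (B, D_PROJ)
    # Surface-specific tower: at serving time only this surface's tower runs.
    logit = (z @ towers[surface]).ravel()
    # View-type-specific calibration: separate mapping per surface.
    a, b = calib[surface]
    return 1.0 / (1.0 + np.exp(-(a * logit + b)))

x = rng.normal(size=(5, D))
ctr_hf = predict_ctr(x, "HF")   # calibrated CTR, one value per candidate
ctr_sr = predict_ctr(x, "SR")   # same trunk, different gate/tower/calibration
```

The sketch makes the routing point concrete: HF and SR requests share the trunk weights but never touch each other's tower or calibration parameters, which is what keeps per-surface serving cost bounded.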

Serving efficiency work

The naive unified model increased latency ("merging feature maps and modules made the model larger"). Three efficiency levers restored the cost envelope:

  1. DCNv2 projection layer as above.
  2. Fused embedding kernels — fused kernels for embedding-lookup inference (cuts latency), plus TF32 for training throughput.
  3. Request-level user-embedding broadcasting. Instead of repeating heavy user embedding lookups for every candidate/request in a batch, fetch embeddings once per unique user and broadcast them back to the original request layout. Model inputs/outputs unchanged. Trade-off: upper bound on unique users per batch — if exceeded the request can fail, so Pinterest uses a tested unique-user number to keep the system reliable.
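
A minimal sketch of lever 3, the request-level broadcasting. The `fetch_user_embedding` lookup and the unique-user cap value are hypothetical stand-ins (the real threshold is undisclosed); the dedupe-then-broadcast shape is the technique the post describes:

```python
import numpy as np

EMB_DIM = 4
MAX_UNIQUE_USERS = 128  # assumed value; Pinterest uses a tested, undisclosed cap

def fetch_user_embedding(user_id):
    # Stand-in for the expensive per-user embedding lookup.
    rng = np.random.default_rng(user_id)
    return rng.normal(size=EMB_DIM)

def batch_user_embeddings(user_ids):
    """Fetch each unique user's embedding once, then broadcast the results
    back to the original request layout. Model inputs/outputs are unchanged
    relative to the naive one-lookup-per-candidate approach."""
    unique_ids, inverse = np.unique(user_ids, return_inverse=True)
    if len(unique_ids) > MAX_UNIQUE_USERS:
        # The documented trade-off: too many unique users fails the request.
        raise RuntimeError("unique-user cap exceeded; request fails")
    table = np.stack([fetch_user_embedding(u) for u in unique_ids])  # 1 lookup/user
    return table[inverse]                                            # broadcast back

ids = np.array([7, 7, 7, 3, 3, 7])   # six candidates, only two unique users
emb = batch_user_embeddings(ids)     # two expensive lookups instead of six
```

Here six candidate rows cost only two lookups; rows sharing a user id receive identical embedding vectors, exactly as if each had been fetched individually.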

Unification sequencing

Pinterest staged the unification according to each surface's CUDA throughput profile:

  1. HF + SR first — similar CUDA throughput characteristics.
  2. RP later — substantially higher cost profile; unified only after throughput + efficiency work stabilised.

Three guiding principles: start simple (merge strongest existing components), iterate incrementally (surface-aware modeling only after baseline proves value), maintain operational safety (safe rollout + monitoring + fast rollback at every step).

Results

  • "Significant improvements on both online and offline metrics" across HF and SR (specific percentages not disclosed in the post; reference to Pinterest internal data, US, 2025).
  • Offline improvements validated by online A/B experiments.
  • Resolves the pre-unification pain triple: low iteration velocity, redundant training cost, high maintenance burden.

Caveats

  • No numerical deltas disclosed. The post is an architectural retrospective with qualitative wins — no A/B percentages, no latency percentiles, no per-request compute breakdown, no infra-cost numbers.
  • MMoE topology not described. Expert count, gate structure, distillation usage all undisclosed.
  • Long-sequence Transformer topology not described. Sequence length, attention heads, layer count, feature tokenisation not disclosed.
  • Embedded architecture diagram in the original post is not in the ingested markdown — the shape above is reconstructed from the text.
  • RP integration is future work at time of writing — the post is a live-journey retrospective, not a closed-project case study.
  • Request-level-broadcast failure mode — batches exceeding the tested-unique-user number can fail; the exact threshold is not disclosed.

Seen in

  • 2026-03-03 Pinterest — Unifying Ads Engagement Modeling Across Pinterest Surfaces (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — canonical ingest; names the full architecture including MMoE + long-sequence Transformer trunk, DCNv2 projection layer, surface-specific tower trees, view-type-specific calibration, multi-task heads, surface-specific checkpoint exports, fused-kernel embedding, TF32, request-level user-embedding broadcasting.