
Pinterest Ads Engagement Model

Definition

Pinterest Ads Engagement Model is Pinterest's unified ads CTR-prediction model. It consolidates three previously independent per-surface production models (Home Feed, Search, Related Pins) into a single architecture with surface-specific tower trees, surface-specific calibration, multi-task heads, and surface-specific checkpoint exports. The model predicts "how likely a user is to engage with an ad" across multiple Pinterest surfaces.

Introduced in the 2026-03-03 Pinterest Engineering post Unifying Ads Engagement Modeling Across Pinterest Surfaces (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces).

Architecture

         (user + candidate ad + context features)
                          │
                          ▼
      [ shared trunk — MMoE + long user sequence Transformer ]
                          │
                          ▼
      [ DCNv2 projection layer — latency-shrink output ]
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
     [ HF tower tree ] [ SR tower tree ] [ RP tower tree (future) ]
          │               │
     HF calibration  SR calibration  (view-type-specific)
          │               │
          ▼               ▼
        HF CTR          SR CTR

Key architectural elements:

  • Shared trunk. MMoE (Multi-gate Mixture of Experts) + long user sequence Transformer. Neither component produced consistent gains as a standalone per-surface addition; composition into a unified model trained on combined HF+SR features delivered gains that cleared the cost bar.
  • DCNv2 projection layer. A deep cross network inserted between the Transformer output and downstream crossing + tower tree layers — projects Transformer outputs into a smaller representation, reducing serving latency while preserving signal.
  • Surface-specific tower trees. Each surface has its own tower tree + late-fusion + surface-specific modules. At serving time, each surface-specific tower tree handles only its surface's traffic — "avoiding unnecessary compute cost from modules that don't benefit other surfaces." HF + SR tower trees present at time of post; RP tower tree is future work.
  • View-type-specific calibration. HF and SR CTR are calibrated separately by a view-type-specific calibration layer — shared global calibration was suboptimal because it "implicitly mixes traffic distributions across surfaces."
  • Multi-task heads + surface-specific checkpoint exports. Each surface exports its own checkpoint from the unified training run so it can adopt the most appropriate architecture "while still benefiting from shared representation learning."
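
The trunk → projection → tower → calibration flow above can be sketched in numpy. Everything here is an illustrative assumption — expert count, dimensions, random weights, and the Platt-style per-surface calibration parameters are invented for the sketch; the post does not disclose Pinterest's actual topology:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Invented sizes: input dim, expert count, expert output dim, projected dim.
D, N_EXPERTS, D_EXPERT, D_PROJ = 32, 4, 16, 8
W_experts = rng.normal(size=(N_EXPERTS, D, D_EXPERT)) * 0.1
W_gate = {s: rng.normal(size=(D, N_EXPERTS)) * 0.1 for s in ("HF", "SR")}
W_cross = rng.normal(size=(D_EXPERT, D_EXPERT)) * 0.1   # DCNv2-style cross weight
b_cross = np.zeros(D_EXPERT)
W_proj = rng.normal(size=(D_EXPERT, D_PROJ)) * 0.1      # latency-shrink projection
towers = {s: rng.normal(size=(D_PROJ, 1)) * 0.1 for s in ("HF", "SR")}
calib = {"HF": (1.2, -0.1), "SR": (0.9, 0.05)}          # hypothetical Platt (a, b)

def predict_ctr(x, surface):
    # Shared trunk (MMoE): experts are shared; each surface mixes them
    # through its own gate.
    expert_out = np.stack(
        [relu(x @ W_experts[k]) for k in range(N_EXPERTS)], axis=1)  # (B, E, D_EXPERT)
    gates = softmax(x @ W_gate[surface])                             # (B, E)
    h = (gates[:, :, None] * expert_out).sum(axis=1)                 # (B, D_EXPERT)
    # DCNv2-style cross interaction, then projection to a smaller representation.
    h = h * (h @ W_cross + b_cross) + h
    z = relu(h @ W_proj)                                             # (B, D_PROJ)
    # Surface-specific tower: at serving time only this surface's tower runs.
    logit = (z @ towers[surface]).ravel()
    # View-type-specific calibration: separate mapping per surface.
    a, b = calib[surface]
    return 1.0 / (1.0 + np.exp(-(a * logit + b)))

x = rng.normal(size=(5, D))
ctr_hf = predict_ctr(x, "HF")   # calibrated CTR, one value per candidate
ctr_sr = predict_ctr(x, "SR")   # same trunk, different gate/tower/calibration
```

The sketch makes the routing point concrete: HF and SR requests share the trunk weights but never touch each other's tower or calibration parameters, which is what keeps per-surface serving cost bounded.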

Serving efficiency work

The naive unified model increased latency ("merging feature maps and modules made the model larger"). Three efficiency levers restored the cost envelope:

  1. DCNv2 projection layer as above.
  2. Fused embedding kernels — fused kernels for embedding-lookup inference (cuts latency), plus TF32 for training throughput.
  3. Request-level user-embedding broadcasting. Instead of repeating heavy user embedding lookups for every candidate/request in a batch, fetch embeddings once per unique user and broadcast them back to the original request layout. Model inputs/outputs unchanged. Trade-off: upper bound on unique users per batch — if exceeded the request can fail, so Pinterest uses a tested unique-user number to keep the system reliable.
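
A minimal sketch of lever 3, the request-level broadcasting. The `fetch_user_embedding` lookup and the unique-user cap value are hypothetical stand-ins (the real threshold is undisclosed); the dedupe-then-broadcast shape is the technique the post describes:

```python
import numpy as np

EMB_DIM = 4
MAX_UNIQUE_USERS = 128  # assumed value; Pinterest uses a tested, undisclosed cap

def fetch_user_embedding(user_id):
    # Stand-in for the expensive per-user embedding lookup.
    rng = np.random.default_rng(user_id)
    return rng.normal(size=EMB_DIM)

def batch_user_embeddings(user_ids):
    """Fetch each unique user's embedding once, then broadcast the results
    back to the original request layout. Model inputs/outputs are unchanged
    relative to the naive one-lookup-per-candidate approach."""
    unique_ids, inverse = np.unique(user_ids, return_inverse=True)
    if len(unique_ids) > MAX_UNIQUE_USERS:
        # The documented trade-off: too many unique users fails the request.
        raise RuntimeError("unique-user cap exceeded; request fails")
    table = np.stack([fetch_user_embedding(u) for u in unique_ids])  # 1 lookup/user
    return table[inverse]                                            # broadcast back

ids = np.array([7, 7, 7, 3, 3, 7])   # six candidates, only two unique users
emb = batch_user_embeddings(ids)     # two expensive lookups instead of six
```

Here six candidate rows cost only two lookups; rows sharing a user id receive identical embedding vectors, exactly as if each had been fetched individually.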

Unification sequencing

Pinterest staged the unification according to each surface's CUDA throughput profile:

  1. HF + SR first — similar CUDA throughput characteristics.
  2. RP later — substantially higher cost profile; unified only after throughput + efficiency work stabilised.

Three guiding principles: start simple (merge strongest existing components), iterate incrementally (surface-aware modeling only after baseline proves value), maintain operational safety (safe rollout + monitoring + fast rollback at every step).

Results

  • "Significant improvements on both online and offline metrics" across HF and SR (specific percentages not disclosed in the post; reference to Pinterest internal data, US, 2025).
  • Offline improvements validated by online A/B experiments.
  • Resolves the pre-unification pain triple: low iteration velocity, redundant training cost, high maintenance burden.

Caveats

  • No numerical deltas disclosed. The post is an architectural retrospective with qualitative wins — no A/B percentages, no latency percentiles, no per-request compute breakdown, no infra-cost numbers.
  • MMoE topology not described. Expert count, gate structure, distillation usage all undisclosed.
  • Long-sequence Transformer topology not described. Sequence length, attention heads, layer count, feature tokenisation not disclosed.
  • Embedded architecture diagram in the original post is not in the ingested markdown — the shape above is reconstructed from the text.
  • RP integration is future work at time of writing — the post is a live-journey retrospective, not a closed-project case study.
  • Request-level-broadcast failure mode — batches exceeding the tested-unique-user number can fail; the exact threshold is not disclosed.

Seen in

  • 2026-03-03 Pinterest — Unifying Ads Engagement Modeling Across Pinterest Surfaces (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — canonical ingest; names the full architecture including MMoE + long-sequence Transformer trunk, DCNv2 projection layer, surface-specific tower trees, view-type-specific calibration, multi-task heads, surface-specific checkpoint exports, fused-kernel embedding, TF32, request-level user-embedding broadcasting.