
Pinterest Foundation Model

Definition

The Pinterest Foundation Model is Pinterest's large-scale Transformer-based model for recommendation, published at ACM RecSys 2025 as an oral spotlight (arXiv 2507.12704). Relative to prior Pinterest ranking models, it achieves a 100× increase in transformer dense parameter count and a 10× increase in model dimension, "translating directly into meaningful quality improvements across multiple recommendation surfaces" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).

Role in the recsys stack

The Foundation Model is a sequential user-understanding component consuming "approximately 16K tokens encoding all actions a user has taken on the platform" — a massive user-history sequence that is shared across every candidate Pin scored in a request. It operates alongside TransAct as Pinterest's primary long-sequence user-behaviour encoder, and feeds downstream ranking layers across multiple surfaces (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).
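To make that request shape concrete, here is a minimal illustrative sketch (all names and types are hypothetical, not Pinterest's): one long user-action sequence per request, shared across every candidate Pin scored in that request.

```python
# Hypothetical sketch of a ranking request's feature layout (names are illustrative).
from dataclasses import dataclass
from typing import List

@dataclass
class RankingRequest:
    request_id: str
    user_action_tokens: List[int]   # ~16K tokens encoding all of the user's platform actions
    candidate_pin_ids: List[int]    # hundreds to thousands of candidates scored per request

def naive_scoring_rows(req: RankingRequest):
    """Without deduplication, the same ~16K-token sequence is replicated once per candidate."""
    return [(req.user_action_tokens, pin_id) for pin_id in req.candidate_pin_ids]
```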

Why it motivated request-level deduplication

A 100× parameter / 10× dimension scaleup puts outsized infrastructure pressure on storage, training, and serving. Without explicit deduplication, the ~16K-token user sequence is duplicated into "hundreds to thousands of copies per request", once per candidate scored. Pinterest's canonical framing:

"A 100x scaleup creates massive infrastructure pressure. Storage, training, and serving costs all threaten to grow proportionally unless you're deliberate about efficiency. The single highest-impact technique we've deployed to hold costs in check across all three dimensions is request-level deduplication."

The Foundation Model is therefore the scaling-driver that forced request-level deduplication to be canonicalised as a cross-cutting discipline — storage (Iceberg sort-order + columnar compression), training (SyncBatchNorm + user-level masking + deferred re-expansion), and serving (DCAT).
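A minimal sketch of the request-level deduplication idea (illustrative PyTorch, under assumed names, not Pinterest's implementation): keep one copy of the heavy user-sequence rows per request_id, carry a gather index, and defer re-expansion to per-candidate rows until after the shared computation.

```python
# Illustrative request-level deduplication with deferred re-expansion (hypothetical code).
import torch

def deduplicate(user_seqs: torch.Tensor, request_ids: list):
    """Keep one copy of each request's user sequence; return gather indices.

    user_seqs:   [num_candidates, seq_len] -- one (duplicated) row per candidate.
    request_ids: length num_candidates     -- the request each row came from.
    """
    seen = {}
    keep_rows, gather_idx = [], []
    for row, rid in enumerate(request_ids):
        if rid not in seen:
            seen[rid] = len(keep_rows)
            keep_rows.append(row)
        gather_idx.append(seen[rid])
    deduped = user_seqs[keep_rows]                       # [num_requests, seq_len]
    return deduped, torch.tensor(gather_idx)

def reexpand(per_request_out: torch.Tensor, gather_idx: torch.Tensor):
    """Deferred re-expansion: broadcast per-request outputs back to per-candidate rows."""
    return per_request_out.index_select(0, gather_idx)   # [num_candidates, ...]
```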

Serving architecture — DCAT

The Foundation Model's ranking-serving path uses DCAT — Deduplicated Cross-Attention Transformer, described in the Foundation Model paper. DCAT splits the transformer into:

  1. Context pass — apply the transformer to the user's action sequence once per deduplicated request. Cache the Keys and Values from each layer.
  2. Crossing pass — each candidate item performs cross-attention with the cached user-history KV, reusing the deduplicated context computation.

DCAT is implemented with custom Triton kernels for training and serving, and it "achieved significant throughput gains over standard self-attention with FlashAttention" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).
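A condensed, single-layer sketch of that two-pass split, using plain PyTorch attention in place of the custom Triton kernels; dimensions, module names, and the layer structure here are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical single-layer DCAT-style split: context pass caches user-history K/V,
# crossing pass lets each candidate cross-attend to the cached K/V.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCATLayerSketch(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def _split(self, x):  # [batch, len, dim] -> [batch, heads, len, head_dim]
        b, l, _ = x.shape
        return x.view(b, l, self.n_heads, self.head_dim).transpose(1, 2)

    def context_pass(self, user_seq: torch.Tensor):
        """Run once per deduplicated request; cache Keys/Values of the user history."""
        return self._split(self.k_proj(user_seq)), self._split(self.v_proj(user_seq))

    def crossing_pass(self, candidates: torch.Tensor, k_cache, v_cache):
        """Each candidate cross-attends to the cached user-history K/V."""
        q = self._split(self.q_proj(candidates))
        out = F.scaled_dot_product_attention(q, k_cache, v_cache)
        b, h, n, d = out.shape
        return out.transpose(1, 2).reshape(b, n, h * d)

# Usage: one long user sequence per request, many candidates reusing its cached K/V.
layer = DCATLayerSketch()
user_seq = torch.randn(1, 16_384, 256)     # [1 request, seq_len, dim]
candidates = torch.randn(1, 500, 256)      # [1 request, n_candidates, dim]
k_cache, v_cache = layer.context_pass(user_seq)
candidate_reprs = layer.crossing_pass(candidates, k_cache, v_cache)
```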

Production impact

Pinterest's combined request-level deduplication programme (serving DCAT + training + storage, US 2025):

  • Storage: 10–50× compression on user-heavy feature columns via Iceberg + request-ID sorting.
  • Training: 4× end-to-end speedup for retrieval; ~2.8× speedup for ranking (a ~1.4× gain, i.e. 40%, from deduplicated data loading multiplied by a ~2× gain from DCAT cross-attention).
  • Serving: 7× increase in ranking serving throughput, "what made it possible to deploy a 100× larger model without proportional serving cost increases, absorbing the full Foundation Model scaleup while holding infrastructure budgets in check."

Caveats

  • Architectural details come from the arXiv paper, not the 2026-04-13 post. Layer count, attention-head shape, tokenisation of user actions, training-objective specifics, and sequence handling for long-tail users all live in the paper rather than in the post ingested here.
  • Pinterest doesn't disclose absolute parameter count, absolute model-dim value, fleet size, GPU type, training-cluster size, or per-surface downstream consumption.
  • Not a foundation model in the LLM sense — it's a recsys Transformer, despite the name. No generative capability disclosed.
  • Relationship to Pinterest Ads Engagement Model not fully disclosed — the ads model uses a long-user-sequence Transformer + MMoE trunk; whether the ads model is the Foundation Model, a variant, or a separate architecture trained on similar principles is not stated.
