
Pinterest Foundation Model

Definition

The Pinterest Foundation Model is Pinterest's large-scale Transformer-based model for recommendation, published at ACM RecSys 2025 as an oral spotlight (arXiv 2507.12704). Relative to prior Pinterest ranking models, it achieves a 100× increase in transformer dense parameter count and a 10× increase in model dimension, "translating directly into meaningful quality improvements across multiple recommendation surfaces" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).

Role in the recsys stack

The Foundation Model is a sequential user-understanding component consuming "approximately 16K tokens encoding all actions a user has taken on the platform" — a massive user-history sequence that is shared across every candidate Pin scored in a request. It operates alongside TransAct as Pinterest's primary long-sequence user-behaviour encoder, and feeds downstream ranking layers across multiple surfaces (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).
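To make that request shape concrete, here is a minimal illustrative sketch (all names and types are hypothetical, not Pinterest's): one long user-action sequence per request, shared across every candidate Pin scored in that request.

```python
# Hypothetical sketch of a ranking request's feature layout (names are illustrative).
from dataclasses import dataclass
from typing import List

@dataclass
class RankingRequest:
    request_id: str
    user_action_tokens: List[int]   # ~16K tokens encoding all of the user's platform actions
    candidate_pin_ids: List[int]    # hundreds to thousands of candidates scored per request

def naive_scoring_rows(req: RankingRequest):
    """Without deduplication, the same ~16K-token sequence is replicated once per candidate."""
    return [(req.user_action_tokens, pin_id) for pin_id in req.candidate_pin_ids]
```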

Why it motivated request-level deduplication

A 100× parameter / 10× dimension scaleup puts outsized infrastructure pressure on storage, training, and serving. Without explicit deduplication, the ~16K-token user sequence is duplicated into "hundreds to thousands of copies per request", once per candidate scored. Pinterest's canonical framing:

"A 100x scaleup creates massive infrastructure pressure. Storage, training, and serving costs all threaten to grow proportionally unless you're deliberate about efficiency. The single highest-impact technique we've deployed to hold costs in check across all three dimensions is request-level deduplication."

The Foundation Model is therefore the scaling-driver that forced request-level deduplication to be canonicalised as a cross-cutting discipline — storage (Iceberg sort-order + columnar compression), training (SyncBatchNorm + user-level masking + deferred re-expansion), and serving (DCAT).
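A minimal sketch of the request-level deduplication idea (illustrative PyTorch, under assumed names, not Pinterest's implementation): keep one copy of the heavy user-sequence rows per request_id, carry a gather index, and defer re-expansion to per-candidate rows until after the shared computation.

```python
# Illustrative request-level deduplication with deferred re-expansion (hypothetical code).
import torch

def deduplicate(user_seqs: torch.Tensor, request_ids: list):
    """Keep one copy of each request's user sequence; return gather indices.

    user_seqs:   [num_candidates, seq_len] -- one (duplicated) row per candidate.
    request_ids: length num_candidates     -- the request each row came from.
    """
    seen = {}
    keep_rows, gather_idx = [], []
    for row, rid in enumerate(request_ids):
        if rid not in seen:
            seen[rid] = len(keep_rows)
            keep_rows.append(row)
        gather_idx.append(seen[rid])
    deduped = user_seqs[keep_rows]                       # [num_requests, seq_len]
    return deduped, torch.tensor(gather_idx)

def reexpand(per_request_out: torch.Tensor, gather_idx: torch.Tensor):
    """Deferred re-expansion: broadcast per-request outputs back to per-candidate rows."""
    return per_request_out.index_select(0, gather_idx)   # [num_candidates, ...]
```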

Serving architecture — DCAT

The Foundation Model's ranking-serving path uses DCAT — Deduplicated Cross-Attention Transformer, described in the Foundation Model paper. DCAT splits the transformer into:

  1. Context pass — apply the transformer to the user's action sequence once per deduplicated request. Cache the Keys and Values from each layer.
  2. Crossing pass — each candidate item performs cross-attention with the cached user-history KV, reusing the deduplicated context computation.

DCAT is implemented with custom Triton kernels for training and serving, and it "achieved significant throughput gains over standard self-attention with FlashAttention" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication).
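A condensed, single-layer sketch of that two-pass split, using plain PyTorch attention in place of the custom Triton kernels; dimensions, module names, and the layer structure here are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical single-layer DCAT-style split: context pass caches user-history K/V,
# crossing pass lets each candidate cross-attend to the cached K/V.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCATLayerSketch(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def _split(self, x):  # [batch, len, dim] -> [batch, heads, len, head_dim]
        b, l, _ = x.shape
        return x.view(b, l, self.n_heads, self.head_dim).transpose(1, 2)

    def context_pass(self, user_seq: torch.Tensor):
        """Run once per deduplicated request; cache Keys/Values of the user history."""
        return self._split(self.k_proj(user_seq)), self._split(self.v_proj(user_seq))

    def crossing_pass(self, candidates: torch.Tensor, k_cache, v_cache):
        """Each candidate cross-attends to the cached user-history K/V."""
        q = self._split(self.q_proj(candidates))
        out = F.scaled_dot_product_attention(q, k_cache, v_cache)
        b, h, n, d = out.shape
        return out.transpose(1, 2).reshape(b, n, h * d)

# Usage: one long user sequence per request, many candidates reusing its cached K/V.
layer = DCATLayerSketch()
user_seq = torch.randn(1, 16_384, 256)     # [1 request, seq_len, dim]
candidates = torch.randn(1, 500, 256)      # [1 request, n_candidates, dim]
k_cache, v_cache = layer.context_pass(user_seq)
candidate_reprs = layer.crossing_pass(candidates, k_cache, v_cache)
```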

Production impact

Pinterest's combined request-level deduplication programme (serving DCAT + training + storage, US 2025):

  • Storage: 10–50× compression on user-heavy feature columns via Iceberg + request-ID sorting.
  • Training: 4× end-to-end speedup for retrieval; ~2.8× speedup for ranking (a ~1.4× gain, i.e. 40%, from deduplicated data loading multiplied by a ~2× gain from DCAT cross-attention).
  • Serving: 7× increase in ranking serving throughput, "what made it possible to deploy a 100× larger model without proportional serving cost increases, absorbing the full Foundation Model scaleup while holding infrastructure budgets in check."

Caveats

  • Architectural details come from the arXiv paper, not the 2026-04-13 post. Layer count, attention-head shape, tokenisation of user actions, training-objective specifics, and sequence handling for long-tail users all live in the paper rather than in the post ingested here.
  • Pinterest doesn't disclose absolute parameter count, absolute model-dim value, fleet size, GPU type, training-cluster size, or per-surface downstream consumption.
  • Not a foundation model in the LLM sense — it's a recsys Transformer, despite the name. No generative capability disclosed.
  • Relationship to Pinterest Ads Engagement Model not fully disclosed — the ads model uses a long-user-sequence Transformer + MMoE trunk; whether the ads model is the Foundation Model, a variant, or a separate architecture trained on similar principles is not stated.
