Transformer (architecture)¶
Definition¶
The Transformer is the neural network architecture introduced by Vaswani et al. ("Attention Is All You Need", 2017, arXiv:1706.03762), built from stacked self-attention and feed-forward layers. It is the load-bearing architectural primitive under LLMs, modern video/audio encoders (MediaFM, wav2vec2), and long-user-sequence modeling in recsys/ads ranking.
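The "stacked self-attention + feed-forward" structure can be sketched minimally. This is a single encoder block in plain NumPy, with layer normalization and multi-head splitting omitted for brevity; all weight names (`Wq`, `Wk`, `Wv`, `W1`, `W2`) and dimensions are illustrative, not taken from any system described on this wiki:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) -> (seq_len, d_model)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product
    return softmax(scores) @ V

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    # attention sublayer with residual connection (layer norm omitted)
    h = X + self_attention(X, Wq, Wk, Wv)
    # position-wise feed-forward sublayer (ReLU) with residual
    return h + np.maximum(h @ W1, 0.0) @ W2

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 16, 5        # toy sizes
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
W1 = rng.standard_normal((d_model, d_ff)) * 0.1
W2 = rng.standard_normal((d_ff, d_model)) * 0.1

out = transformer_block(X, Wq, Wk, Wv, W1, W2)
print(out.shape)  # (5, 8) — same shape as the input sequence
```

A full Transformer stacks N such blocks; each sublayer preserves the `(seq_len, d_model)` shape, which is what makes the stacking compose.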
This page is a minimal wiki stub; the canonical architecture is described extensively elsewhere. Pages on the wiki use "Transformer" in several distinct contexts — LLM serving, multimodal encoders, sequence encoders in ranking — each with its own operational profile.
Use at Pinterest — long user sequence modeling¶
Pinterest's unified ads engagement model uses a Transformer over long user sequences as one component of the shared trunk (long-user-sequence modeling). The Transformer's outputs feed into a DCNv2 projection layer, then into downstream feature crossing + surface-specific tower trees.
The Pinterest post treats the long-sequence Transformer as a cost-heavy encoder whose outputs must be projected (via DCNv2) before downstream work, confirming the common ads-ranking pattern: Transformer-as-feature-encoder, not Transformer-as-ranker (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces).
The Pinterest post notes that long-sequence Transformers did not produce consistent gains when applied in isolation on one surface — they paid off only when integrated into a unified model trained on multi-surface combined features, where the broader feature distribution gave the Transformer enough signal diversity to clear its cost bar.
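The Transformer-as-feature-encoder pattern above can be sketched as follows. Pinterest does not disclose the encoder topology or the projection details, so the sequence encoder here is a hypothetical stand-in (mean-pooling), and the DCNv2 cross layer uses the published DCNv2 formulation `x_{l+1} = x0 ⊙ (W·x_l + b) + x_l`; all names and sizes are illustrative:

```python
import numpy as np

def encode_user_sequence(seq_emb):
    # Stand-in for the long-user-sequence Transformer encoder:
    # mean-pool action embeddings into one user vector (hypothetical).
    return seq_emb.mean(axis=0)

def dcnv2_cross(x0, x, W, b):
    # DCNv2 cross layer: x_{l+1} = x0 * (W @ x + b) + x  (elementwise product)
    return x0 * (W @ x + b) + x

rng = np.random.default_rng(1)
d = 8                                     # toy embedding dim
seq = rng.standard_normal((50, d))        # 50 user actions, d-dim each

user_vec = encode_user_sequence(seq)      # expensive encoder runs once
W = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)

# project the encoder output before downstream feature crossing / towers
crossed = dcnv2_cross(user_vec, user_vec, W, b)
print(crossed.shape)  # (8,) — a fixed-size feature, not a ranking score
```

The point of the pattern: the heavy encoder produces a fixed-size feature vector that cheaper downstream crossing layers and surface-specific towers consume, rather than the Transformer scoring candidates directly.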
Caveats¶
- Stub — the canonical Transformer architecture description is not fully documented here; the wiki references the architecture across many pages (MediaFM, Airbnb Destination Recommendation, various LLM-serving pages) each with context-specific detail.
- Topology is context-specific. Layer count, head count, hidden dim, sequence length depend on use case — Pinterest doesn't disclose the long-user-sequence Transformer's topology in the 2026-03-03 post.
- Pinterest's long-sequence variant is the subject of a prior Pinterest blog post (User Action Sequence Modeling for Pinterest Ads Engagement Modeling) not ingested on the wiki.
Seen in¶
- 2026-03-03 Pinterest — Unifying Ads Engagement Modeling (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — long-sequence Transformer as ads-ranking feature encoder, output projected via DCNv2 before downstream crossing.
- 2026-02-23 Netflix — MediaFM (sources/2026-02-23-netflix-mediafm-the-multimodal-ai-foundation-for-media-understanding) — BERT-style Transformer encoder over sequences of shots for multimodal media understanding.
- Many other wiki pages reference Transformer as a building block (LLM serving, multimodal fusion, sequence modeling).