
Transformer (architecture)

Definition

The Transformer is the neural network architecture introduced by Vaswani et al. in "Attention Is All You Need" (2017, arXiv:1706.03762), built from stacked self-attention and feed-forward layers. It is the load-bearing architectural primitive under LLMs, modern video/audio encoders (MediaFM, wav2vec2), and long-user-sequence modeling in recsys/ads ranking.

This page is a minimal wiki stub; the canonical architecture is described extensively elsewhere. Pages on the wiki use "Transformer" in several distinct contexts — LLM serving, multimodal encoders, sequence encoders in ranking — each with its own operational profile.
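Since this stub doesn't reproduce the full architecture, here is a minimal sketch of the stacked self-attention + feed-forward structure: one simplified encoder layer in numpy. This is single-head only, omits layer norm, multi-head splitting, and dropout, and uses random placeholder weights; it is an illustration of the mechanism, not any production topology.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(X, Wq, Wk, Wv, W1, W2):
    """One simplified Transformer encoder layer: single-head
    self-attention followed by a position-wise feed-forward net."""
    d_k = Wk.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    attn = softmax(Q @ K.T / np.sqrt(d_k)) @ V
    X = X + attn                      # residual connection
    ffn = np.maximum(X @ W1, 0) @ W2  # ReLU feed-forward
    return X + ffn                    # residual connection

# Placeholder shapes: 10-token sequence, model dim 16, FFN dim 32.
rng = np.random.default_rng(0)
seq_len, d, d_ff = 10, 16, 32
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
W1 = rng.normal(size=(d, d_ff)) * 0.1
W2 = rng.normal(size=(d_ff, d)) * 0.1
out = encoder_layer(X, Wq, Wk, Wv, W1, W2)
print(out.shape)  # (10, 16)
```

The output keeps the input's `[seq_len, d]` shape, which is what lets these layers be stacked.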

Use at Pinterest — long user sequence modeling

Pinterest's unified ads engagement model uses a Transformer over long user sequences as one component of the shared trunk (long-user-sequence modeling). The Transformer's outputs feed into a DCNv2 projection layer, then into downstream feature crossing + surface-specific tower trees.

The Pinterest post treats the long-sequence Transformer as a cost-heavy encoder whose outputs must be projected (via DCNv2) before downstream work, confirming the common ads-ranking pattern: Transformer-as-feature-encoder, not Transformer-as-ranker (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces).
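The Transformer-as-feature-encoder pattern can be sketched as follows. Pinterest discloses neither the pooling scheme nor the projection's exact wiring, so everything here (mean pooling, layer count, shapes, the function names) is a hypothetical illustration; only the cross formula is the standard DCNv2 one, x_{l+1} = x_0 ⊙ (W x_l + b) + x_l.

```python
import numpy as np

def dcnv2_cross_layer(x0, xl, W, b):
    # Standard DCNv2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl
    # (elementwise product with the layer-0 input, plus a residual).
    return x0 * (W @ xl + b) + xl

def project_user_sequence(seq_embs, W, b, n_layers=2):
    """Hypothetical sketch of the encoder-then-project pattern:
    pool the Transformer's per-step outputs, then apply DCNv2
    cross layers before any downstream towers. Mean pooling and
    shared cross weights are assumptions, not Pinterest's design."""
    x0 = seq_embs.mean(axis=0)  # pool [seq_len, d] -> [d]
    x = x0
    for _ in range(n_layers):
        x = dcnv2_cross_layer(x0, x, W, b)
    return x

# Placeholder inputs: 10 user actions, 8-dim Transformer outputs.
rng = np.random.default_rng(1)
seq = rng.normal(size=(10, 8))
W = rng.normal(size=(8, 8)) * 0.1
b = np.zeros(8)
feat = project_user_sequence(seq, W, b)
print(feat.shape)  # (8,)
```

The point of the pattern is visible in the shapes: the expensive per-token encoder work collapses to a single fixed-width vector before any surface-specific tower sees it.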

The Pinterest post notes that long-sequence Transformers did not produce consistent gains when applied in isolation on one surface — they paid off only when integrated into a unified model trained on multi-surface combined features, where the broader feature distribution gave the Transformer enough signal diversity to clear its cost bar.

Caveats

  • Stub — the canonical Transformer architecture description is not fully documented here; the wiki references the architecture across many pages (MediaFM, Airbnb Destination Recommendation, various LLM-serving pages) each with context-specific detail.
  • Topology is context-specific. Layer count, head count, hidden dim, sequence length depend on use case — Pinterest doesn't disclose the long-user-sequence Transformer's topology in the 2026-03-03 post.
  • Pinterest's long-sequence variant is the subject of a prior Pinterest blog post (User Action Sequence Modeling for Pinterest Ads Engagement Modeling) not ingested on the wiki.
