Triton (language)¶
Triton is an open-source Python-embedded DSL (originally from OpenAI / Philippe Tillet) for authoring GPU kernels at a higher level of abstraction than CUDA/HIP: kernels are written as tiled computations over blocks, and the compiler automates memory coalescing, shared-memory management, and tensor-core scheduling.
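To make the "tiled computations" point concrete, here is a minimal pure-NumPy emulation of Triton's programming model: the problem is split into a grid of independent block-sized programs, each of which loads a masked tile, computes on it, and stores the result. This is not Triton code (real kernels use `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store` and compile to GPU binaries); all names here are illustrative.

```python
import numpy as np

def vector_add_blocked(x, y, block_size=128):
    """Emulate Triton's block-program model on the CPU: each `pid`
    plays the role of one GPU program instance, operating on a
    block_size tile guarded by a boundary mask."""
    n = x.shape[0]
    out = np.empty_like(x)
    num_programs = (n + block_size - 1) // block_size  # grid size, like triton.cdiv(n, BLOCK)
    for pid in range(num_programs):                    # in Triton these run in parallel on the GPU
        offsets = pid * block_size + np.arange(block_size)
        mask = offsets < n                             # mask the ragged final tile
        idx = offsets[mask]
        out[idx] = x[idx] + y[idx]                     # stands in for masked tl.load / tl.store
    return out
```

The mask is the key idiom: it lets every program use the same fixed tile shape while staying safe at the boundary, which is what allows the compiler to plan coalesced loads and shared-memory layouts per tile.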
This page is the canonical wiki entry under the slug triton-lang that source pages reference as [systems/triton-lang](<./triton-lang.md>). For the language-and-extensions detail page (including Meta's TLX fork), see systems/triton-dsl.
Use at Pinterest — DCAT kernels¶
Pinterest's DCAT (Deduplicated Cross-Attention Transformer) is "implemented with custom Triton kernels for both training and serving" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication). The custom kernels displace FlashAttention for ranking attention — DCAT's two-phase context/crossing split is not expressible as a stock FlashAttention call, so Pinterest wrote Triton kernels implementing the context pass (populate KV) + crossing pass (cross-attention against cached KV) directly.
Canonical wiki instance: Triton as the substrate for specialised attention architectures that diverge from standard self-attention (beyond what stock fused-attention kernels provide). Complements Meta's KernelEvolve, which uses Triton + TLX as emission targets for LLM-generated kernels.
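Pinterest does not disclose the kernel internals (see Caveats below), but the two-phase structure named in the source can be sketched in plain NumPy: a context pass materialises K/V once per request, and a crossing pass runs cross-attention for each candidate against that cached KV. All shapes, weight names, and the dense single-head formulation are illustrative assumptions, not Pinterest's actual fused Triton kernels.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def context_pass(context_tokens, w_k, w_v):
    """Phase 1 (assumed shape): project the shared, request-level
    context into K and V once, caching them for all candidates."""
    return context_tokens @ w_k, context_tokens @ w_v

def crossing_pass(candidate_queries, k_cache, v_cache):
    """Phase 2 (assumed shape): per-candidate cross-attention against
    the cached KV, so the context is never re-encoded per candidate."""
    d = k_cache.shape[-1]
    scores = candidate_queries @ k_cache.T / np.sqrt(d)
    return softmax(scores) @ v_cache

rng = np.random.default_rng(0)
ctx = rng.standard_normal((32, 16))            # shared context tokens (illustrative sizes)
w_k = rng.standard_normal((16, 16))
w_v = rng.standard_normal((16, 16))
queries = rng.standard_normal((100, 16))       # queries for 100 candidate items
k_cache, v_cache = context_pass(ctx, w_k, w_v)  # once per request
out = crossing_pass(queries, k_cache, v_cache)  # amortised across candidates
```

The split is why a stock FlashAttention call does not fit: FlashAttention fuses a single self-attention pass, whereas here the KV side is computed once and reused across many candidate queries within a request.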
Caveats¶
- Kernel-level detail not disclosed — Pinterest names Triton as the implementation language but does not disclose tile sizes, shared-memory layouts, or the specific kernel shape for context vs crossing.
- Triton vs TLX vs CUDA vs HIP — the 2026-04-13 Pinterest post names only Triton. Whether Pinterest uses upstream Triton, a fork, or bridges to lower-level CUDA is not disclosed.
Seen in¶
- 2026-04-13 Pinterest — Scaling Recommendation Systems with Request-Level Deduplication (sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication) — Triton as the DSL for DCAT's custom attention kernels.
- 2026-04-02 Meta — KernelEvolve (sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure) — Triton + TLX as LLM-synthesiser emission targets.
Related¶
- systems/triton-dsl — longer-form entry covering Meta's TLX fork.
- systems/tritonbench — Meta's benchmark harness for Triton kernels.
- systems/pinterest-dcat — Pinterest's custom Triton attention kernels.
- systems/flash-attention — the standard attention implementation DCAT's Triton kernels displace.
- systems/kernelevolve — Meta's automated Triton kernel synthesiser.