
Triton (language)

Triton is an open-source, Python-embedded DSL for authoring GPU kernels (created by Philippe Tillet, originally at OpenAI) at a higher level of abstraction than CUDA/HIP: the programmer expresses tiled computations, while the compiler handles memory coalescing, shared-memory management, and tensor-core scheduling automatically.
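The tiled programming model can be sketched in plain Python/NumPy (a stand-in for real Triton code, which needs a GPU and the `triton` package): each "program" instance handles one BLOCK-sized tile of the problem, with a mask for the ragged tail. This mirrors the structure of a Triton kernel built from `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store`; the function names and block size here are illustrative, not from any cited source.

```python
import numpy as np

def add_kernel_tile(x, y, out, pid, BLOCK=128):
    # One "program" instance: compute this tile's offsets, mask
    # out-of-bounds lanes, load, compute, store -- the same shape as a
    # Triton kernel using tl.program_id / tl.arange / tl.load(mask=...).
    offs = pid * BLOCK + np.arange(BLOCK)
    mask = offs < x.shape[0]
    out[offs[mask]] = x[offs[mask]] + y[offs[mask]]

def add(x, y, BLOCK=128):
    out = np.empty_like(x)
    n = x.shape[0]
    grid = (n + BLOCK - 1) // BLOCK   # launch grid, as in Triton's kernel[grid](...)
    for pid in range(grid):           # on a GPU these instances run in parallel
        add_kernel_tile(x, y, out, pid, BLOCK)
    return out
```

In actual Triton the loop over `pid` disappears: the grid is launched in parallel on the GPU, and the compiler chooses how each tile maps onto threads and shared memory.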

This page is the canonical wiki entry under the slug triton-lang that source pages reference as [systems/triton-lang](<./triton-lang.md>). For the language-and-extensions detail page (including Meta's TLX fork), see systems/triton-dsl.

Use at Pinterest — DCAT kernels

Pinterest's DCAT (Deduplicated Cross-Attention Transformer) is "implemented with custom Triton kernels for both training and serving" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication). The custom kernels displace FlashAttention for ranking attention: DCAT's two-phase context/crossing split is not expressible as a stock FlashAttention call, so Pinterest wrote Triton kernels that implement the context pass (populate the KV cache) and the crossing pass (cross-attention against the cached KV) directly.
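Since Pinterest has not disclosed kernel-level detail, the two-phase split can only be sketched at the math level. A minimal NumPy sketch, assuming single-head attention and hypothetical function names (`context_pass`, `crossing_pass`) chosen here for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def context_pass(context_tokens, Wk, Wv):
    # Phase 1 (context pass): project the shared context once and
    # cache K/V, so the work is not repeated per candidate item.
    K = context_tokens @ Wk
    V = context_tokens @ Wv
    return K, V

def crossing_pass(queries, K, V):
    # Phase 2 (crossing pass): cross-attention of per-item queries
    # against the cached K/V from phase 1.
    d = K.shape[-1]
    scores = queries @ K.T / np.sqrt(d)
    return softmax(scores) @ V
```

The point of fusing each phase into a Triton kernel (rather than composing framework ops) is to keep the tiles of K/V and the attention scores in on-chip memory; the actual tile sizes and layouts Pinterest uses are not public.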

Canonical wiki instance: Triton as the substrate for specialised attention architectures that diverge from standard self-attention (beyond what stock fused-attention kernels provide). Complements Meta's KernelEvolve use of Triton + TLX as the emission targets for LLM-generated kernels.

Caveats

  • Kernel-level detail not disclosed — Pinterest names Triton as the implementation language but does not disclose tile sizes, shared-memory layouts, or the specific kernel shape for context vs crossing.
  • Triton vs TLX vs CUDA vs HIP — the 2026-04-13 Pinterest post names only Triton. Whether Pinterest uses upstream Triton, a fork, or bridges to lower-level CUDA is not disclosed.

Seen in
