
Triton (language)

Triton is an open-source, Python-embedded DSL for authoring GPU kernels (created by Philippe Tillet, originally at OpenAI) at a higher level of abstraction than CUDA/HIP: the programmer expresses tiled computations, while the compiler handles memory coalescing, shared-memory management, and tensor-core scheduling automatically.
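The tiled programming model can be sketched in plain Python/NumPy (a stand-in for real Triton code, which needs a GPU and the `triton` package): each "program" instance handles one BLOCK-sized tile of the problem, with a mask for the ragged tail. This mirrors the structure of a Triton kernel built from `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store`; the function names and block size here are illustrative, not from any cited source.

```python
import numpy as np

def add_kernel_tile(x, y, out, pid, BLOCK=128):
    # One "program" instance: compute this tile's offsets, mask
    # out-of-bounds lanes, load, compute, store -- the same shape as a
    # Triton kernel using tl.program_id / tl.arange / tl.load(mask=...).
    offs = pid * BLOCK + np.arange(BLOCK)
    mask = offs < x.shape[0]
    out[offs[mask]] = x[offs[mask]] + y[offs[mask]]

def add(x, y, BLOCK=128):
    out = np.empty_like(x)
    n = x.shape[0]
    grid = (n + BLOCK - 1) // BLOCK   # launch grid, as in Triton's kernel[grid](...)
    for pid in range(grid):           # on a GPU these instances run in parallel
        add_kernel_tile(x, y, out, pid, BLOCK)
    return out
```

In actual Triton the loop over `pid` disappears: the grid is launched in parallel on the GPU, and the compiler chooses how each tile maps onto threads and shared memory.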

This page is the canonical wiki entry under the slug triton-lang that source pages reference as [systems/triton-lang](<./triton-lang.md>). For the language-and-extensions detail page (including Meta's TLX fork), see systems/triton-dsl.

Use at Pinterest — DCAT kernels

Pinterest's DCAT (Deduplicated Cross-Attention Transformer) is "implemented with custom Triton kernels for both training and serving" (Source: sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication). The custom kernels displace FlashAttention for ranking attention: DCAT's two-phase context/crossing split is not expressible as a stock FlashAttention call, so Pinterest wrote Triton kernels that implement the context pass (populate the KV cache) and the crossing pass (cross-attention against the cached KV) directly.
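Since Pinterest has not disclosed kernel-level detail, the two-phase split can only be sketched at the math level. A minimal NumPy sketch, assuming single-head attention and hypothetical function names (`context_pass`, `crossing_pass`) chosen here for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def context_pass(context_tokens, Wk, Wv):
    # Phase 1 (context pass): project the shared context once and
    # cache K/V, so the work is not repeated per candidate item.
    K = context_tokens @ Wk
    V = context_tokens @ Wv
    return K, V

def crossing_pass(queries, K, V):
    # Phase 2 (crossing pass): cross-attention of per-item queries
    # against the cached K/V from phase 1.
    d = K.shape[-1]
    scores = queries @ K.T / np.sqrt(d)
    return softmax(scores) @ V
```

The point of fusing each phase into a Triton kernel (rather than composing framework ops) is to keep the tiles of K/V and the attention scores in on-chip memory; the actual tile sizes and layouts Pinterest uses are not public.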

Canonical wiki instance: Triton as the substrate for specialised attention architectures that diverge from standard self-attention (beyond what stock fused-attention kernels provide). Complements Meta's KernelEvolve use of Triton + TLX as the emission targets for LLM-generated kernels.

Caveats

  • Kernel-level detail not disclosed — Pinterest names Triton as the implementation language but does not disclose tile sizes, shared-memory layouts, or the specific kernel shape for context vs crossing.
  • Triton vs TLX vs CUDA vs HIP — the 2026-04-13 Pinterest post names only Triton. Whether Pinterest uses upstream Triton, a fork, or bridges to lower-level CUDA is not disclosed.

Seen in
