CuTe DSL (CUTLASS Tensor DSL)
Definition
CuTe is NVIDIA's tensor-layout abstraction and DSL underpinning the CUTLASS CUDA template library. It provides a unified algebra over tensors, layouts, and hardware-aware tiling, so kernel authors can compose GEMM, convolution, and attention kernels against NVIDIA tensor cores without hand-writing all of the shared-memory choreography. It is one of the high-level GPU DSLs in which Meta's KernelEvolve LLM synthesizer emits kernels, alongside Triton, TLX, and FlyDSL (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
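To make the layout idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the CuTe or CUTLASS API: the `Layout` class and the `column_major`, `row_major`, and `tile_offsets` helpers are hypothetical names invented for illustration. It assumes only the basic idea that a layout pairs a shape with strides and maps a logical coordinate to a linear memory offset.

```python
class Layout:
    """Toy (shape, stride) layout. Illustrative only; not the CuTe API."""

    def __init__(self, shape, stride):
        if len(shape) != len(stride):
            raise ValueError("shape and stride must have the same rank")
        self.shape = tuple(shape)
        self.stride = tuple(stride)

    def __call__(self, *coord):
        """Map a logical coordinate to a linear offset: sum(coord[k] * stride[k])."""
        if len(coord) != len(self.shape):
            raise ValueError("coordinate rank does not match layout rank")
        for c, extent in zip(coord, self.shape):
            if not 0 <= c < extent:
                raise IndexError("coordinate out of bounds")
        return sum(c * d for c, d in zip(coord, self.stride))


def column_major(rows, cols):
    # Consecutive rows are adjacent in memory (stride 1 along rows).
    return Layout((rows, cols), (1, rows))


def row_major(rows, cols):
    # Consecutive columns are adjacent in memory (stride 1 along columns).
    return Layout((rows, cols), (cols, 1))


def tile_offsets(layout, tile_shape, tile_coord):
    """Offsets of every element in one tile of a 2-D layout, listed row by row.

    Hypothetical helper: shows how a tile is just a sub-range of logical
    coordinates pushed through the same layout function.
    """
    tile_rows, tile_cols = tile_shape
    ti, tj = tile_coord
    return [
        [layout(ti * tile_rows + i, tj * tile_cols + j) for j in range(tile_cols)]
        for i in range(tile_rows)
    ]
```

With these helpers, `column_major(4, 6)(2, 3)` evaluates to 14 and `row_major(4, 6)(2, 3)` to 15: the same logical element lands at a different offset depending on the layout. Separating logical coordinates from physical offsets in this way is the kind of bookkeeping a layout abstraction lets kernel code leave implicit.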
Role in KernelEvolve
CuTe DSL is one of the emission targets that KernelEvolve's LLM synthesizer can pick when the target hardware is an NVIDIA GPU. It sits at a higher level of abstraction than raw CUDA, which KernelEvolve also emits. The synthesizer picks the DSL based on operator shape, hardware target, and the history of prior candidates recorded in the tree-search engine's node memory.
Seen in
- Meta KernelEvolve (2026-04-02, canonical). Named alongside Triton, TLX, and FlyDSL as a high-level DSL that KernelEvolve supports. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)
Caveats
Extensive public CuTe and CUTLASS documentation is available at github.com/NVIDIA/cutlass. The 2026-04-02 KernelEvolve post does not describe the DSL itself; it only names it as one of the emission targets.
Related
- companies/meta — this wiki page is scoped to CuTe's appearance in the KernelEvolve post.
- systems/kernelevolve — the agentic kernel synthesizer that emits CuTe code.
- systems/triton-dsl — a sibling high-level GPU DSL KernelEvolve also targets.
- systems/nvidia-h100 — the NVIDIA hardware family that CuTe abstracts tensor-core primitives for.