SYSTEM Cited by 1 source

CuTe DSL (CUTLASS Tensor DSL)

Definition

CuTe is NVIDIA's tensor-layout abstraction that underpins the CUTLASS CUDA template library, and CuTe DSL is the Python-embedded DSL built on it. It provides a unified algebra over tensors, layouts, and hardware-aware tiling, so kernel authors can compose GEMM, convolution, and attention kernels for NVIDIA tensor cores without hand-writing all of the shared-memory choreography. It is one of the high-level GPU DSLs in which Meta's KernelEvolve LLM synthesizer emits kernels, alongside Triton, TLX, and FlyDSL (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
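To make the "algebra over layouts" concrete, here is a minimal sketch of CuTe's core idea in plain Python: a layout is a (shape, stride) pair that maps a logical coordinate to a memory offset through the inner product of coordinate and stride, with flat indices decomposed leftmost-mode-fastest (column-major). This is an illustration under stated assumptions, not the `cutlass.cute` API: the `Layout` class and its methods are hypothetical names chosen here, and real CuTe layouts also support nested (hierarchical) shapes and operations such as composition and division that this sketch omits.

```python
# Toy model of a CuTe-style layout: (shape, stride) -> offset function.
# Hypothetical illustration only; this is NOT the cutlass.cute API.
from math import prod


class Layout:
    def __init__(self, shape, stride=None):
        self.shape = tuple(shape)
        if stride is None:
            # Default to compact column-major strides (leftmost mode fastest).
            stride, acc = [], 1
            for s in self.shape:
                stride.append(acc)
                acc *= s
        self.stride = tuple(stride)

    def size(self):
        # Number of logical elements the layout covers.
        return prod(self.shape)

    def idx2crd(self, idx):
        # Decompose a flat index into a coordinate, leftmost mode fastest.
        crd = []
        for s in self.shape:
            crd.append(idx % s)
            idx //= s
        return tuple(crd)

    def __call__(self, *coord):
        # A single integer on a multi-mode layout is treated as a flat index.
        if len(coord) == 1 and len(self.shape) > 1:
            coord = self.idx2crd(coord[0])
        # Offset is the inner product of coordinate and stride.
        return sum(c * d for c, d in zip(coord, self.stride))


col = Layout((4, 8))          # column-major 4x8, strides (1, 4)
row = Layout((4, 8), (8, 1))  # row-major 4x8
print(col(1, 2), row(1, 2))
```

For example, `Layout((4, 8))` gets strides `(1, 4)`, so coordinate `(1, 2)` lands at offset 9, while the row-major `Layout((4, 8), (8, 1))` places the same coordinate at offset 10. Changing only the strides changes memory order without touching the logical indexing code, which is the kind of separation CuTe's tiling abstractions are built around.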

Role in KernelEvolve

CuTe DSL is one of the emission targets KernelEvolve's LLM synthesizer can pick when the target hardware is an NVIDIA GPU. It sits at a higher level of abstraction than raw CUDA, which KernelEvolve also emits. The synthesizer chooses a DSL based on operator shape, hardware target, and the history of prior candidates recorded in the tree-search engine's node memory.
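The selection criteria named in the paragraph above can be illustrated with a deliberately toy policy. Everything in it is a hypothetical assumption: the KernelEvolve post does not publish how the synthesizer picks a DSL, and `NodeRecord`, `pick_dsl`, the string labels such as `"cute_dsl"`, the `"nvidia"` hardware prefix, and the best-recorded-speedup rule are all invented for illustration. The only grounded constraint is that CuTe DSL is a candidate only when the target is an NVIDIA GPU.

```python
# Hypothetical toy policy: pick an emission DSL from tree-search node memory.
# All names and the scoring rule are illustrative assumptions, not KernelEvolve's logic.
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRecord:
    """One prior candidate remembered by a (hypothetical) search node."""
    dsl: str         # e.g. "cute_dsl", "triton", "cuda"
    op_shape: tuple  # operator problem shape, e.g. (M, N, K)
    hardware: str    # e.g. "nvidia-h100"
    speedup: float   # measured speedup over a baseline kernel


def pick_dsl(op_shape, hardware, memory, default="triton"):
    """Return the DSL with the best recorded speedup for this shape and hardware."""
    best = {}
    for r in memory:
        # CuTe DSL targets NVIDIA GPUs only, so skip it on other hardware.
        if r.dsl == "cute_dsl" and not hardware.startswith("nvidia"):
            continue
        if r.op_shape == op_shape and r.hardware == hardware:
            best[r.dsl] = max(best.get(r.dsl, float("-inf")), r.speedup)
    # Fall back to a default when no matching history exists.
    return max(best, key=best.get) if best else default


memory = [
    NodeRecord("triton", (4096, 4096, 4096), "nvidia-h100", 1.2),
    NodeRecord("cute_dsl", (4096, 4096, 4096), "nvidia-h100", 1.5),
]
print(pick_dsl((4096, 4096, 4096), "nvidia-h100", memory))
```

This only shows how operator shape, hardware target, and remembered history can jointly narrow the choice; a real agent would likely weigh many more signals.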

Seen in

Meta KernelEvolve (2026-04-02, canonical). Named alongside Triton, TLX, and FlyDSL as one of the high-level DSLs KernelEvolve supports. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)

Caveats

Public CuTe / CUTLASS documentation is extensive on github.com/NVIDIA/cutlass. The 2026-04-02 KernelEvolve post does not describe the DSL itself; it only names it as one of the emission targets.

Related

companies/meta — this wiki page is scoped to CuTe's appearance in the KernelEvolve post.
systems/kernelevolve — the agentic kernel synthesizer that emits CuTe code.
systems/triton-dsl — a sibling high-level GPU DSL KernelEvolve also targets.
systems/nvidia-h100 — the NVIDIA hardware family that CuTe abstracts tensor-core primitives for.
