SYSTEM Cited by 1 source

CuTe DSL (CUTLASS Tensor DSL)

Definition

CuTe is NVIDIA's tensor-layout abstraction and DSL underpinning the CUTLASS CUDA template library. It provides a unified algebra over tensors, layouts, and hardware-aware tiling, so kernel authors can compose GEMM, convolution, and attention kernels against NVIDIA tensor cores without hand-writing all of the shared-memory choreography. It is one of the high-level GPU DSLs in which Meta's KernelEvolve LLM synthesizer emits kernels, alongside Triton, TLX, and FlyDSL (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
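
To make the "algebra over layouts" concrete, here is a minimal sketch of the core idea in plain Python. It assumes the simplest case of a flat (non-nested) layout, described by a shape and a stride tuple, which maps a logical coordinate to a linear memory offset via an inner product. The function names `layout_index` and `layout_offsets` are hypothetical and chosen for illustration; this is not the CUTLASS or CuTe DSL API.

```python
from itertools import product


def layout_index(coord, stride):
    """Map a logical coordinate to a linear offset: sum(coord[i] * stride[i])."""
    if len(coord) != len(stride):
        raise ValueError("coord and stride must have the same rank")
    return sum(c * d for c, d in zip(coord, stride))


def layout_offsets(shape, stride):
    """List the offset of every coordinate, with the leftmost mode varying fastest."""
    offsets = []
    # itertools.product varies the rightmost range fastest, so iterate over the
    # reversed shape and flip each coordinate back into its original order.
    for reversed_coord in product(*(range(extent) for extent in reversed(shape))):
        coord = reversed_coord[::-1]
        offsets.append(layout_index(coord, stride))
    return offsets


# A 4x2 tensor stored column-major (stride 1 along rows, 4 along columns).
column_major = layout_offsets((4, 2), (1, 4))

# The same 4x2 logical tensor stored row-major (stride 2 along rows, 1 along columns).
row_major = layout_offsets((4, 2), (2, 1))
```

The same logical shape paired with different strides yields different memory orderings, which is what lets a kernel be written against logical coordinates while the layout decides where the data actually lives.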

Role in KernelEvolve

CuTe DSL is one of the emission targets KernelEvolve's LLM synthesizer can pick when the target hardware is an NVIDIA GPU. It sits at a higher level of abstraction than raw CUDA, which KernelEvolve also emits. The synthesizer picks the DSL based on operator shape, hardware target, and the prior-candidate history recorded in the tree-search engine's node memory.

Seen in

Caveats

Extensive public CuTe / CUTLASS documentation is available at github.com/NVIDIA/cutlass. The 2026-04-02 KernelEvolve post does not describe the DSL itself; it only names it as one of the emission targets.
