
NCU (NVIDIA Nsight Compute)

Definition

NCU (Nsight Compute) is NVIDIA's GPU kernel-level profiler — it reports per-kernel hardware metrics: occupancy, memory throughput, instruction mix, warp stall reasons, L1/L2 cache behavior, SM utilization, and more. Inside Meta's KernelEvolve it is one layer of a multi-tool evaluation framework that produces structured diagnostic signal (not just scalar wall-clock time) for the agent's next round of candidate generation (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
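The post does not show how KernelEvolve invokes the profiler, but the kind of per-kernel metric collection described above can be sketched with NCU's CLI. A minimal sketch: the three metric names are real Nsight Compute counters (DRAM throughput, SM throughput, achieved occupancy as percentages of peak), while the target binary path and the exact CSV column names are assumptions about NCU's `--csv` export, not details from the source.

```python
import csv
import io
import subprocess

# Real NCU counter names covering the three bottleneck axes the post
# mentions: memory throughput, compute throughput, and occupancy.
METRICS = [
    "dram__throughput.avg.pct_of_peak_sustained_elapsed",
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",
    "sm__warps_active.avg.pct_of_peak_sustained_active",
]


def profile(binary: str) -> str:
    """Run `ncu` on a candidate kernel's host executable and return CSV text.

    Assumes `ncu` is on PATH; `binary` is a placeholder path, not a
    KernelEvolve artifact.
    """
    out = subprocess.run(
        ["ncu", "--csv", "--metrics", ",".join(METRICS), binary],
        capture_output=True, text=True, check=True,
    )
    return out.stdout


def parse_metrics(csv_text: str) -> dict:
    """Collect {metric name: float value} from NCU's CSV export.

    Column names ("Metric Name", "Metric Value") match NCU's CSV layout
    but should be treated as an assumption here.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Metric Name") in METRICS:
            results[row["Metric Name"]] = float(row["Metric Value"].replace(",", ""))
    return results
```

The point of returning structured metric values rather than a single timing is exactly the "diagnostic signal, not just scalar wall-clock time" framing above: the downstream consumer can reason about *which* resource is saturated.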

Role in KernelEvolve

NCU is used to answer "why is this candidate slow?" — is it memory-bound, compute-bound, or occupancy-limited? — and the answer is fed back to the LLM synthesizer as part of its context-aware prompt for the next round of tree-search node expansions:

"The search engine doesn't just see 'kernel A is 1.2x faster than kernel B' — it sees why: whether the bottleneck is memory-bound, compute-bound, or limited by occupancy — and feeds that diagnostic signal back into the LLM synthesizer to guide the next round of candidates."

This is the canonical wiki datum for why NVIDIA's kernel-level profiler is a load-bearing input, not an optional debugging tool, at the agentic-kernel-optimization layer of the AI stack.

Caveats

The 2026-04-02 post does not document which specific NCU metrics KernelEvolve reads (NCU has hundreds). Public NCU documentation is at developer.nvidia.com/nsight-compute.
