NCU (NVIDIA Nsight Compute)
Definition
NCU (Nsight Compute) is NVIDIA's GPU kernel-level profiler — it reports per-kernel hardware metrics: occupancy, memory throughput, instruction mix, warp stall reasons, L1/L2 cache behavior, SM utilization, and more. Inside Meta's KernelEvolve it is one layer of a multi-tool evaluation framework that produces structured diagnostic signal (not just scalar wall-clock time) for the agent's next round of candidate generation (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
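As a rough illustration of collecting a handful of these per-kernel metrics, the sketch below assembles an `ncu` command line in Python. It is a minimal sketch, not KernelEvolve's invocation: the source does not say which metrics or flags Meta uses. The helper `build_ncu_command`, the `bench.py` target, and the kernel name are hypothetical, and the three metric names are assumptions that should be checked against `ncu --query-metrics` on the installed version.

```python
import shlex

# Metric names as exposed by recent Nsight Compute releases. Treat them as
# assumptions and confirm with `ncu --query-metrics` on your own install.
DEFAULT_METRICS = (
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",    # SM (compute) throughput
    "dram__throughput.avg.pct_of_peak_sustained_elapsed",  # DRAM throughput
    "sm__warps_active.avg.pct_of_peak_sustained_active",   # achieved occupancy
)


def build_ncu_command(target, kernel_name=None, metrics=DEFAULT_METRICS):
    """Return an argv list that runs `target` (itself an argv list) under ncu.

    Hypothetical helper for illustration only.
    """
    # --csv asks ncu for machine-readable output; --metrics limits collection
    # to the listed counters instead of a full section set.
    cmd = ["ncu", "--csv", "--metrics", ",".join(metrics)]
    if kernel_name is not None:
        # Restrict profiling to kernels with this name.
        cmd += ["--kernel-name", kernel_name]
    return cmd + list(target)


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch runs
    # without a GPU or an ncu install.
    argv = build_ncu_command(["python", "bench.py"], kernel_name="my_kernel")
    print(shlex.join(argv))
```

Requesting a short, explicit metric list keeps profiling overhead lower than collecting every section, which matters when many candidate kernels are profiled in a loop.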
Role in KernelEvolve
NCU is used to answer "why is this candidate slow?" — is it memory-bound, compute-bound, or occupancy-limited? — and the answer is fed back to the LLM synthesizer as part of its context-aware prompt for the next round of tree-search node expansions:
"The search engine doesn't just see 'kernel A is 1.2x faster than kernel B' — it sees why: whether the bottleneck is memory-bound, compute-bound, or limited by occupancy — and feeds that diagnostic signal back into the LLM synthesizer to guide the next round of candidates."
This is the wiki's canonical reference for why NVIDIA's kernel-level profiler is a load-bearing input, rather than an optional debugging tool, at the agentic-kernel-optimization layer of the AI stack.
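The memory-bound / compute-bound / occupancy-limited distinction described above can be sketched as a small classifier whose result is turned into a line of prompt context. This is an illustrative sketch only: the source does not describe KernelEvolve's actual heuristics, so the `KernelMetrics` type, the function names, the thresholds (60% and 40%), and the output format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernelMetrics:
    """Three percent-of-peak values for one kernel (hypothetical container)."""
    sm_pct: float         # compute (SM) throughput, % of peak
    dram_pct: float       # DRAM throughput, % of peak
    occupancy_pct: float  # achieved occupancy, % of theoretical maximum


def classify_bottleneck(m, high=60.0, low_occupancy=40.0):
    """Label a kernel with a coarse bottleneck class.

    The thresholds are illustrative assumptions, not values from NCU
    or from KernelEvolve.
    """
    # Memory pipe is busy and at least as busy as the compute pipe.
    if m.dram_pct >= high and m.dram_pct >= m.sm_pct:
        return "memory-bound"
    # Compute pipe is busy.
    if m.sm_pct >= high:
        return "compute-bound"
    # Neither pipe is saturated and few warps are resident.
    if m.occupancy_pct < low_occupancy:
        return "occupancy-limited"
    return "unclear"


def feedback_line(name, speedup, m):
    """Format one candidate's result as text for the next generation prompt."""
    return (
        f"{name}: {speedup:.2f}x vs baseline; "
        f"diagnosis={classify_bottleneck(m)} "
        f"(SM {m.sm_pct:.0f}%, DRAM {m.dram_pct:.0f}%, "
        f"occupancy {m.occupancy_pct:.0f}%)"
    )


if __name__ == "__main__":
    candidate = KernelMetrics(sm_pct=30.0, dram_pct=85.0, occupancy_pct=70.0)
    print(feedback_line("candidate_A", 1.2, candidate))
```

The point of the design is that the generator receives the diagnosis alongside the speedup, so a memory-bound candidate can prompt a different next attempt (for example, better data reuse) than an occupancy-limited one.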
Seen in
- Meta KernelEvolve (2026-04-02, canonical). Named as one of the GPU-side profilers in KernelEvolve's evaluation stack. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)
Caveats
The 2026-04-02 post does not document which specific NCU metrics KernelEvolve reads (NCU has hundreds). Public NCU documentation is at developer.nvidia.com/nsight-compute.
Related
- companies/meta — the KernelEvolve deployment context.
- systems/kernelevolve — the agentic kernel-synthesis system that consumes NCU output.
- systems/proton-profiler — the complementary intra-kernel (instruction-level latency) profiler.
- systems/tritonbench — the correctness + speedup layer; NCU is the hardware-metrics layer.
- patterns/evaluation-harness-in-agent-loop — the pattern NCU is one component of.