Proton (Intra-Kernel GPU Profiler)
Definition
Proton (IEEE document 11395207) is a GPU profiler that "delivers intra-kernel instruction-level latency and pipeline behavior", a finer-grained signal than NCU's kernel-summary metrics. Meta names it as one of the profiling tools composed into KernelEvolve's evaluation framework (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
Role in KernelEvolve
Proton sits one level below NCU, at the finest granularity of KernelEvolve's profiling hierarchy:
- PyTorch Profiler — system-level execution timelines (kernel-launch overhead, host-device synchronization).
- TritonBench — numerical correctness + end-to-end speedup.
- NCU — per-kernel hardware summary metrics (occupancy, memory throughput, instruction mix).
- Proton — intra-kernel instruction-level latency + pipeline behavior.
The finer-grained signal matters because KernelEvolve's LLM synthesizer needs to know not just "this kernel is memory-bound" but "these specific instructions stall the pipeline". Proton-level detail is what lets the next round of search-tree node expansions target the right transformation.
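To make the difference between a kernel-level summary and instruction-level detail concrete, here is a minimal sketch of how the two signals might be combined into feedback for a synthesizer. It is not Proton's output format or KernelEvolve's implementation, neither of which the source describes. The names `InstructionSample`, `top_stalls`, and `build_feedback`, along with the sample numbers, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionSample:
    """Hypothetical intra-kernel record: one instruction and its measured stall."""
    source_line: int   # line in the kernel source (illustrative)
    opcode: str        # label for the instruction (illustrative)
    stall_cycles: int  # cycles the pipeline stalled on it (made-up data)


def top_stalls(samples, k=2):
    """Return the k samples with the most stall cycles, largest first."""
    return sorted(samples, key=lambda s: s.stall_cycles, reverse=True)[:k]


def build_feedback(kernel_summary, samples, k=2):
    """Combine a kernel-level verdict with the k worst instruction-level stalls."""
    lines = [f"Kernel summary: {kernel_summary}"]
    for s in top_stalls(samples, k):
        lines.append(
            f"- line {s.source_line}: {s.opcode} stalled {s.stall_cycles} cycles"
        )
    return "\n".join(lines)


# Made-up samples standing in for an intra-kernel profile.
samples = [
    InstructionSample(source_line=12, opcode="tl.load", stall_cycles=900),
    InstructionSample(source_line=15, opcode="tl.dot", stall_cycles=120),
    InstructionSample(source_line=18, opcode="tl.store", stall_cycles=340),
]

feedback = build_feedback("memory-bound", samples)
print(feedback)
```

The kernel-level line alone says only that the kernel is memory-bound. The per-instruction lines point at the load and store that a follow-up transformation would need to address.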
Seen in
- Meta KernelEvolve (2026-04-02, canonical). Named as the intra-kernel profiler complementing NCU in KernelEvolve's evaluation stack. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)
Caveats
The 2026-04-02 post links to IEEE Xplore document 11395207 as the Proton reference. The paper itself is not ingested on this wiki. The post does not characterize Proton's relationship to other GPU profiling tools (Nsight Systems, rocprof, CUPTI).
Related
- companies/meta — the deployment context.
- systems/kernelevolve — the agentic kernel-synthesis system that consumes Proton output.
- systems/nvidia-ncu — the per-kernel-summary profiler Proton complements.
- patterns/evaluation-harness-in-agent-loop — the pattern Proton is one instrument of.