SYSTEM Cited by 1 source

Proton (Intra-Kernel GPU Profiler)

Definition

Proton (IEEE Xplore document 11395207) is a GPU profiler that "delivers intra-kernel instruction-level latency and pipeline behavior", a finer-grained signal than NCU's kernel-summary metrics. Meta names it as one of the profiling tools composed into KernelEvolve's evaluation framework (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).

Role in KernelEvolve

Proton is the finest-grained tool in KernelEvolve's profiling hierarchy, one level below NCU:

  • PyTorch Profiler — system-level execution timelines (kernel-launch overhead, host-device synchronization).
  • TritonBench — numerical correctness + end-to-end speedup.
  • NCU — per-kernel hardware summary metrics (occupancy, memory throughput, instruction mix).
  • Proton — intra-kernel instruction-level latency + pipeline behavior.
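The four tiers above can be written down as a small lookup table. This is only an illustrative sketch: the `ProfilerTier` type, the `scope` labels, and the `tool_for` helper are hypothetical names introduced here, not part of KernelEvolve or of any of the tools listed. The signal strings are taken from the list above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProfilerTier:
    """One layer of the profiling stack (hypothetical type, for illustration)."""
    tool: str
    scope: str                # assumed label for the granularity of the tool
    signals: Tuple[str, ...]  # signals the wiki attributes to this tool


# Same order as the list above, ending with the intra-kernel tier.
TIERS = (
    ProfilerTier("PyTorch Profiler", "system",
                 ("kernel-launch overhead", "host-device synchronization")),
    ProfilerTier("TritonBench", "benchmark",
                 ("numerical correctness", "end-to-end speedup")),
    ProfilerTier("NCU", "kernel",
                 ("occupancy", "memory throughput", "instruction mix")),
    ProfilerTier("Proton", "intra-kernel",
                 ("instruction-level latency", "pipeline behavior")),
)


def tool_for(signal: str) -> str:
    """Return the tool that the list above associates with a given signal."""
    for tier in TIERS:
        if signal in tier.signals:
            return tier.tool
    raise KeyError(f"no tier reports {signal!r}")


ncu_signal_tool = tool_for("occupancy")
proton_signal_tool = tool_for("pipeline behavior")
```

Here `tool_for("occupancy")` resolves to NCU, while `tool_for("pipeline behavior")` resolves to Proton, which mirrors the split between per-kernel summaries and intra-kernel detail.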

The finer-grained signal matters because KernelEvolve's LLM synthesizer needs to know not just "this kernel is memory-bound" but "these specific instructions stall the pipeline". Proton-level detail is what lets the next round of search-tree node expansions target the right transformation.
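To make the difference concrete, the sketch below ranks per-instruction stall samples so that a search step could focus on the worst offenders. It is a minimal illustration under stated assumptions: the `top_stalls` helper, the instruction labels, and the cycle counts are all invented for this example and do not reflect Proton's actual output format or KernelEvolve's internals.

```python
from typing import Dict, List, Tuple


def top_stalls(stall_cycles: Dict[str, int], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k instructions with the most stall cycles and their share of the total."""
    total = sum(stall_cycles.values())
    if total == 0:
        return []
    ranked = sorted(stall_cycles.items(), key=lambda kv: kv[1], reverse=True)
    return [(instr, cycles / total) for instr, cycles in ranked[:k]]


# Made-up per-instruction stall samples for a hypothetical matmul kernel.
samples = {
    "load A tile": 600,
    "load B tile": 250,
    "dot": 100,
    "store C tile": 50,
}

# A kernel-level summary would only say "memory-bound"; the ranking names
# the specific instructions a follow-up transformation could target.
hotspots = top_stalls(samples)
for instr, share in hotspots:
    print(f"{instr}: {share:.0%} of stall cycles")
```

With these invented numbers, the two load instructions account for most of the stall cycles, which is the kind of hint that could steer the next candidate toward a memory-access transformation rather than a compute one.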

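Seen in

  • sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure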

Caveats

The 2026-04-02 post links to IEEE Xplore document 11395207 as the Proton reference, but the paper itself is not ingested on this wiki. The post does not characterize Proton's relationship to other GPU profiling tools and interfaces (Nsight Systems, rocprof, CUPTI).