Linux perf¶
perf (aka perf_events, aka linux-tools-perf) is the
canonical Linux sampling profiler. It lives in the kernel tree
(tools/perf/) and samples the program counter + stack trace
across all CPUs at a configurable frequency, writing results to
a perf.data file consumable by downstream tooling (flamegraph
scripts, Flamescope, perf report, etc.).
Key verbs¶
- perf record -F <Hz> -g -a -o <file> -- <cmd>: record all-CPU stack traces at <Hz> Hz for the duration of <cmd> (commonly sleep 120). -g enables call-graph capture; -a enables system-wide, all-CPU sampling (without it, perf record profiles only <cmd> itself, or the process given with -p <PID>).
- perf script --header -i <file>: render perf.data into a text stack-trace log consumable by the standard flamegraph tooling.
- perf report: top-style interactive view of the recorded data.
Production capture patterns¶
One-shot capture for reproducible incidents¶
Standard flamegraph workflow: kick off perf record while the
incident is reproducing, capture for 30-60 s, stop, convert to
flamegraph. Works when the incident is continuous.
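The workflow above can be sketched as a small helper that prints the commands for review instead of running them. This is an illustrative sketch, not a canonical recipe: the function name oneshot_plan and the .stacks / .svg file names are hypothetical, and the last step assumes stackcollapse-perf.pl and flamegraph.pl from Brendan Gregg's FlameGraph repository are on the PATH.

```shell
#!/usr/bin/env bash
# oneshot_plan HZ SECONDS OUTFILE
# Print (do not execute) the commands for a one-shot capture.
# oneshot_plan is a hypothetical helper name, not part of perf.
oneshot_plan() {
  local hz="$1" seconds="$2" out="$3"
  local base="${out%.data}"   # strip a trailing .data, if present

  # 1. Record all-CPU call graphs while `sleep` runs.
  echo "sudo perf record -F ${hz} -g -a -o ${out} -- sleep ${seconds}"
  # 2. Render the binary perf.data file as text stack traces.
  echo "sudo perf script --header -i ${out} > ${base}.stacks"
  # 3. Fold the stacks and draw the flamegraph (assumes the FlameGraph
  #    scripts are installed and on PATH).
  echo "stackcollapse-perf.pl < ${base}.stacks | flamegraph.pl > ${base}.svg"
}
```

Calling oneshot_plan 97 60 incident.data prints the three commands for a 60 s capture at 97 Hz; they can be pasted into a shell once the incident is reproducing.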
Temporal / continuous capture for rare events¶
When the incident is rare and unpredictable (e.g.
Pinterest's ENA resets that fired at sporadic intervals over
8-12 h training jobs), one-shot perf record has almost no
chance of coinciding with the event. The fix is to run
perf record in a loop for hours, tag each perf.data with
a timestamp, and time-travel to the relevant window after
the fact:
# 360 iterations x 120 s = 12 h of back-to-back captures, one timestamped file each
for i in {1..360}; do
    sudo perf record -F 97 -g -a \
        -o "perf-$(hostname)-$(date +"%Y%m%d-%H-%M-%S")-120s.data" \
        -- sleep 120
done
This is the patterns/continuous-perf-record-for-time-travel pattern; Flamescope is the downstream visualiser.
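The "time-travel" step can be sketched as a lookup from an incident timestamp (for example, one taken from the kernel log) to the capture file whose window contains it. This is a minimal sketch under stated assumptions: the helper names capture_offset and find_capture are hypothetical, the file names follow the perf-<host>-YYYYmmdd-HH-MM-SS-<N>s.data scheme produced by the loop above, GNU date is available, and the incident timestamp is in the same time zone the capture host used when naming the files. The timestamp in the name is taken just before perf record starts, so the computed offset is approximate.

```shell
#!/usr/bin/env bash
# capture_offset FILE "YYYY-MM-DD HH:MM:SS"
# If the capture window encoded in FILE's name contains the timestamp,
# print the offset (seconds from the start of the capture) and return 0.
# Otherwise return 1. Hypothetical helper; relies on GNU `date -d`.
capture_offset() {
  local file="$1" incident="$2"
  # Matches the trailing -YYYYmmdd-HH-MM-SS-<N>s.data of the file name.
  local re='-([0-9]{4})([0-9]{2})([0-9]{2})-([0-9]{2})-([0-9]{2})-([0-9]{2})-([0-9]+)s\.data$'
  [[ "$file" =~ $re ]] || return 1

  local start_str="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]} ${BASH_REMATCH[4]}:${BASH_REMATCH[5]}:${BASH_REMATCH[6]}"
  local duration="${BASH_REMATCH[7]}"
  local start incident_s offset
  start=$(date -d "$start_str" +%s) || return 1
  incident_s=$(date -d "$incident" +%s) || return 1

  offset=$(( incident_s - start ))
  (( offset >= 0 && offset < duration )) || return 1
  echo "$offset"
}

# find_capture "YYYY-MM-DD HH:MM:SS" FILE...
# Print "<file> <offset-seconds>" for the first file covering the timestamp.
find_capture() {
  local incident="$1" f offset
  shift
  for f in "$@"; do
    if offset=$(capture_offset "$f" "$incident"); then
      echo "$f $offset"
      return 0
    fi
  done
  return 1
}
```

A typical call would be find_capture "2026-04-15 03:12:40" perf-*.data; the printed offset indicates roughly where in the file to zoom once it is loaded into Flamescope.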
Why -F 97?¶
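97 Hz is a deliberately off-round, prime sampling frequency (the value used in Pinterest's incident; Brendan Gregg's canonical recipes use 99 Hz for the same reason). Sampling at a round-number frequency (100, 250, 1000 Hz) risks running in lockstep with periodic kernel / scheduler activity at those rates, which would bias the samples toward whatever happens to execute at the same point in each period.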
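The lockstep argument can be checked with a little integer arithmetic. The sketch below is a toy model, not a description of how perf schedules samples: it assumes samples and a periodic kernel tick are both perfectly regular and aligned at time zero, and counts how many distinct positions within the tick period the samples land on. The function name distinct_phases is hypothetical; it needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# distinct_phases SAMPLE_HZ TICK_HZ [SAMPLES]
# Count the distinct offsets (in whole microseconds) within one tick period
# at which the first SAMPLES samples fall. Toy model: both clocks are
# assumed perfectly regular and aligned at t = 0.
distinct_phases() {
  local sample_hz="$1" tick_hz="$2" n="${3:-1000}"
  local tick_us=$(( 1000000 / tick_hz ))
  local k t_us
  local -A seen=()
  for (( k = 0; k < n; k++ )); do
    t_us=$(( k * 1000000 / sample_hz ))   # time of the k-th sample
    seen[$(( t_us % tick_us ))]=1          # its offset within the tick period
  done
  echo "${#seen[@]}"
}

distinct_phases 100 100   # prints 1: every sample hits the same point of the tick
distinct_phases 97 100    # prints 97: samples sweep across the whole tick period
```

In this model a 100 Hz sampler always observes the same instant of a 100 Hz tick, while a 97 Hz sampler visits 97 evenly spread positions before repeating, which is the intuition behind choosing an off-round frequency.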
Seen in¶
- sources/2026-04-15-pinterest-finding-zombies-in-our-systems-cpu-bottlenecks: Pinterest's PinCompute + ML Platform teams used perf record -F 97 -g -a in a 2-minute × 360-iteration bash loop on tainted reserved K8s hosts to collect 12 hours of temporally-tagged profile data. The loop ran alongside synthetic hyper-parameter-tuning training jobs that guaranteed a reset would fire. The teams then loaded the relevant perf.data file into Flamescope and zoomed in to the 5 s around a kernel-log-timestamped ENA reset, which revealed kubelet's mem_cgroup_nr_lru_pages burn.
- sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds: Brendan Gregg's 60-second Linux performance-analysis checklist positions perf as the follow-on tool after the stock-utility triage narrows the problem.
Related¶
- systems/flamescope: temporal-flamegraph visualiser for perf.data output
- systems/mpstat: per-core coarse-grained CPU observability tool that often narrows the investigation to a specific core before perf is engaged
- concepts/flamegraph-profiling
- concepts/temporal-profiling
- concepts/use-method: Brendan Gregg's triage methodology
- patterns/continuous-perf-record-for-time-travel