

Flamescope

Flamescope (github.com/Netflix/flamescope) is a Netflix-authored open-source visualisation tool for temporal CPU profiling data. It extends Brendan Gregg's flamegraph idea with a time-travel view: the x-axis of a Flamescope plot is wall-clock time, and the user can zoom into any sub-range of the recording to render a flamegraph for just that range.

This is the key primitive for diagnosing sporadic CPU spikes, where a one-shot flamegraph captures either too much (aggregated over an hour that includes both the incident and the baseline) or too little (a 30-second perf sample that happens to miss the event).
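To make the windowing idea concrete, here is a minimal sketch of "a flamegraph for just that window": keep only the samples whose timestamps fall inside the selected range, then collapse them into the folded-stack text format ("root;child;leaf count") that Brendan Gregg's flamegraph.pl consumes. The `Sample` type and `fold_window` function are hypothetical illustrations, not Flamescope's internals; samples are assumed to be already parsed into a timestamp plus a root-first frame list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical parsed stack sample: timestamp in seconds, frames root first."""
    ts: float
    frames: tuple[str, ...]


def fold_window(samples, start, end):
    """Collapse samples with start <= ts < end into folded-stack lines.

    Each output line is 'root;child;leaf count'. Lines are sorted only to
    make the output deterministic.
    """
    counts = Counter(";".join(s.frames) for s in samples if start <= s.ts < end)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    samples = [
        Sample(10.0, ("main", "work")),
        Sample(10.5, ("main", "work")),
        Sample(11.2, ("main", "idle")),
        Sample(42.0, ("main", "gc")),
    ]
    # Only the first three samples fall inside [10, 12).
    for line in fold_window(samples, 10.0, 12.0):
        print(line)
```

Widening `start`/`end` to cover the whole recording reproduces the aggregated view, where a short burst is averaged together with everything else.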

Shape

Input: timestamped stack samples from perf (a perf.data recording rendered to text with perf script), spanning minutes to hours of wall-clock time.
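A rough sketch of how perf script text could be split into timestamped samples, which is the raw material for any time-based view. It assumes the default layout, where each sample starts with a header line of the form "comm pid [cpu] seconds.micros: ..." (possibly left-padded) followed by indented frame lines printed leaf first, with a blank line between samples. Field order varies with perf version and `-F` options, so treat the regex as an assumption; `parse_perf_script` and `Sample` are hypothetical names, not Flamescope's parser.

```python
import re
from dataclasses import dataclass, field

# Assumed header layout: "<comm> <pid>[/<tid>] [<cpu>] <seconds>.<micros>: ..."
HEADER_RE = re.compile(
    r"^(?P<comm>\S.*?)\s+(?P<pid>\d+)(?:/\d+)?\s+(?:\[\d+\]\s+)?(?P<ts>\d+\.\d+):"
)


@dataclass
class Sample:
    comm: str
    pid: int
    ts: float  # seconds, as printed by perf script
    frames: list = field(default_factory=list)  # root first


def parse_perf_script(text):
    """Split perf script text into Sample records (sketch, not exhaustive)."""
    samples = []
    current = None
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            # '#' lines are --header metadata; a blank line ends a sample.
            current = None
            continue
        m = HEADER_RE.match(line.strip())
        if m:
            current = Sample(m["comm"], int(m["pid"]), float(m["ts"]))
            samples.append(current)
        elif line[0].isspace():
            if current is not None:
                # Frame line: "<addr> <symbol> (<module>)", printed leaf first.
                parts = line.split(maxsplit=1)
                sym = parts[1].rsplit(" (", 1)[0] if len(parts) > 1 else parts[0]
                current.frames.insert(0, sym)  # prepend so frames end up root first
        else:
            current = None  # unrecognised header layout: ignore its frames
    return samples
```

Keeping the timestamp on every sample is the whole point: a plain aggregated flamegraph discards it, while a temporal view bins on it.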

Output:

  1. Temporal overview: a heat map whose x-axis is wall-clock time. Each column is one second, split vertically into sub-second buckets, and colour encodes how many samples fell into each bucket, so anomalous spikes stand out.
  2. Zoom selection: the user clicks a start point and an end point on the heat map to select a sub-window, and Flamescope renders a flamegraph for just that window.
  3. Flamegraph: a classic Brendan Gregg flamegraph for the selected window, where each frame's width is proportional to its share of samples, so the hottest code paths form the widest towers.
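The overview step above amounts to binning sample timestamps into a grid of whole seconds (columns) by fractions of a second (rows). Below is a minimal sketch of that binning, plus the inverse mapping from a clicked cell back to a wall-clock time that can serve as a window boundary. The function names and the default of 50 rows per second are illustrative assumptions, not Flamescope's code or defaults.

```python
import math


def subsecond_heatmap(timestamps, rows=50):
    """Bin timestamps (seconds) into a rows x seconds grid of sample counts.

    Column c is the c-th whole second after floor(min(timestamps)); row r
    covers the offsets [r/rows, (r+1)/rows) within that second.
    Returns (t0, grid) where t0 is the first whole second.
    """
    if not timestamps:
        return None, []
    t0 = math.floor(min(timestamps))
    ncols = math.floor(max(timestamps)) - t0 + 1
    grid = [[0] * ncols for _ in range(rows)]
    for ts in timestamps:
        whole = math.floor(ts)
        col = whole - t0
        # Clamp guards against float rounding pushing the offset to `rows`.
        row = min(int((ts - whole) * rows), rows - 1)
        grid[row][col] += 1
    return t0, grid


def cell_to_time(t0, col, row, rows=50):
    """Map a heat-map cell back to the start time (seconds) of its slice."""
    return t0 + col + row / rows
```

Two clicks on the grid then become two `cell_to_time` calls, giving the start and end of the window to fold into a flamegraph. Denser rows make short bursts easier to spot at the cost of fewer samples per cell.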

Seen in

  • sources/2026-04-15-pinterest-finding-zombies-in-our-systems-cpu-bottlenecks: canonical wiki instance of Flamescope's time-travel debugging use. Pinterest ran perf record in 2-minute increments over 12 hours on tainted reserved K8s hosts, loaded the resulting perf.data files into Flamescope, and zoomed into a 5-second window around a confirmed ENA-reset timestamp. The zoomed flamegraph revealed kubelet at 6.5% of total CPU (vs <1% baseline) spending nearly all of it in mem_cgroup_nr_lru_pages, which proved to be the Rosetta stone of the investigation.
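A "6.5% of total CPU versus <1% baseline" comparison boils down to counting samples inside and outside a window. A sketch, assuming samples are `(timestamp, command)` pairs collected system-wide at a fixed frequency, so idle samples are included and a sample share approximates a CPU share. `window_share` and `baseline_share` are hypothetical helpers, not part of Flamescope or perf.

```python
def _share(comms, comm):
    """Fraction of entries in `comms` equal to `comm` (0.0 if empty)."""
    return sum(1 for c in comms if c == comm) / len(comms) if comms else 0.0


def window_share(samples, comm, start, end):
    """Share of samples in [start, end) attributed to command `comm`."""
    return _share([c for ts, c in samples if start <= ts < end], comm)


def baseline_share(samples, comm, start, end):
    """Share of samples outside [start, end) attributed to command `comm`."""
    return _share([c for ts, c in samples if not start <= ts < end], comm)


if __name__ == "__main__":
    reset_ts = 102.5  # hypothetical event timestamp, seconds
    samples = [(100.5, "kubelet"), (101.0, "kubelet"), (102.0, "swapper"),
               (104.9, "java"), (50.0, "kubelet"), (60.0, "java"),
               (70.0, "swapper"), (105.0, "java")]
    # A 5-second window centred on the event: [t - 2.5, t + 2.5).
    print(window_share(samples, "kubelet", reset_ts - 2.5, reset_ts + 2.5))
    print(baseline_share(samples, "kubelet", reset_ts - 2.5, reset_ts + 2.5))
```

A large gap between the two numbers is the signal the zoomed flamegraph made visible; the flamegraph then shows which frames account for it.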

Why it matters for the Pinterest incident

Without Flamescope (or an equivalent temporal visualiser), the same perf.data files would either:

  • Be viewed as an aggregated flamegraph over the full 12 hours, where the 5-second kubelet spike gets diluted into the noise floor.
  • Or be viewed as a single perf script text dump, where ranking is possible but the temporal correlation with the ENA reset timestamp is invisible.
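The dilution in the first bullet is simple arithmetic. As an illustrative assumption, take the kubelet baseline as exactly 1% (the source only says <1%), the spike as 6.5% for d = 5 s, and a constant total sample rate. The aggregate share over a capture of length T is then the time-weighted mean:

$$
\bar{s} = \frac{d\,s_{\text{spike}} + (T - d)\,s_{\text{base}}}{T}
$$

$$
T = 12\ \text{h} = 43\,200\ \text{s}: \quad \bar{s} = \frac{5 \times 6.5\% + 43\,195 \times 1\%}{43\,200} \approx 1.0006\%
$$

$$
T = 2\ \text{min} = 120\ \text{s}: \quad \bar{s} = \frac{5 \times 6.5\% + 115 \times 1\%}{120} \approx 1.23\%
$$

Under these assumptions, even a single 2-minute capture barely moves the average; only a window close to the 5 seconds themselves shows the full 6.5%.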

The time-travel view is what turns "we have 12 hours of perf data and one of those seconds is important" into a tractable debugging workflow.
