CONCEPT Cited by 3 sources
Flamegraph profiling¶
Definition¶
Flamegraph profiling is the practice of sampling a running process's stack at high frequency and rendering the aggregated stacks as a flamegraph: a visualisation whose horizontal axis is sample count and whose vertical axis is stack depth, so the width of each frame is proportional to the time spent with that frame on the stack. Invented and popularised by Brendan Gregg.
The point is not the image per se but the rank-ordering of where CPU goes. Bugs of the "something is burning a core but we don't know what" shape — canonically CPU busy-loop incidents — are diagnosed almost entirely by reading the top of the flamegraph.
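The aggregation step described above can be sketched in a few lines. This is a minimal illustration, assuming samples arrive as root-first lists of frame names; the function names `fold_stacks` and `frame_widths` and the sample data are hypothetical. The output of `fold_stacks` follows the semicolon-joined "folded" layout that Brendan Gregg's FlameGraph scripts consume.

```python
from collections import Counter

def fold_stacks(samples):
    """Count identical stacks; keys are frames joined root-first with ';'."""
    return Counter(";".join(stack) for stack in samples)

def frame_widths(samples):
    """Fraction of samples in which each frame is somewhere on the stack."""
    total = len(samples)
    on_stack = Counter()
    for stack in samples:
        for frame in set(stack):  # count a frame once per sample, even if it recurses
            on_stack[frame] += 1
    return {frame: n / total for frame, n in on_stack.items()}

# Synthetic samples, made up for illustration only.
samples = [
    ["main", "serve", "parse"],
    ["main", "serve", "parse"],
    ["main", "serve", "write"],
    ["main", "idle"],
]

folded = fold_stacks(samples)
widths = frame_widths(samples)

for stack, count in sorted(folded.items()):
    print(stack, count)
```

A real flamegraph merges frames by position in the stack rather than by name alone, so `frame_widths` is only an approximation of what the rendered widths convey.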
Async-state-machine signature¶
In async-Rust / Tokio stacks, the tell-tale signature of a spurious-wakeup busy-loop is that the flamegraph is dominated by infrastructure, not business logic:
- tracing::Subscriber::enter/exit frames (span enter/exit is supposed to be very fast)
- Tokio poll frames with no meaningful work beneath them
- libc syscalls that return almost immediately without doing I/O
As Fly.io describes:
"If the mere act of entering a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing."
The inversion — infrastructure in the hot path, business logic invisible — is the fingerprint.
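One rough way to quantify this inversion is to measure what share of samples end in an infrastructure frame. The sketch below assumes folded-stack input (semicolon-joined frames mapped to sample counts). The marker list, the function name `infra_leaf_share`, and every frame name in the sample data are illustrative stand-ins, not copied from the Fly.io flamegraph.

```python
# Hypothetical substrings that mark a frame as infrastructure rather than business logic.
INFRA_MARKERS = ("tracing::", "tokio::", "libc::")

def infra_leaf_share(folded, infra_markers=INFRA_MARKERS):
    """Fraction of samples whose leaf (on-CPU) frame matches an infrastructure marker."""
    total = 0
    infra = 0
    for stack, count in folded.items():
        leaf = stack.rsplit(";", 1)[-1]  # last frame is the one actually on CPU
        total += count
        if any(marker in leaf for marker in infra_markers):
            infra += count
    return infra / total if total else 0.0

# Synthetic folded stacks, made up for illustration only.
folded = {
    "worker;tokio::runtime::task::poll;tracing::span::Span::enter": 60,
    "worker;tokio::runtime::task::poll;libc::read": 30,
    "worker;tokio::runtime::task::poll;app::handle_request": 10,
}

share = infra_leaf_share(folded)
print(f"infrastructure leaf share: {share:.0%}")
```

A high share is only a prompt to look closer: an idle but healthy service can also spend most of its few samples in runtime code.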
Using the type signature¶
Languages that monomorphise generics (Rust, templated C++) embed the fully instantiated type in each symbol name, so flamegraph frames carry the fully-qualified type of each stack frame. For async Rust that often means the whole nested-Future type shows up as a single frame. Fly.io's 2025-02 case:
&mut fp_io::copy::Duplex<&mut fp_io::reusable_reader::ReusableReader<
fp_tcp::peek::PeekableReader<
tokio_rustls::server::TlsStream<
fp_tcp_metered::MeteredIo<
fp_tcp::peek::PeekableReader<
fp_tcp::permitted::PermittedTcpStream>>>>>,
connect::conn::Conn<tokio::net::tcp::stream::TcpStream>>
Reading this top-to-bottom gives the exact wrapper chain around the bug. Since Fly's own wrappers (Duplex, ReusableReader, PeekableReader, MeteredIo, PermittedTcpStream) could be audited for recent changes and reproducibility, the suspect narrowed to one foreign layer: tokio_rustls::server::TlsStream.
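Reading a long nested type by eye is error-prone, so the chain can also be listed mechanically. The sketch below is a minimal tokenizer, not a full Rust type parser: it assumes the frame name contains only paths, `<`, `>`, `,`, and reference markers, and the helper name `type_layers` is hypothetical. The signature string is the one quoted above.

```python
import re

# A path such as tokio_rustls::server::TlsStream, or a single angle bracket.
TOKEN = re.compile(r"[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*|[<>]")
KEYWORDS = {"mut", "dyn", "impl", "const"}  # words that are not type names

def type_layers(type_name):
    """Return (nesting depth, type path) pairs in order of appearance."""
    layers = []
    depth = 0
    for tok in TOKEN.findall(type_name):
        if tok == "<":
            depth += 1
        elif tok == ">":
            depth -= 1
        elif tok not in KEYWORDS:
            layers.append((depth, tok))
    return layers

SIGNATURE = """
&mut fp_io::copy::Duplex<&mut fp_io::reusable_reader::ReusableReader<
fp_tcp::peek::PeekableReader<
tokio_rustls::server::TlsStream<
fp_tcp_metered::MeteredIo<
fp_tcp::peek::PeekableReader<
fp_tcp::permitted::PermittedTcpStream>>>>>,
connect::conn::Conn<tokio::net::tcp::stream::TcpStream>>
"""

layers = type_layers(SIGNATURE)
for depth, name in layers:
    print("  " * depth + name)
```

The indented output makes each wrapper's position in the chain explicit, which helps when deciding which layers to audit first.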
Second use: rank-ordering optimization targets¶
Beyond the "something is on fire" use case, flamegraphs are the primary instrument for picking the right thing to optimize. On a large service, thousands of functions run, but only a handful cost enough CPU to justify engineering time. The flamegraph's width ordering makes this pick deterministic: start with the widest frames.
Netflix Ranker's 2026-03 video-serendipity-scoring optimization started with exactly this step:
"When we looked at CPU profiles for this service, one feature kept standing out: video serendipity scoring — the logic that answers a simple question: 'How different is this new title from what you've been watching so far?' This single feature was consuming about 7.5% of total CPU on each node running the service."
"A flamegraph made it clear: One of the top hotspots in the service was Java dot products inside the serendipity encoder. Algorithmically, the hotspot was a nested loop structure of M candidates × N history items where each pair generates its own cosine similarity — i.e. O(M×N) separate dot product operations."
(Source: sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api)
The flamegraph answered two questions:
- Which function to optimize (serendipity encoder at 7.5% of CPU, not some other candidate).
- Which operation inside that function dominated (dot products in a nested loop, a structural hint that told Netflix to reshape the computation rather than improve the inner loop).
This is the canonical wiki instance of the flamegraph as the target-selection instrument at the start of a measurement-driven optimization loop, distinct from the Fly.io case where the flamegraph diagnosed a bug (the infrastructure-in-hot-path fingerprint of a spurious-wakeup busy-loop).
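The target-selection step amounts to ranking frames by sample count. The sketch below assumes folded-stack input and separates inclusive samples (frame anywhere on the stack, which answers "which function") from self samples (frame at the leaf, which answers "which operation"). The function name `rank_frames`, the frame names, and the counts are all synthetic, not Netflix's data.

```python
from collections import Counter

def rank_frames(folded):
    """Return (inclusive, self) sample counts per frame from folded stacks."""
    inclusive, self_samples = Counter(), Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for frame in set(frames):  # count once per stack, even under recursion
            inclusive[frame] += count
        self_samples[frames[-1]] += count  # leaf frame was on CPU
    return inclusive, self_samples

# Synthetic folded stacks, made up for illustration only.
folded = {
    "handle;rank;serendipity;dot": 75,
    "handle;rank;serendipity;normalize": 5,
    "handle;rank;popularity": 20,
    "handle;fetch": 300,
    "handle;serialize": 600,
}

inclusive, self_samples = rank_frames(folded)
total = sum(folded.values())

for frame, count in self_samples.most_common(3):
    print(f"{frame}: {count / total:.1%} self")
print(f"serendipity: {inclusive['serendipity'] / total:.1%} inclusive")
```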
Seen in¶
- sources/2025-02-26-flyio-taming-a-voracious-rust-proxy: Pavel on Fly.io's proxy team pulled a flamegraph from an angry fly-proxy; tracing::Subscriber dominance was the "something is wrong" indicator; the Future type signature pointed at tokio_rustls::server::TlsStream as the guilty layer. Textbook flamegraph-as-diagnostic.
- sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api: Flamegraph-as-target-selector. Netflix Performance Engineering used CPU profiles on Ranker to identify video-serendipity scoring at 7.5% of total node CPU and further localised the cost to Java dot products in a nested M×N loop structure, which led to the batched-matmul reshape and JDK Vector API kernel swap. Canonical wiki instance of flamegraph-driven target selection at the start of the measurement loop.
- sources/2026-04-21-vercel-making-turborepo-96-faster-with-agents-sandboxes-and-humans: Flamegraph-as-agent-consumption-format-problem. Anthony Shew's 2026-04-21 Turborepo performance retrospective canonicalises the format-for-agent-consumption axis of profile-driven optimisation: rendering the same underlying flame-graph span data as a companion Markdown version (line-per-record, grep-friendly, agent-optimised) rather than as Chrome Trace Event Format JSON (Perfetto-loadable, UI-optimised) produces "radically better optimization suggestions" from the same model + agent harness. Opens the agent-reader altitude of flamegraph consumption alongside the prior human-reader altitudes (Fly.io 2025-02 diagnostic, Netflix 2026-03 target selection). See patterns/markdown-profile-output-for-agents for the companion-emit pattern.
- sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology: hyperscale producer altitude. Meta's Strobelight is the canonical fleet-orchestrated flame-graph substrate: 42+ profilers running on every production host feed flame-graph-ready data into Scuba within seconds of capture, with frame pointers enabled fleet-wide and delayed symbolization via a central service. First canonical wiki instance of flame-graph profiling at hyperscaler fleet scale (tens-of-thousands of services, every host always-on), in contrast with the Fly.io / Cloudflare / Netflix / Vercel instances, which are per-service or per-team. Extended by Stack Schemas (query-time tagging) + Strobemeta (sample-time request-context attach) so a flame-graph can be filtered to "p99 requests on endpoint X" without post-hoc trace joins.
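The line-per-record idea from the Turborepo entry above can be sketched as a small converter. It assumes the input uses Chrome Trace Event Format "complete" events (`"ph": "X"` with `ts` and `dur` in microseconds). The Markdown layout, the function name `trace_to_markdown`, and the sample trace are assumptions made for illustration and are not the format Turborepo emits.

```python
import json

def trace_to_markdown(trace_json):
    """Render complete ('X') trace events as one grep-friendly Markdown line each."""
    data = json.loads(trace_json)
    # The format allows either a bare array or an object with a traceEvents array.
    events = data["traceEvents"] if isinstance(data, dict) else data
    spans = [e for e in events if e.get("ph") == "X"]
    spans.sort(key=lambda e: e["dur"], reverse=True)  # longest spans first
    lines = ["# Spans by duration", ""]
    for e in spans:
        lines.append(
            f"- {e['name']} dur_us={e['dur']} ts_us={e['ts']} tid={e['tid']}"
        )
    return "\n".join(lines)

# Synthetic trace, made up for illustration only.
trace = json.dumps({
    "traceEvents": [
        {"name": "resolve", "ph": "X", "ts": 0, "dur": 1200, "pid": 1, "tid": 1},
        {"name": "hash", "ph": "X", "ts": 1200, "dur": 4800, "pid": 1, "tid": 1},
        {"name": "thread_name", "ph": "M", "pid": 1, "tid": 1, "args": {}},
    ]
})

md = trace_to_markdown(trace)
print(md)
```

Putting each span on one line with `key=value` fields lets a reader, human or agent, filter by name or duration with plain text search.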
Related¶
- concepts/cpu-busy-loop-incident — the class of incident flamegraphs shine on.
- concepts/spurious-wakeup-busy-loop — the async-specific sub-pathology with a distinctive flamegraph signature.
- patterns/flamegraph-to-upstream-fix — the end-to-end pattern Fly.io executed.
- patterns/measurement-driven-micro-optimization — the optimization loop in which flamegraphs select the target function.
- systems/fly-proxy — the service the flamegraph was taken from.
- systems/netflix-ranker — the service where Netflix used flamegraphs to locate the 7.5% serendipity-scoring hot path.
- companies/flyio
- companies/netflix