

Stack-trace sampling profiling

Stack-trace sampling profiling is a production profiling technique in which a profiler periodically (e.g., at 100 Hz) snapshots the call stack of every running thread, then estimates per-function CPU utilization as:

function CPU % ≈ (samples containing the function) / (total samples)

Why it works

If a function F is on 19 out of 1,111 sampled stacks, then the process was executing somewhere inside F ~1.71 % of the time — a direct estimate of F's CPU share. The estimate converges to the true share as sample count grows.
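The estimator itself is just a fraction over sampled stacks. A minimal sketch (the stack data here is hypothetical; a real profiler such as perf or an eBPF-based sampler supplies the samples):

```rust
// Minimal sketch of the estimator: the CPU share of a function is the
// fraction of sampled stacks that contain it anywhere in the call chain.
fn cpu_share(samples: &[Vec<&str>], func: &str) -> f64 {
    let hits = samples.iter().filter(|s| s.contains(&func)).count();
    hits as f64 / samples.len() as f64
}

fn main() {
    // Four hypothetical sampled stacks; "parse_headers" appears in two.
    let samples = vec![
        vec!["main", "handle_request", "parse_headers"],
        vec!["main", "handle_request", "route"],
        vec!["main", "handle_request", "parse_headers"],
        vec!["main", "idle"],
    ];
    println!("{:.2}", cpu_share(&samples, "parse_headers")); // prints 0.50
}
```

Note that "contains the function" counts stacks where F appears anywhere, so the estimate covers time spent in F and its callees; on-CPU-only ("self") time uses the leaf frame instead.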

Why it matters at scale

At CDN scales:

  • systems/pingora-origin: 40,000 saturated CPU cores globally. 1 % of CPU = 400 cores. Helper functions contributing 1-2 % become worthwhile optimization targets.
  • Without stack-trace sampling you'd never flag a one-line header-cleanup helper as worth rewriting. With it, the target surfaces itself.

Operational properties

  • Very low overhead — sampling at ~100 Hz to a few kHz (via eBPF, perf, or a continuous profiler such as Datadog Continuous Profiler) costs a negligible fraction of CPU.
  • Production-representative — measures real traffic patterns, not synthetic benches.
  • No code changes needed — no manual span/trace instrumentation; the profiler hooks kernel timer interrupts, signals, or eBPF primitives.
  • Statistical — noisy in the short run; trust numbers only after they converge over minutes to hours.
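The "statistical" point can be made concrete. Treating "function on stack" as a Bernoulli event, the standard error of the estimated share shrinks as 1/√n. A sketch reusing the 19/1,111 example from above (the sample counts are illustrative):

```rust
// Binomial standard error of a sampled CPU share: sqrt(p(1-p)/n).
fn std_err(p: f64, n: f64) -> f64 {
    (p * (1.0 - p) / n).sqrt()
}

fn main() {
    let p = 19.0 / 1111.0; // ≈ 1.71 % share
    // At 100 Hz, 3.6 M samples is roughly ten hours of wall clock.
    for n in [1_111.0, 100_000.0, 3_600_000.0] {
        println!("n = {:>9}: ±{:.3} %", n as u64, 100.0 * std_err(p, n));
    }
}
```

At 1,111 samples the ±0.39 % noise band swamps a 1.71 % signal's fine structure; by a few million samples it is down to hundredths of a percent, which is why converged multi-hour numbers are the ones to trust.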

Closing the loop with microbenchmarks

The canonical Cloudflare pattern is flame-graph → criterion microbench → production-sampling verification:

  1. Stack-trace sampling identifies the function worth optimizing (CPU share above the threshold).
  2. Criterion microbench measures the candidate fixes in isolation at nanosecond resolution.
  3. Predicted CPU % is extrapolated linearly from microbench timings.
  4. Ship the winner; re-sample production; verify predicted vs measured match. If they do, the methodology is trustworthy for the next optimization.
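Step 3's linear extrapolation can be sketched in a few lines. All numbers below are hypothetical, not figures from the cited rollout:

```rust
// Linear extrapolation from microbench timings to production CPU share:
// if the rewrite makes the function k times faster in isolation, its
// production share is predicted to shrink by the same factor.
fn predicted_share(current_share_pct: f64, old_ns: f64, new_ns: f64) -> f64 {
    current_share_pct * (new_ns / old_ns)
}

fn main() {
    // Hypothetical: helper measured at 1.71 % of CPU via stack sampling;
    // criterion times the rewrite at 21 ns vs 85 ns for the original.
    let pred = predicted_share(1.71, 85.0, 21.0);
    println!("{:.2}", pred); // predicted ~0.42 % after the fix
}
```

The linearity assumption is what step 4 then checks: if re-sampled production CPU lands near the prediction, the microbench-to-production mapping holds for this workload.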

The 2024-09-10 trie-hard rollout matched predicted CPU (0.43 %) against measured CPU (0.34 %) within ~0.1 % — tight enough to trust criterion as the decision substrate for the next helper (sources/2024-09-10-cloudflare-a-good-day-to-trie-hard).
