CONCEPT
# Stack-trace sampling profiling
Stack-trace sampling profiling is a production profiling technique in which a profiler periodically (e.g., 100 Hz) takes a snapshot of the call stack on each running thread, then estimates per-function CPU utilization as:
function CPU % ≈ (samples containing the function) / (total samples)
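That estimate can be sketched in a few lines of Rust; the stacks and function names below are invented purely for illustration:

```rust
// Toy illustration (not a real profiler): estimate a function's CPU share
// as the fraction of sampled call stacks that contain it.
fn cpu_share(samples: &[Vec<&str>], function: &str) -> f64 {
    let hits = samples
        .iter()
        .filter(|stack| stack.contains(&function))
        .count();
    hits as f64 / samples.len() as f64
}

fn main() {
    // Three sampled stacks; the hypothetical "clean_headers" appears in two.
    let samples = vec![
        vec!["main", "handle_request", "clean_headers"],
        vec!["main", "handle_request", "parse_body"],
        vec!["main", "handle_request", "clean_headers"],
    ];
    println!("{:.1}%", 100.0 * cpu_share(&samples, "clean_headers")); // prints "66.7%"
}
```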
## Why it works
If a function F appears on 19 of 1,111 sampled stacks, the process was executing somewhere inside F on roughly 1.71 % of samples, which is a direct estimate of F's CPU share. The estimate converges to the true share as the sample count grows.
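The convergence can be demonstrated with a dependency-free Monte Carlo sketch; a hand-rolled LCG stands in for a random-number crate, and every number except the 19/1111 figure above is hypothetical:

```rust
// Toy simulation: a function whose true CPU share is 19/1111 ≈ 1.71%.
// Each sample independently lands inside it with that probability;
// the fraction of hits estimates the share, and the estimate tightens
// as the number of samples grows.
fn sampled_estimate(true_share: f64, n: u64, mut state: u64) -> f64 {
    let mut hits = 0u64;
    for _ in 0..n {
        // 64-bit LCG step; the top bits give a uniform value in [0, 1).
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (state >> 11) as f64 / (1u64 << 53) as f64;
        if u < true_share {
            hits += 1;
        }
    }
    hits as f64 / n as f64
}

fn main() {
    let true_share = 19.0 / 1111.0;
    for &n in &[1_000u64, 100_000, 10_000_000] {
        let est = sampled_estimate(true_share, n, 42);
        println!(
            "n = {:>8}: estimate {:.3}% vs true {:.3}%",
            n,
            100.0 * est,
            100.0 * true_share
        );
    }
}
```

The standard error shrinks as 1/√n, which is why the note below advises trusting numbers aggregated over minutes to hours rather than short windows.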
## Why it matters at scale
At CDN scales:
- systems/pingora-origin: 40,000 saturated CPU cores globally. 1 % of CPU = 400 cores. Helper functions contributing 1-2 % become worthwhile optimization targets.
- Without stack-trace sampling you'd never flag a one-line header-cleanup helper as worth rewriting. With it, the target surfaces itself.
## Operational properties
- Very low overhead (eBPF, Linux perf, Datadog Continuous Profiler) — a few kHz of aggregate sampling is negligible.
- Production-representative — works against real traffic patterns, not synthetic benches.
- No code changes needed — no manual span/trace instrumentation; the profiler relies on kernel timers, signals, or eBPF primitives rather than in-process hooks.
- Statistical — noisy over short windows; trust only estimates that have converged over minutes to hours.
## Closing the loop with microbenchmarks
The canonical Cloudflare pattern is flame graph → Criterion microbench → production-sampling verification:
- Stack-trace sampling identifies the function worth optimizing (CPU share above the threshold).
- Criterion microbench measures the candidate fixes in isolation at nanosecond resolution.
- Predicted CPU % is extrapolated linearly from microbench timings.
- Ship the winner; re-sample production; verify that predicted and measured CPU match. If they do, the methodology can be trusted for the next optimization.
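Step 3's linear extrapolation is plain arithmetic. A minimal Rust sketch, in which every input number is invented for illustration except the 40,000-core fleet size from above:

```rust
// Hypothetical extrapolation: predict a function's fleet-wide CPU share
// from a microbench timing. A core supplies 1e9 ns of CPU time per second,
// so share = (ns/call * calls/sec) / (cores * 1e9).
fn predicted_cpu_share(ns_per_call: f64, calls_per_sec: f64, total_cores: f64) -> f64 {
    (ns_per_call * calls_per_sec) / (total_cores * 1e9)
}

fn main() {
    // Suppose the microbench reports 120 ns/call and the helper runs
    // ~1.5 billion times/sec across a 40,000-core fleet (invented figures).
    let share = predicted_cpu_share(120.0, 1.5e9, 40_000.0);
    println!("predicted CPU share ≈ {:.4}%", 100.0 * share); // prints "predicted CPU share ≈ 0.4500%"
}
```

If the post-rollout production sample lands near the predicted share, as in the trie-hard case below, the microbench-to-production extrapolation is validated.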
The 2024-09-10 trie-hard rollout predicted 0.43 % CPU against a measured 0.34 %, agreement within ~0.1 percentage points and tight enough to trust Criterion as the decision substrate for the next helper (sources/2024-09-10-cloudflare-a-good-day-to-trie-hard).
## Related
- patterns/measurement-driven-micro-optimization — the discipline this technique enables.
- concepts/hot-path — the target code surface.
- concepts/observability — umbrella domain.
- systems/criterion-rust — the complementary per-function microbench tool.