PATTERN
Default continuous profiling¶
Default continuous profiling is the operational posture of running a low-overhead continuous profiler on every host, all the time, tuned so the overhead is imperceptible in steady state but the profile data is already there the moment an incident starts.
It is the "flight recorder" posture for production software — borrowed from aviation, where the voice-and-data recorder records continuously precisely so the data is available after the incident without anyone having had to decide to turn it on.
Why it matters¶
Profilers have historically been opt-in: attach one, reproduce the issue, read the flame graph. Two problems with that model at scale:
- Not every regression reproduces. Heisenbugs, race conditions, rare bad request paths — the moment you attach a profiler, the bug doesn't happen.
- Incident time is spent setting up the profiler instead of fixing the problem.
Default continuous profiling inverts the posture: the data is always there, and diff profile regression analysis turns incident investigation into a before/after comparison rather than a repro exercise.
Per Grafana Labs' Pyroscope 2.0 launch post:
"With continuous profiling, that last mile [of root cause analysis] shrinks to minutes. You can compare a profile from before and after the regression, diff them, and see exactly which code paths changed. No reproducing in staging, no adding ad-hoc logging, and no guessing."
(Source: sources/2026-04-22-grafana-introducing-pyroscope-2-0)
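To make the before/after comparison concrete, here is a minimal sketch in Go, assuming profiles have been reduced to flat per-stack sample counts. The stack strings and numbers are hypothetical; production tools (e.g. `go tool pprof -diff_base`, Pyroscope's comparison view) diff full profile trees.

```go
// Toy diff-profile regression analysis: rank code paths by how much their
// sample count grew between the "before" and "after" profiles.
package main

import (
	"fmt"
	"sort"
)

type delta struct {
	stack   string
	samples int // after minus before
}

func diffProfiles(before, after map[string]int) []delta {
	seen := map[string]bool{}
	for k := range before {
		seen[k] = true
	}
	for k := range after {
		seen[k] = true
	}
	var out []delta
	for k := range seen {
		out = append(out, delta{stack: k, samples: after[k] - before[k]})
	}
	// Largest increases first: these are the code paths that changed.
	sort.Slice(out, func(i, j int) bool { return out[i].samples > out[j].samples })
	return out
}

func main() {
	// Hypothetical flat profiles captured before and after the regression.
	before := map[string]int{"main;handler;encode": 120, "main;handler;db.Query": 300}
	after := map[string]int{"main;handler;encode": 130, "main;handler;db.Query": 900}
	for _, d := range diffProfiles(before, after) {
		fmt.Printf("%+d  %s\n", d.samples, d.stack)
	}
}
```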
Prior art: Meta Strobelight¶
Meta's Strobelight is the canonical hyperscaler-scale instance. Every Meta host is profiled continuously via a curated set of sampling profilers, with overhead tuned to not perturb workloads. When an SEV opens, the profile data is already there; responders read it immediately.
What makes it viable now¶
Default continuous profiling only works if overhead is actually low — otherwise it's too expensive to keep on. Three enablers:
- Sampling profilers over instrumented ones. Sampling imposes near-constant, tunable overhead (often < 1% CPU at production sample rates) rather than scaling with call volume the way instrumented profilers do (a minimal collection-loop sketch follows this list).
- eBPF-based collection. Kernel-side sampling without process-side overhead, minimal attribution cost.
- Architecturally cheap storage backends. Pyroscope 2.0's rearchitecture is explicitly motivated by making "always on for every host" affordable.
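As a rough illustration of the first enabler, the sketch below runs Go's standard-library sampling CPU profiler in an always-on, fixed-window collection loop. The window length and file naming are assumptions; a fleet deployment would typically use an out-of-process eBPF collector and ship each window to a backend rather than writing local files.

```go
// Minimal always-on collection loop: sample continuously in fixed windows,
// so profile data already exists when an incident starts. Overhead is set by
// the profiler's sample rate, not by workload shape.
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	for window := 0; ; window++ {
		// One collection window (10s is an assumed value).
		f, err := os.Create(fmt.Sprintf("cpu-%d.pprof", window))
		if err != nil {
			log.Fatal(err)
		}
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal(err)
		}
		time.Sleep(10 * time.Second)
		pprof.StopCPUProfile()
		f.Close()
		// A real agent would upload or rotate the window here instead of
		// letting files accumulate on local disk.
	}
}
```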
When not to¶
- Workloads where sampling overhead is unacceptable (ultra-low-latency trading, embedded, realtime systems). Use opt-in profiling for these; the overhead budget isn't there.
- Regulatory environments restricting fine-grained telemetry. Symbolic information can carry sensitive function names / module paths; some compliance regimes disallow continuous capture.
- When you can't afford the storage. Retention at scale is a real cost; size the retention window before you enable it fleet-wide (a back-of-envelope sketch follows this list).
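A back-of-envelope retention estimate, where every figure is a placeholder assumption to be replaced with your own fleet's numbers:

```go
// Rough storage estimate for fleet-wide continuous profiling.
package main

import "fmt"

func main() {
	const (
		hosts          = 5000      // fleet size (assumed)
		bytesPerMinute = 50 * 1024 // compressed profile bytes per host-minute (assumed)
		retentionDays  = 30        // retention window (assumed)
	)
	total := float64(hosts) * bytesPerMinute * 60 * 24 * retentionDays
	fmt.Printf("~%.1f TiB for %d hosts over %d days\n",
		total/(1<<40), hosts, retentionDays)
}
```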
Related¶
- systems/pyroscope-2 — OSS continuous-profiling DB designed to make this posture affordable.
- systems/strobelight — hyperscaler-scale instance at Meta.
- concepts/continuous-profiling
- concepts/diff-profile-regression-analysis
- concepts/instrumented-vs-sampling-profile