PATTERN
Profiler orchestrator¶
Profiler orchestrator = a platform shape in which production-host profiling is centralised behind a single scheduler, coordinator, safety enforcer, and symbolization frontend that sits above many individual profilers, each specialised to a signal (CPU cycles, memory, off-CPU time, request latency, GPU utilisation, language-specific events, …).
The orchestrator does the operationally hard parts so profilers can be simple. It owns:
- Scheduling + queuing — which profiler runs where, when, how often.
- Three execution modes — on-demand (user-invoked), continuous (always-on, flight-recorder posture; see patterns/default-continuous-profiling), triggered (condition-based).
- Safety rules — dynamic sampling rate tuning; PMU-counter coordination; retention-budget protection; concurrency rules so profilers don't collide.
- Config substrate — typically config-as-code, so service owners can enable continuous or triggered profiling of their service by committing a config change.
- Output routing — all profilers land in the same warm store + UI (Scuba at Meta), accompanied by tags and runtime metadata.
- Symbolization — see patterns/delayed-symbolization-service.
- User-ad-hoc escape hatch — see patterns/ad-hoc-bpftrace-profiler.
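The scheduling, mode, and PMU-coordination responsibilities above can be sketched as a toy priority-queue scheduler. This is a hypothetical illustration of the shape, not Strobelight's actual design: all names (`Orchestrator`, `Job`, `Mode`, the 60-second retry delay) are invented, and a real orchestrator would also enforce retention budgets, concurrency classes, and sampling-rate rules.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
import heapq

class Mode(Enum):
    ON_DEMAND = auto()   # user-invoked
    CONTINUOUS = auto()  # always-on, flight-recorder posture
    TRIGGERED = auto()   # fires when a condition holds

@dataclass(order=True)
class Job:
    not_before: float                     # earliest start time; the priority key
    profiler: str = field(compare=False)
    host: str = field(compare=False)
    mode: Mode = field(compare=False)
    uses_pmu: bool = field(default=True, compare=False)

class Orchestrator:
    """Minimal sketch: one queue, one safety gate for a shared hardware resource."""

    def __init__(self):
        self.queue: list[Job] = []
        self.pmu_holder: dict[str, str] = {}  # host -> profiler holding the PMU counters

    def submit(self, job: Job) -> None:
        heapq.heappush(self.queue, job)

    def run_ready(self, now: float) -> list[Job]:
        started, deferred = [], []
        while self.queue and self.queue[0].not_before <= now:
            job = heapq.heappop(self.queue)
            # Safety rule: only one PMU-counter user per host at a time.
            if job.uses_pmu and job.host in self.pmu_holder:
                job.not_before = now + 60  # defer and retry later
                deferred.append(job)
                continue
            if job.uses_pmu:
                self.pmu_holder[job.host] = job.profiler
            started.append(job)
        for j in deferred:
            heapq.heappush(self.queue, j)
        return started

    def finish(self, job: Job) -> None:
        # Release the PMU so the next queued profiler can run on this host.
        if self.pmu_holder.get(job.host) == job.profiler:
            del self.pmu_holder[job.host]
```

Usage: submitting two PMU-using profilers for the same host starts the first and re-queues the second with a later `not_before`; once the first finishes, the second becomes runnable.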
Why orchestrate at all?¶
Three forces:
- Safety is centralised. A single profiler that mis-configures a PMU counter, floods the sample DB with writes, or hammers a host is bad; N independent profilers without coordination multiply the blast radius. Centralisation means one place owns the safety story for all profilers.
- Economics compound. Once the orchestrator exists, adding profiler N+1 is cheap (config + onboarding); every profiler benefits from the same symbolization service, output pipeline, UI, and concurrency rules. These are the platform unit economics that turn "profiling" into a horizontal capability.
- Cross-profiler composition becomes feasible. Strobelight's Crochet profiler combines request spans + CPU cycles + off-CPU data on one timeline — trivial when the orchestrator owns scheduling, expensive otherwise.
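The composition idea can be made concrete with a toy timeline merge: attach every sample, whatever profiler produced it, to the request span whose time window covers it. The record shapes below are invented for illustration; real Crochet data is far richer.

```python
# Hypothetical record shapes: samples are (timestamp, signal, stack),
# spans are (start, end, name). Timestamps share one clock because the
# orchestrator owns scheduling for all profilers.
cpu_samples = [(1.2, "cpu", "foo;bar"), (3.4, "cpu", "foo;baz")]
offcpu_samples = [(2.0, "offcpu", "foo;lock_wait")]
spans = [(1.0, 4.0, "GET /profile")]

def compose(spans, *sample_streams):
    """Attach each sample to the span whose window covers its timestamp."""
    events = sorted(s for stream in sample_streams for s in stream)
    timeline = []
    for start, end, name in spans:
        hits = [e for e in events if start <= e[0] <= end]
        timeline.append((name, hits))
    return timeline
```

The point of the sketch: because a single scheduler ran all three profilers, their timestamps are directly comparable, and composition is a sort plus a window join rather than a cross-system correlation problem.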
Canonical instance: Meta Strobelight¶
"Strobelight, Meta's profiling orchestrator, is not really one technology. It's several (many open source) combined to make something that unlocks truly amazing efficiency wins. Strobelight is also not a single profiler but an orchestrator of many different profilers (even ad-hoc ones) that runs on all production hosts at Meta, collecting detailed information about CPU usage, memory allocations, and other performance metrics from running processes."
— Meta Engineering, 2025-01-21 (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)
Strobelight orchestrates 42+ profilers as of the 2025-01-21 post — memory (jemalloc-backed), function-call counts, event-based (C++, Python, Java, Erlang), AI/GPU, off-CPU, request-latency. All three modes. Configerator-driven config. Output to Scuba + Tracery. Symbolization via a dedicated central service. Ad-hoc profilers via bpftrace. Default continuous profiling for every Meta service. Feeds the FDO pipeline.
Safety mechanisms (Strobelight canonical list)¶
- Dynamic sampling rate tuning — per-service desired-count targets with daily re-tune; sample weights for valid cross-host/service aggregation.
- PMU counter coordination — only one CPU-cycles profiler at a time per host (hardware performance counters are a shared resource).
- Concurrency rules — profiler queue so two profilers that touch the same subsystem don't run simultaneously.
- DB-write rate controls — protect the retention budget of the downstream store (Scuba) from profilers that accidentally write at too high a rate.
- Operator escape hatch — service owners can still deliberately hammer a machine with heavy profiling for debugging when they know what they're doing.
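The first mechanism, dynamic sampling rate tuning, can be sketched as a proportional controller plus a weight function. This is a minimal illustration under assumed semantics (desired per-service sample counts, periodic retune, weight = inverse of the sampling rate so sums stay comparable across services); the function names, clamps, and the doubling probe for silent services are all hypothetical.

```python
def retune(rate_hz: float, collected: int, desired: int,
           min_hz: float = 1.0, max_hz: float = 999.0) -> float:
    """Periodic retune: scale the sampling rate toward a desired sample count.

    If last period collected twice the desired samples, halve the rate;
    if half, double it. Clamp to a safe range.
    """
    if collected == 0:
        # No samples seen: probe upward gradually (a hypothetical choice).
        return min(max_hz, rate_hz * 2.0)
    return max(min_hz, min(max_hz, rate_hz * desired / collected))

def sample_weight(rate_hz: float) -> float:
    """Weight per sample so aggregates are comparable across hosts/services
    sampled at different rates: a sample taken at 50 Hz represents more
    wall-clock time than one taken at 500 Hz."""
    return 1.0 / rate_hz
```

For example, a service sampled at 100 Hz that produced 200,000 samples against a 100,000 target would be retuned to 50 Hz, and each of its subsequent samples would carry weight 1/50 in cross-service aggregations.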
Load-bearing economic output¶
The orchestrator pays for itself at fleet scale via the FDO pipeline: LBR profiler → FDO profiles → CSSPGO (compile-time) + BOLT (post-compile) → up to 20% CPU-cycles reduction on Meta's top 200 services.
Seen in¶
- sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology — canonical Meta disclosure; Strobelight is the archetypal instance of this pattern. 42+ orchestrated profilers; 10-20% server reduction outcome on top-200 services; 15,000 servers/year saved by a single ampersand fix enabled by the pipeline.
Related¶
- systems/strobelight — canonical production instance.
- systems/meta-configerator — config substrate.
- systems/ebpf, systems/bpftrace — the kernel-level substrates most orchestrated profilers ride.
- concepts/continuous-profiling — the signal class.
- concepts/ebpf-profiling — the usual profiler implementation substrate.
- concepts/dynamic-sampling-rate-tuning — a core safety + aggregation mechanism.
- concepts/ad-hoc-profiler — the escape-hatch concept.
- patterns/default-continuous-profiling — the operational posture.
- patterns/delayed-symbolization-service — the companion backend.
- patterns/ad-hoc-bpftrace-profiler — the companion engineer-velocity pattern.
- patterns/feedback-directed-optimization-fleet-pipeline — the economic engine.
- companies/meta