PATTERN Cited by 1 source

Profiler orchestrator

Profiler orchestrator = a platform shape where production-host profiling is centralised behind a single scheduler + coordinator + safety-enforcer + symbolization-frontend that sits above many individual profilers, each specialised to a signal (CPU cycles, memory, off-CPU, request latency, GPU utilisation, language-specific events, …).

The orchestrator does the operationally hard parts so profilers can be simple. It owns:

  • Scheduling and concurrency rules (which profilers run, where, and when).
  • Safety enforcement — dynamic sampling rates, PMU counter coordination, DB-write rate limits.
  • The central symbolization service.
  • Configuration distribution and the shared output pipeline / UI.
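
A minimal sketch of that division of labour, in Python; the class and field names here are hypothetical illustrations of the shape, not Strobelight's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical types -- illustrative of the pattern, not Strobelight's real interfaces.

@dataclass
class Sample:
    timestamp_ns: int
    stack: list[int]       # raw addresses; symbolized by the shared service, not the profiler
    weight: float = 1.0    # set from the orchestrator's sampling-rate logic

@dataclass
class ProfilerSpec:
    name: str
    subsystem: str                                   # e.g. "pmu", "sched", "memory"
    collect: Callable[[float], Iterable[Sample]]     # signal-specific collection only

class Orchestrator:
    """Owns scheduling, symbolization, and the output pipeline so profilers stay simple."""

    def __init__(self, symbolize: Callable[[list[int]], list[str]],
                 sink: Callable[[str, int, list[str], float], None]):
        self.symbolize = symbolize      # one shared symbolization service
        self.sink = sink                # one shared output pipeline (sample DB / UI feed)
        self.profilers: list[ProfilerSpec] = []

    def register(self, spec: ProfilerSpec) -> None:
        # Profiler N+1 is a registration + config change, not new infrastructure.
        self.profilers.append(spec)

    def run_once(self, duration_s: float = 1.0) -> None:
        # A real scheduler would also apply the safety and concurrency rules listed below.
        for spec in self.profilers:
            for sample in spec.collect(duration_s):
                frames = self.symbolize(sample.stack)
                self.sink(spec.name, sample.timestamp_ns, frames, sample.weight)
```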

Why orchestrate at all?

Three forces:

  1. Safety is centralised. A single profiler that mis-configures a PMU counter, floods the sample DB with writes, or hammers a host is bad enough. N independent profilers without coordination multiply the blast radius. Centralisation means one place owns the safety story for all profilers.
  2. Economics compound. Once the orchestrator exists, adding profiler N+1 is cheap (config + onboarding); every profiler benefits from the same symbolization service, output pipeline, UI, and concurrency rules. These are the platform unit economics that turn "profiling" into a horizontal capability.
  3. Cross-profiler composition becomes feasible. Strobelight's Crochet profiler combines request spans + CPU cycles + off-CPU data on one timeline — trivial when the orchestrator owns scheduling, expensive otherwise.
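
A rough illustration of the composition point (the event shape is assumed, not Strobelight's schema): once every profiler emits into a shared format, putting request spans, CPU cycles, and off-CPU waits on one timeline is just a time-ordered merge.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# (timestamp_ns, source, payload) -- an assumed common event shape for this sketch.
Event = Tuple[int, str, dict]

def merged_timeline(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-profiler streams (each already time-ordered) into one timeline."""
    return heapq.merge(*streams, key=lambda event: event[0])

# Request spans, CPU-cycle samples, and off-CPU waits interleaved on one axis.
spans  = [(10, "request", {"span": "GET /feed start"}), (90, "request", {"span": "GET /feed end"})]
cycles = [(12, "cpu",     {"stack": ["handle_request", "parse_params"]})]
offcpu = [(40, "offcpu",  {"waited_on": "futex", "duration_ms": 25})]

for ts, source, payload in merged_timeline(spans, cycles, offcpu):
    print(ts, source, payload)
```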

Canonical instance: Meta Strobelight

"Strobelight, Meta's profiling orchestrator, is not really one technology. It's several (many open source) combined to make something that unlocks truly amazing efficiency wins. Strobelight is also not a single profiler but an orchestrator of many different profilers (even ad-hoc ones) that runs on all production hosts at Meta, collecting detailed information about CPU usage, memory allocations, and other performance metrics from running processes."

— Meta Engineering, 2025-01-21 (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)

Strobelight orchestrates 42+ profilers as of the 2025-01-21 post — memory (jemalloc-backed), function-call counts, event-based (C++, Python, Java, Erlang), AI/GPU, off-CPU, request-latency. All three modes. Configerator-driven config. Output to Scuba + Tracery. Symbolization via a dedicated central service. Ad-hoc profilers via bpftrace. Default continuous profiling for every Meta service. Feeds the FDO pipeline.
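
As a hedged sketch of what "Configerator-driven config" can look like in this shape, here is one hypothetical profiler entry expressed as plain Python data; the field names are illustrative, not Meta's actual schema:

```python
# Hypothetical per-profiler config entry; field names are illustrative only.
CPU_CYCLES_PROFILER = {
    "name": "cpu_cycles",
    "mode": "continuous",                      # default continuous profiling for every service
    "event": "cycles",                         # PMU event; at most one cycles profiler per host
    "desired_samples_per_day": 2_000_000,      # target consumed by the daily sampling re-tune
    "targets": {"service_pattern": "*"},       # which services/hosts this profiler covers
    "output": {"store": "scuba", "table": "cpu_profiles"},
    "symbolization": "central_service",
}
```

The platform economics above show up here: onboarding another profiler is mostly another entry like this, not new scheduling, safety, or output code.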

Safety mechanisms (Strobelight canonical list)

  • Dynamic sampling rate tuning — per-service desired-count targets with daily re-tune; sample weights for valid cross-host/service aggregation.
  • PMU counter coordination — only one CPU-cycles profiler at a time per host (hardware performance counters are a shared resource).
  • Concurrency rules — profiler queue so two profilers that touch the same subsystem don't run simultaneously.
  • DB-write rate controls — protect retention in the downstream store (Scuba) from profilers that accidentally write at too high a rate.
  • Operator escape hatch — service owners can still deliberately hammer a machine with heavy profiling for debugging when they know what they're doing.
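
A compact sketch combining several of these mechanisms (admission checks plus sample weighting); the structure and thresholds are assumptions for illustration, not Strobelight's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class HostSafetyState:
    """Per-host bookkeeping the orchestrator consults before and during profiler runs."""
    cycles_profiler_running: bool = False              # PMU counters are a shared resource
    running_subsystems: set[str] = field(default_factory=set)
    db_writes_this_minute: int = 0
    db_write_budget_per_minute: int = 10_000           # assumed cap protecting the sample DB

def can_start(state: HostSafetyState, subsystem: str, uses_cycles_counter: bool) -> bool:
    """PMU coordination + concurrency rules: refuse runs that would collide."""
    if uses_cycles_counter and state.cycles_profiler_running:
        return False        # only one CPU-cycles profiler at a time on this host
    if subsystem in state.running_subsystems:
        return False        # two profilers on the same subsystem don't run simultaneously
    return True

def admit_write(state: HostSafetyState, n_rows: int) -> bool:
    """DB-write rate control: defer or drop output beyond the per-minute budget."""
    if state.db_writes_this_minute + n_rows > state.db_write_budget_per_minute:
        return False
    state.db_writes_this_minute += n_rows
    return True

def sample_weight(events_per_sample: int) -> float:
    """Each retained sample stands in for `events_per_sample` events, so aggregation
    across hosts and services with different tuned rates sums weights, not raw counts."""
    return float(events_per_sample)
```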

Load-bearing economic output

The orchestrator pays for itself at fleet scale via the FDO pipeline: LBR profiler → FDO profiles → CSSPGO (compile-time) + BOLT (post-compile) → up to 20% CPU-cycles reduction on Meta's top 200 services.
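
A hedged sketch of the downstream compile steps such profiles feed; the flags, profile formats, and conversion steps vary by toolchain, and this shows the generic sample-PGO + BOLT flow rather than Meta's actual build pipeline:

```python
import subprocess

def build_with_fdo(sources: list[str], sample_profile: str, bolt_profile: str) -> None:
    """Illustrative only: sample-based PGO at compile time, then BOLT after the link."""
    # Compile-time: clang consumes an LBR-derived sample profile (AutoFDO/CSSPGO style).
    subprocess.run(
        ["clang++", "-O2", f"-fprofile-sample-use={sample_profile}", "-o", "app", *sources],
        check=True,
    )
    # Post-compile: BOLT re-lays out the linked binary using branch-sample data.
    subprocess.run(
        ["llvm-bolt", "app", "-o", "app.bolt", f"-data={bolt_profile}"],
        check=True,
    )
```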

Seen in
