PATTERN Cited by 1 source

Profiler orchestrator

Profiler orchestrator = a platform shape where production-host profiling is centralised behind a single scheduler + coordinator + safety-enforcer + symbolization-frontend that sits above many individual profilers, each specialised to a signal (CPU cycles, memory, off-CPU, request latency, GPU utilisation, language-specific events, …).

The orchestrator does the operationally hard parts so profilers can be simple. It owns:

  • Scheduling and concurrency rules (which profilers run, where, and when).
  • Safety enforcement — dynamic sampling rates, PMU counter coordination, DB-write rate limits.
  • The central symbolization service.
  • Configuration distribution and the shared output pipeline / UI.
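
A minimal sketch of that division of labour, in Python; the class and field names here are hypothetical illustrations of the shape, not Strobelight's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical types -- illustrative of the pattern, not Strobelight's real interfaces.

@dataclass
class Sample:
    timestamp_ns: int
    stack: list[int]       # raw addresses; symbolized by the shared service, not the profiler
    weight: float = 1.0    # set from the orchestrator's sampling-rate logic

@dataclass
class ProfilerSpec:
    name: str
    subsystem: str                                   # e.g. "pmu", "sched", "memory"
    collect: Callable[[float], Iterable[Sample]]     # signal-specific collection only

class Orchestrator:
    """Owns scheduling, symbolization, and the output pipeline so profilers stay simple."""

    def __init__(self, symbolize: Callable[[list[int]], list[str]],
                 sink: Callable[[str, int, list[str], float], None]):
        self.symbolize = symbolize      # one shared symbolization service
        self.sink = sink                # one shared output pipeline (sample DB / UI feed)
        self.profilers: list[ProfilerSpec] = []

    def register(self, spec: ProfilerSpec) -> None:
        # Profiler N+1 is a registration + config change, not new infrastructure.
        self.profilers.append(spec)

    def run_once(self, duration_s: float = 1.0) -> None:
        # A real scheduler would also apply the safety and concurrency rules listed below.
        for spec in self.profilers:
            for sample in spec.collect(duration_s):
                frames = self.symbolize(sample.stack)
                self.sink(spec.name, sample.timestamp_ns, frames, sample.weight)
```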

Why orchestrate at all?

Three forces:

  1. Safety is centralised. A single profiler that mis-configures a PMU counter, floods the sample DB with writes, or hammers a host is bad enough. N independent profilers without coordination multiply the blast radius. Centralisation means one place owns the safety story for all profilers.
  2. Economics compound. Once the orchestrator exists, adding profiler N+1 is cheap (config + onboarding); every profiler benefits from the same symbolization service, output pipeline, UI, and concurrency rules. These are the platform unit economics that turn "profiling" into a horizontal capability.
  3. Cross-profiler composition becomes feasible. Strobelight's Crochet profiler combines request spans + CPU cycles + off-CPU data on one timeline — trivial when the orchestrator owns scheduling, expensive otherwise.
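
A rough illustration of the composition point (the event shape is assumed, not Strobelight's schema): once every profiler emits into a shared format, putting request spans, CPU cycles, and off-CPU waits on one timeline is just a time-ordered merge.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# (timestamp_ns, source, payload) -- an assumed common event shape for this sketch.
Event = Tuple[int, str, dict]

def merged_timeline(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-profiler streams (each already time-ordered) into one timeline."""
    return heapq.merge(*streams, key=lambda event: event[0])

# Request spans, CPU-cycle samples, and off-CPU waits interleaved on one axis.
spans  = [(10, "request", {"span": "GET /feed start"}), (90, "request", {"span": "GET /feed end"})]
cycles = [(12, "cpu",     {"stack": ["handle_request", "parse_params"]})]
offcpu = [(40, "offcpu",  {"waited_on": "futex", "duration_ms": 25})]

for ts, source, payload in merged_timeline(spans, cycles, offcpu):
    print(ts, source, payload)
```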

Canonical instance: Meta Strobelight

"Strobelight, Meta's profiling orchestrator, is not really one technology. It's several (many open source) combined to make something that unlocks truly amazing efficiency wins. Strobelight is also not a single profiler but an orchestrator of many different profilers (even ad-hoc ones) that runs on all production hosts at Meta, collecting detailed information about CPU usage, memory allocations, and other performance metrics from running processes."

— Meta Engineering, 2025-01-21 (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)

Strobelight orchestrates 42+ profilers as of the 2025-01-21 post — memory (jemalloc-backed), function-call counts, event-based (C++, Python, Java, Erlang), AI/GPU, off-CPU, request-latency. All three modes. Configerator-driven config. Output to Scuba + Tracery. Symbolization via a dedicated central service. Ad-hoc profilers via bpftrace. Default continuous profiling for every Meta service. Feeds the FDO pipeline.
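
As a hedged sketch of what "Configerator-driven config" can look like in this shape, here is one hypothetical profiler entry expressed as plain Python data; the field names are illustrative, not Meta's actual schema:

```python
# Hypothetical per-profiler config entry; field names are illustrative only.
CPU_CYCLES_PROFILER = {
    "name": "cpu_cycles",
    "mode": "continuous",                      # default continuous profiling for every service
    "event": "cycles",                         # PMU event; at most one cycles profiler per host
    "desired_samples_per_day": 2_000_000,      # target consumed by the daily sampling re-tune
    "targets": {"service_pattern": "*"},       # which services/hosts this profiler covers
    "output": {"store": "scuba", "table": "cpu_profiles"},
    "symbolization": "central_service",
}
```

The platform economics above show up here: onboarding another profiler is mostly another entry like this, not new scheduling, safety, or output code.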

Safety mechanisms (Strobelight canonical list)

  • Dynamic sampling rate tuning — per-service desired-count targets with daily re-tune; sample weights for valid cross-host/service aggregation.
  • PMU counter coordination — only one CPU-cycles profiler at a time per host (hardware performance counters are a shared resource).
  • Concurrency rules — profiler queue so two profilers that touch the same subsystem don't run simultaneously.
  • DB-write rate controls — protect retention in the downstream store (Scuba) from profilers that accidentally write at too high a rate.
  • Operator escape hatch — service owners can still deliberately hammer a machine with heavy profiling for debugging when they know what they're doing.
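
A compact sketch combining several of these mechanisms (admission checks plus sample weighting); the structure and thresholds are assumptions for illustration, not Strobelight's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class HostSafetyState:
    """Per-host bookkeeping the orchestrator consults before and during profiler runs."""
    cycles_profiler_running: bool = False              # PMU counters are a shared resource
    running_subsystems: set[str] = field(default_factory=set)
    db_writes_this_minute: int = 0
    db_write_budget_per_minute: int = 10_000           # assumed cap protecting the sample DB

def can_start(state: HostSafetyState, subsystem: str, uses_cycles_counter: bool) -> bool:
    """PMU coordination + concurrency rules: refuse runs that would collide."""
    if uses_cycles_counter and state.cycles_profiler_running:
        return False        # only one CPU-cycles profiler at a time on this host
    if subsystem in state.running_subsystems:
        return False        # two profilers on the same subsystem don't run simultaneously
    return True

def admit_write(state: HostSafetyState, n_rows: int) -> bool:
    """DB-write rate control: defer or drop output beyond the per-minute budget."""
    if state.db_writes_this_minute + n_rows > state.db_write_budget_per_minute:
        return False
    state.db_writes_this_minute += n_rows
    return True

def sample_weight(events_per_sample: int) -> float:
    """Each retained sample stands in for `events_per_sample` events, so aggregation
    across hosts and services with different tuned rates sums weights, not raw counts."""
    return float(events_per_sample)
```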

Load-bearing economic output

The orchestrator pays for itself at fleet scale via the FDO pipeline: LBR profiler → FDO profiles → CSSPGO (compile-time) + BOLT (post-compile) → up to 20% CPU-cycles reduction on Meta's top 200 services.
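
A hedged sketch of the downstream compile steps such profiles feed; the flags, profile formats, and conversion steps vary by toolchain, and this shows the generic sample-PGO + BOLT flow rather than Meta's actual build pipeline:

```python
import subprocess

def build_with_fdo(sources: list[str], sample_profile: str, bolt_profile: str) -> None:
    """Illustrative only: sample-based PGO at compile time, then BOLT after the link."""
    # Compile-time: clang consumes an LBR-derived sample profile (AutoFDO/CSSPGO style).
    subprocess.run(
        ["clang++", "-O2", f"-fprofile-sample-use={sample_profile}", "-o", "app", *sources],
        check=True,
    )
    # Post-compile: BOLT re-lays out the linked binary using branch-sample data.
    subprocess.run(
        ["llvm-bolt", "app", "-o", "app.bolt", f"-data={bolt_profile}"],
        check=True,
    )
```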

Seen in
