Strobelight (Meta)

Strobelight is Meta's fleet-wide profiling orchestrator — not a single profiler but a scheduler + coordinator + symbolization frontend over 42 different profilers (at the time of the 2025-01-21 Meta Engineering post), many of them built on eBPF. It runs on every production host at Meta, provides a CLI + web UI for on-demand profiling, and accepts continuous / triggered profile configurations via Configerator. Partially open-sourced at github.com/facebookincubator/strobelight.

Canonical system shape

  • Orchestrator, not a profiler. Strobelight connects resource usage to source code; it schedules and coordinates profilers rather than being a profiler itself. Canonical wiki instance of the profiler-orchestrator pattern.
  • 42 profilers (and growing), covering:
    • Memory profilers powered by systems/jemalloc.
    • Function call-count profilers.
    • Event-based profilers (both native and non-native: Python, Java, Erlang).
    • AI / GPU profilers.
    • Off-CPU-time profilers.
    • Service request-latency profilers.
  • Three execution modes:
    • On-demand — engineers invoke via CLI or web UI; data visible in Scuba within seconds.
    • Continuous — default curated profilers run automatically on every host at tuned intervals/rates.
    • Triggered — profilers kick in on defined conditions.
  • Ad-hoc profilers via bpftrace scripts — engineers can ship a new profiler in hours rather than weeks, by committing a bpftrace script and telling Strobelight to run it like any other profiler. Canonical patterns/ad-hoc-bpftrace-profiler instance.
  • Dynamic sampling rate tuning — config specifies desired samples/hour per service (example: 40,000); Strobelight tunes per-service run probability daily to hit the target. Each sample's weight is recorded so aggregation across hosts + across services is mathematically valid.
  • Default continuous profiling — flight-recorder posture: always-on curated profilers so data is already there when an incident or efficiency question opens.
  • Safety + concurrency rules:
    • PMU counter coordination — only one CPU-cycles profiler at a time per host.
    • Profiler queue to serialise work.
    • DB-write rate controls protect the retention budget of downstream stores.
    • Operators can override these limits to deliberately hammer a machine during heavy debugging.
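The dynamic sampling-rate tuning described above can be sketched as follows. The function and variable names are assumptions; only the samples/hour target (e.g. 40,000) and the weight-per-sample bookkeeping come from the notes.

```python
# Hypothetical sketch of Strobelight-style dynamic sampling tuning.
# A config specifies a target number of samples/hour for a service; the
# orchestrator tunes a per-service run probability, and every recorded
# sample carries weight = 1/p so that sums remain unbiased when
# aggregated across hosts and across services.

def run_probability(target_samples_per_hour: float,
                    samples_per_hour_at_full_rate: float) -> float:
    """Run probability that hits the target sample count in expectation."""
    if samples_per_hour_at_full_rate <= 0:
        return 1.0
    return min(1.0, target_samples_per_hour / samples_per_hour_at_full_rate)

def sample_weight(p: float) -> float:
    """Horvitz-Thompson weight: a sample kept with probability p counts as 1/p."""
    return 1.0 / p

# Example: the config asks for 40,000 samples/hour, but profiling every run
# would yield 400,000/hour, so run with p = 0.1 and weigh each sample by 10.
p = run_probability(40_000, 400_000)
w = sample_weight(p)  # each recorded sample stands in for 10 events
```

Recording the weight alongside each sample is what makes cross-host and cross-service aggregation mathematically valid even though different services run at different probabilities.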

Load-bearing outputs

Default profilers worth calling out

  • LBR profiler — samples Intel Last Branch Records. Data is not visualised directly; it feeds Meta's FDO pipeline. FDO profiles drive compile-time (CSSPGO) and post-compile-time (BOLT) binary optimisations. Meta's top 200 largest services all have continuous-LBR-fed FDO profiles. Some see "up to 20% reduction in CPU cycles", translating into 10-20% fewer servers needed to run those services.
  • Event profiler — Strobelight's version of the Linux perf tool. Collects user + kernel stack traces on multiple perf events (CPU cycles, L3 misses, instructions, …). Output drives both interactive flame-graph review and automated regression-detection (pre-prod).
  • Crochet profiler — combines request spans + CPU-cycles stacks + off-CPU data on a single timeline; consumed in the Tracery UI.
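The step from the event profiler's stack samples to a flame graph is conventionally a "folded stacks" aggregation (Brendan Gregg's format, which flame-graph tooling consumes). The sketch below is an assumption about the shape of that step, not Strobelight's actual pipeline: identical symbolized stacks collapse into one line and their sample weights are summed.

```python
from collections import defaultdict

# Hypothetical sketch: collapse symbolized, weighted samples into
# folded-stack entries ("frame;frame;frame -> total weight"), the input
# form that standard flame-graph tooling expects.

def fold(samples):
    """samples: iterable of (frames leaf-last, sample weight)."""
    totals = defaultdict(float)
    for frames, weight in samples:
        totals[";".join(frames)] += weight
    return dict(totals)

samples = [
    (["main", "serve", "encode"], 10.0),
    (["main", "serve", "encode"], 10.0),
    (["main", "serve", "compress"], 10.0),
]
folded = fold(samples)
# folded["main;serve;encode"] == 20.0
```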

Stack enrichment

  • Stack Schemas — DSL (inspired by Microsoft's stack tags) that adds tags to whole stacks or individual frames and regex-strips frames the viewer doesn't care about. Any number of schemas apply per service or per profile.
  • Strobemeta — thread-local-storage mechanism to attach runtime metadata (request IDs, endpoint names, latency buckets, …) to call stacks at sample time via eBPF. Makes request-context-aware profiling possible — e.g. "stacks for p99 latency requests only" — without post-hoc join-to-other-telemetry.
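A minimal sketch of both enrichment ideas, with an invented schema representation (the real Stack Schemas DSL is not spelled out in these notes): regex-driven frame stripping and stack tagging, plus filtering samples by Strobemeta-like metadata that was captured at sample time.

```python
import re

# Hypothetical sketch of Stack Schemas-style enrichment (schema format
# invented for illustration): strip frames the viewer doesn't care about,
# tag whole stacks, and filter samples by metadata attached at sample time.

STRIP = [re.compile(r"^folly::"), re.compile(r"^__libc")]
TAGS = {"ads_pipeline": re.compile(r"^ads::")}

def apply_schema(frames):
    kept = [f for f in frames if not any(p.search(f) for p in STRIP)]
    tags = {t for t, p in TAGS.items() if any(p.search(f) for f in frames)}
    return kept, tags

def p99_only(samples):
    # Strobemeta turns "stacks for p99-latency requests only" into a plain
    # filter on the sample stream -- no post-hoc join to other telemetry.
    return [s for s in samples if s["meta"].get("latency_bucket") == "p99"]

frames, tags = apply_schema(
    ["__libc_start_main", "main", "ads::rank", "folly::detail::x"])
# frames == ["main", "ads::rank"], tags == {"ads_pipeline"}
```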

Output surfaces

  • Scuba — the primary data + UI surface; flame graphs, pie charts, time-series, distributions, free-form query.
  • Tracery — trace-timeline tool; client-side columnar DB in JavaScript for responsive zoom + filter on large samples; renders the Crochet profiler's output, among others.
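The columnar layout behind Tracery's responsiveness can be illustrated with a toy example (in Python rather than JavaScript, and not Tracery's actual implementation): each field is one flat array, a filter scans a single column and yields row indices, and other columns are touched only for the surviving rows.

```python
# Toy columnar table: rows exist only as indices into flat per-field lists.
# Filtering scans one column; materialising results touches other columns
# only at the surviving indices -- the layout idea behind fast zoom+filter.

class ColumnarTable:
    def __init__(self, **columns):
        self.columns = columns

    def where(self, column, predicate):
        """Return row indices whose value in `column` satisfies `predicate`."""
        return [i for i, v in enumerate(self.columns[column]) if predicate(v)]

    def take(self, column, indices):
        """Gather `column` values at the given row indices."""
        col = self.columns[column]
        return [col[i] for i in indices]

spans = ColumnarTable(
    name=["rank", "encode", "compress"],
    duration_us=[120, 950, 40],
)
slow = spans.where("duration_us", lambda d: d > 100)
# spans.take("name", slow) == ["rank", "encode"]
```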

Symbolization service

Operational numbers

  • 42 profilers orchestrated.
  • Top 200 services served continuous LBR → FDO → binary-optimisation.
  • Up to 20% CPU-cycles reduction per optimised service.
  • ~15,000 servers/year saved by a one-character fix ("The Biggest Ampersand"): adding a single `&` to avoid a hot-path std::vector copy in an ads service. An instance of Scuba-query-driven performance triage enabled by the symbolized file-and-line data Strobelight captures.

Open-source status

"We're currently working on open-sourcing Strobelight's profilers and libraries." Incubator org at github.com/facebookincubator/strobelight. Several supporting libraries are already open-source: systems/bpftrace, systems/blazesym, systems/jemalloc, and BOLT.
