CONCEPT Cited by 2 sources
Instrumented vs sampling profile¶
Definition¶
There are two structural shapes for collecting the profile data that feeds PGO, LLVM BOLT, and similar feedback-directed optimisation tools:
- Instrumented profile: the binary is modified (either at compile time or post-link) to insert explicit counters that record every executed branch / basic block / function entry. Profile data is written out on program termination.
- Sampling profile: the binary runs unchanged; a separate profiler (Linux perf, DTrace, eBPF profiler) periodically inspects the CPU's program counter (and optionally branch history via LBR) to estimate the execution distribution statistically.
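Both shapes ultimately yield a profile that the downstream optimiser can consume, although the on-disk formats and the options used to load them can differ. Clang, for example, loads instrumented profiles with -fprofile-use and sample profiles with -fprofile-sample-use.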
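The difference between the two shapes can be sketched with a toy simulation. This is an illustration only: the trace generator, the branch labels `hot` and `rare`, and the fixed sampling period are assumptions made for the example, not a model of how a compiler's counters or `perf` work internally.

```python
import random
from collections import Counter


def run_trace(n_events, rare_prob, seed=0):
    """Generate a toy execution trace of branch labels ('hot' or 'rare')."""
    rng = random.Random(seed)
    return ["rare" if rng.random() < rare_prob else "hot" for _ in range(n_events)]


def instrumented_profile(trace):
    """Count every executed branch, as inserted counters would."""
    return Counter(trace)


def sampling_profile(trace, period):
    """Look at every `period`-th event only, then scale the counts back up."""
    samples = Counter(trace[::period])
    return {branch: count * period for branch, count in samples.items()}


trace = run_trace(n_events=100_000, rare_prob=0.0001, seed=42)
exact = instrumented_profile(trace)                # exact counts per branch
estimate = sampling_profile(trace, period=1_000)   # statistical estimate

print("instrumented:", dict(exact))
print("sampled:     ", estimate)
```

The instrumented counts always sum to the true number of events, while the sampled counts are an estimate built from 100 observations; whether `rare` appears in the estimate at all depends on where the samples happen to land.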
The trade-off¶
| Property | Instrumented | Sampling |
|---|---|---|
| Baseline overhead | 2-50% runtime cost | 0-1% |
| Coverage | Deterministic (every branch recorded) | Probabilistic (frequency-weighted) |
| Rare-branch visibility | Complete | Poor (sampling misses rare events) |
| Hot-branch precision | High | High |
| Production-friendliness | Requires separate build/deploy | Runs on production binary unchanged |
| Deployment shape | Training workload in staging | Continuous fleet-wide profiling |
| Risk of binary bugs | Instrumentation can destabilise (BOLT case) | None (binary unmodified) |
Instrumented wins on completeness — rare branches are captured, and the compiler can make layout decisions with certainty about call-site frequencies. Sampling wins on production-scale practicality — running unchanged binaries in production to collect profiles costs essentially nothing and avoids the "which workload do we run in staging?" question.
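The rare-branch row of the table can be made quantitative with a back-of-the-envelope model. The sketch below assumes samples are independent and that each one lands in a given branch with probability equal to that branch's share of execution; real samplers only approximate this, so treat the numbers as rough guidance.

```python
import math


def miss_probability(branch_fraction, n_samples):
    """Probability that none of n_samples independent samples hits the branch."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be in [0, 1]")
    return (1.0 - branch_fraction) ** n_samples


def samples_needed(branch_fraction, max_miss):
    """Smallest sample count that keeps the miss probability at or below max_miss."""
    if not 0.0 < branch_fraction < 1.0:
        raise ValueError("branch_fraction must be in (0, 1)")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be in (0, 1)")
    return math.ceil(math.log(max_miss) / math.log(1.0 - branch_fraction))


# A branch that accounts for 0.1% of execution, observed with 1,000 samples.
print(miss_probability(0.001, 1_000))
# Samples needed to see that branch at least once with 99% probability.
print(samples_needed(0.001, 0.01))
```

Under this model, aggregating samples across many machines and long time windows is what makes rare branches visible to a sampling profiler: the miss probability shrinks geometrically with the total sample count.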
Instrumented in PGO and BOLT¶
Both PGO and BOLT support instrumentation modes, with distinct mechanisms:
- Clang PGO instrumented (-fprofile-generate): the compiler inserts counters during phase 1 of the two-phase build. The binary runs the training workload, and the profile is written on exit.
- BOLT instrumented: "BOLT doesn't require an extra compilation. It creates an instrumented binary by injecting instructions directly into the compiled executable" (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization). BOLT's instruction-injection is the source of its brittleness: it modifies a linked binary's control-flow graph without compiler-semantic invariants available for verification.
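The two instrumented flows can be sketched as command sequences. The helper below only builds the argument lists and does not run anything. The function names, file names (`app.cpp`, `app`, the `.instr` and `.profdata` suffixes) and the `-O2` level are hypothetical choices for illustration; the Clang, `llvm-profdata` and `llvm-bolt` options shown are the commonly documented ones, but exact behaviour can vary by LLVM version, so check them against your toolchain.

```python
import shlex


def clang_instrumented_pgo(source, binary, raw_profiles, workload_args):
    """Build the command list for a two-phase instrumented Clang PGO build."""
    instrumented = f"{binary}.instr"
    merged = f"{binary}.profdata"
    return [
        # Phase 1: compile with counters inserted by the compiler.
        ["clang++", "-O2", "-fprofile-generate", source, "-o", instrumented],
        # Run the training workload; raw profiles are written at exit.
        [f"./{instrumented}", *workload_args],
        # Merge the raw profiles into the indexed format the compiler reads.
        ["llvm-profdata", "merge", f"-output={merged}", *raw_profiles],
        # Phase 2: recompile using the merged profile.
        ["clang++", "-O2", f"-fprofile-use={merged}", source, "-o", binary],
    ]


def bolt_instrumented(binary, fdata, workload_args):
    """Build the command list for BOLT's post-link instrumentation flow."""
    instrumented = f"{binary}.bolt-instr"
    return [
        # Inject counters directly into the already-linked executable.
        ["llvm-bolt", binary, "-instrument",
         f"-instrumentation-file={fdata}", "-o", instrumented],
        # Run the training workload against the instrumented binary.
        [f"./{instrumented}", *workload_args],
        # Optimise the original binary with the collected profile.
        ["llvm-bolt", binary, f"-data={fdata}", "-o", f"{binary}.bolt"],
    ]


if __name__ == "__main__":
    # Hypothetical inputs; raw profile names depend on the LLVM version.
    for cmd in clang_instrumented_pgo("app.cpp", "app", ["default.profraw"], ["--bench"]):
        print(shlex.join(cmd))
    for cmd in bolt_instrumented("app", "app.fdata", ["--bench"]):
        print(shlex.join(cmd))
```

Note the structural difference the section describes: the Clang flow compiles the source twice, while the BOLT flow never touches the source and rewrites the linked executable instead.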
Sampling in PGO and BOLT¶
Sampling-based FDO is Meta's production choice (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology): Strobelight samples Last Branch Record (LBR) data from production fleets continuously, producing profiles that feed:
- Compile-time: CSSPGO (Context-Sensitive Sample-based Profile-Guided Optimization).
- Post-compile: BOLT.
The architectural payoff: the binary people are actually running gets profiled, not a proxy workload. Updates to profiles are continuous rather than build-gated.
Clang + AutoFDO provides the sampling-PGO variant for the compile-time side outside Meta.
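Outside a fleet-scale system like Strobelight, the sampling flow can be approximated with open-source tools. The sketch below builds (but does not run) the command lines for one possible setup: `perf record -b` to capture branch samples, AutoFDO's `create_llvm_prof` to convert them for Clang, and `perf2bolt` to convert them for BOLT. The function names, file names, and the choice to launch the workload under `perf` rather than attach to a running process are assumptions for illustration, and options may differ across tool versions.

```python
import shlex


def perf_sample(binary, workload_args, perf_data="perf.data"):
    """Command that samples branch records (LBR) while the unmodified binary runs."""
    return ["perf", "record", "-b", "-e", "cycles:u", "-o", perf_data,
            "--", f"./{binary}", *workload_args]


def autofdo_commands(source, binary, perf_data="perf.data"):
    """Compile-time consumer: convert perf samples, then rebuild with Clang."""
    afdo = f"{binary}.afdo"
    return [
        # Convert perf samples into Clang's sample-profile format.
        ["create_llvm_prof", f"--binary=./{binary}",
         f"--profile={perf_data}", f"--out={afdo}"],
        # Rebuild with the sample profile; line tables map samples to source.
        ["clang++", "-O2", "-gline-tables-only",
         f"-fprofile-sample-use={afdo}", source, "-o", f"{binary}.fdo"],
    ]


def bolt_sampling_commands(binary, perf_data="perf.data"):
    """Post-link consumer: convert perf samples, then rewrite the binary with BOLT."""
    fdata = f"{binary}.fdata"
    return [
        ["perf2bolt", "-p", perf_data, "-o", fdata, binary],
        ["llvm-bolt", binary, "-o", f"{binary}.bolt", f"-data={fdata}"],
    ]


if __name__ == "__main__":
    print(shlex.join(perf_sample("app", ["--serve"])))
    for cmd in autofdo_commands("app.cpp", "app") + bolt_sampling_commands("app"):
        print(shlex.join(cmd))
```

The same `perf.data` capture feeds both consumers, which mirrors the shape described above: one sampling source, a compile-time optimiser and a post-link optimiser downstream.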
When to pick each¶
- Choose instrumented when:
  - You have a representative training workload that matches production distribution.
  - The build pipeline tolerates a two-phase compile.
  - Rare-branch accuracy matters (security code, error paths that shape exception handling).
  - Binary modification bugs aren't catastrophic (you can re-run the training quickly).
- Choose sampling when:
  - You have fleet-wide continuous profiling infrastructure (Strobelight / Google-Wide Profiling equivalents).
  - Production binaries must run unchanged (compliance, deployment, stability reasons).
  - Rare branches are acceptably visible through aggregate fleet sampling.
  - Profile freshness matters (many releases per week).
Redpanda's decision¶
Redpanda's 2026-04-02 PGO rollout (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization) describes both modes as available but does not explicitly state which one was chosen. The preference for PGO over BOLT on stability grounds, plus the fit of a two-phase compile with their build pipeline, suggests instrumented Clang PGO. This reading is consistent with the post's framing of PGO as a "two-phase compilation process", but it remains an inference rather than a confirmed detail.
Seen in¶
- sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization — canonicalises both modes with the PGO-vs-BOLT trade-off and names instruction-injection as the source of BOLT's brittleness.
- sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology — Meta's fleet-scale sampling-mode via Strobelight → CSSPGO + BOLT.
Related¶
- concepts/profile-guided-optimization — the consumer.
- concepts/llvm-bolt-post-link-optimizer — the post-link consumer.
- concepts/feedback-directed-optimization — the umbrella.
- concepts/flamegraph-profiling — the human-facing sampling-visualisation sibling.
- systems/linux-perf — the canonical sampling tool.
- systems/strobelight — the fleet-wide sampling platform.
- systems/meta-bolt-binary-optimizer — the post-link consumer.
- systems/redpanda — Tier-3 instrumented-mode adopter.
- patterns/pgo-for-frontend-bound-application — the apply pattern.
- patterns/feedback-directed-optimization-fleet-pipeline — the Meta-scale sampling-based composition.