CONCEPT Cited by 2 sources

Instrumented vs sampling profile

Definition

There are two structural shapes for collecting the profile data that feeds PGO, LLVM BOLT, and similar feedback-directed optimisation tools:

  • Instrumented profile — the binary is modified (either at compile time or post-link) to insert explicit counters that record every executed branch / basic block / function entry. Profile data is written out on program termination.
  • Sampling profile — the binary runs unchanged; a separate profiler (Linux perf, DTrace, eBPF profiler) periodically inspects the CPU's program-counter (and optionally branch history via LBR) to estimate the execution distribution statistically.
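
The two shapes can be contrasted with a toy model. The sketch below is illustrative only: `run_trace`, the block names, and the fixed sampling period are hypothetical stand-ins, and a real sampler is driven by timer or hardware-counter interrupts rather than by counting events.

```python
from collections import Counter
from itertools import islice

def run_trace():
    """Hypothetical program: yields the ID of each basic block as it executes."""
    for i in range(10_000):
        yield "loop_body"
        if i % 1000 == 999:
            yield "error_path"  # rare branch: executed 10 times in total
    yield "exit"

def instrumented_profile(trace):
    # Instrumentation: one counter increment for every executed block.
    # Exact, but the cost is paid on every single event.
    return Counter(trace)

def sampling_profile(trace, period):
    # Sampling: observe only every `period`-th event, then scale the
    # observed counts back up to estimate the true totals.
    sampled = Counter(islice(trace, 0, None, period))
    return {block: count * period for block, count in sampled.items()}

exact = instrumented_profile(run_trace())
estimate = sampling_profile(run_trace(), period=97)

if __name__ == "__main__":
    print("instrumented:", dict(exact))
    print("sampled     :", estimate)
```

With these particular parameters the sampled estimate for `loop_body` lands within about 1% of the exact count, while `error_path` is never observed at all.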

Both shapes yield a profile that the downstream optimiser can consume: the output formats are either the same or close enough to be convertible.

The trade-off

| Property | Instrumented | Sampling |
|---|---|---|
| Baseline overhead | 2-50% runtime cost | 0-1% |
| Coverage | Deterministic (every branch recorded) | Probabilistic (frequency-weighted) |
| Rare-branch visibility | Complete | Poor (sampling misses rare events) |
| Hot-branch precision | High | High |
| Production-friendliness | Requires separate build/deploy | Runs on production binary unchanged |
| Deployment shape | Training workload in staging | Continuous fleet-wide profiling |
| Risk of binary bugs | Instrumentation can destabilise (BOLT case) | None (binary unmodified) |

Instrumented wins on completeness — rare branches are captured, and the compiler can make layout decisions with certainty about call-site frequencies. Sampling wins on production-scale practicality — running unchanged binaries in production to collect profiles costs essentially nothing and avoids the "which workload do we run in staging?" question.
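
The rare-branch gap can be made concrete with a back-of-the-envelope model. The sketch below assumes each sample is independent and lands on a given branch with probability equal to that branch's share of execution; real samplers only approximate this, and both function names are hypothetical.

```python
import math

def miss_probability(branch_fraction, samples):
    """Chance that none of `samples` independent samples hits the branch."""
    return (1.0 - branch_fraction) ** samples

def samples_needed(branch_fraction, confidence):
    """Smallest sample count that observes the branch at least once
    with probability >= `confidence` (under the independence assumption)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - branch_fraction))

if __name__ == "__main__":
    # A branch taking one millionth of execution, seen with 99% confidence.
    n = samples_needed(1e-6, 0.99)
    print(n, miss_probability(1e-6, n))
```

Under this model a branch that accounts for one millionth of execution needs several million samples before it is reliably seen even once, which is why aggregating samples across a fleet matters for rare paths.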

Instrumented in PGO and BOLT

Both PGO and BOLT support instrumentation modes, with distinct mechanisms:

  • Clang PGO instrumented (-fprofile-generate): the compiler inserts counters during phase 1 of the two-phase build. The instrumented binary then runs the training workload and writes the profile on exit.
  • BOLT instrumented: "BOLT doesn't require an extra compilation. It creates an instrumented binary by injecting instructions directly into the compiled executable" (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization). This instruction injection is the source of BOLT's brittleness: it modifies a linked binary's control-flow graph without the compiler's semantic invariants available for verification.
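
As a rough sketch of the Clang side, the two-phase flow can be written down as a list of commands. The helper below only builds the argument lists and runs nothing; the file names (`server.c`, `pgo-profiles`, `merged.profdata`) and the `--bench` workload flag are hypothetical, and the optimisation level is an arbitrary choice.

```python
from pathlib import Path

def instrumented_pgo_commands(source, binary, workload_args, profile_dir="pgo-profiles"):
    """Return the argv lists for a two-phase instrumented Clang PGO build."""
    instrumented = f"{binary}.instr"
    merged = str(Path(profile_dir) / "merged.profdata")
    return [
        # Phase 1: build with compiler-inserted counters.
        ["clang", "-O2", f"-fprofile-generate={profile_dir}", source, "-o", instrumented],
        # Run the training workload; raw counters are written when it exits.
        [f"./{instrumented}", *workload_args],
        # Merge the raw profiles into the indexed format the compiler reads.
        ["llvm-profdata", "merge", f"-output={merged}", profile_dir],
        # Phase 2: rebuild so the optimiser can consume the merged profile.
        ["clang", "-O2", f"-fprofile-use={merged}", source, "-o", binary],
    ]

if __name__ == "__main__":
    for argv in instrumented_pgo_commands("server.c", "server", ["--bench"]):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` in order. The profile is only as good as the workload run in the second step.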

Sampling in PGO and BOLT

Sampling-based FDO is Meta's production choice (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology): Strobelight samples Last Branch Record (LBR) data from production fleets continuously, producing profiles that feed:

  • Compile-time: CSSPGO (Context-Sensitive Sample-based Profile-Guided Optimization).
  • Post-compile: BOLT.

The architectural payoff: the binary people are actually running gets profiled, not a proxy workload. Updates to profiles are continuous rather than build-gated.

Clang + AutoFDO provides the sampling-PGO variant for the compile-time side outside Meta.
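
For comparison, a sampling-based build with perf and AutoFDO might look roughly like the following. This is a sketch under assumptions, not a description of Meta's or anyone's actual pipeline: it assumes a CPU and kernel that support branch-record sampling (`perf record -b`), the AutoFDO `create_llvm_prof` converter on the PATH, and hypothetical file names. The helper only builds argument lists.

```python
def sampling_pgo_commands(source, binary, workload_args, perf_data="perf.data"):
    """Return the argv lists for a sampling-based (AutoFDO-style) Clang build."""
    profile = f"{binary}.prof"
    return [
        # Ordinary optimised build with line tables, so samples map back to source.
        ["clang", "-O2", "-gline-tables-only", source, "-o", binary],
        # Run the unmodified binary under perf, recording branch history.
        ["perf", "record", "-b", "-o", perf_data, "--", f"./{binary}", *workload_args],
        # Convert the perf data into a sample profile that Clang can read.
        ["create_llvm_prof", f"--binary=./{binary}", f"--profile={perf_data}", f"--out={profile}"],
        # Rebuild using the sample profile.
        ["clang", "-O2", "-gline-tables-only", f"-fprofile-sample-use={profile}", source, "-o", binary],
    ]

if __name__ == "__main__":
    for argv in sampling_pgo_commands("server.c", "server", ["--bench"]):
        print(" ".join(argv))
```

Unlike the instrumented flow, the binary that gets profiled here is a normal optimised build. In a fleet setting the `perf record` step would be replaced by whatever continuous profiler is already collecting samples.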

When to pick each

  • Choose instrumented when:
      • You have a representative training workload that matches production distribution.
      • The build pipeline tolerates a two-phase compile.
      • Rare-branch accuracy matters (security code, error paths that shape exception handling).
      • Binary modification bugs aren't catastrophic (you can re-run the training quickly).

  • Choose sampling when:
      • You have fleet-wide continuous profiling infrastructure (Strobelight / Google-Wide Profiling equivalents).
      • Production binaries must run unchanged (compliance, deployment, stability reasons).
      • Rare branches are acceptably visible through aggregate fleet sampling.
      • Profile freshness matters (many releases per week).
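
The checklist can be read as a small decision procedure. The sketch below is one possible encoding rather than a rule from either source: the field names are hypothetical, and the tie-break when both modes are feasible (default to sampling unless rare-branch accuracy dominates) is an arbitrary assumption.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    representative_training_workload: bool
    two_phase_build_ok: bool
    rare_branch_accuracy_critical: bool
    fleet_profiling_available: bool
    binaries_must_run_unchanged: bool
    frequent_releases: bool

def suggest_profile_mode(s):
    """Map the checklist to a suggestion; a heuristic, not a verdict."""
    can_instrument = (
        s.representative_training_workload
        and s.two_phase_build_ok
        and not s.binaries_must_run_unchanged
    )
    can_sample = s.fleet_profiling_available
    if not can_instrument and not can_sample:
        return "neither fits yet"
    if can_instrument and not can_sample:
        return "instrumented"
    if can_sample and not can_instrument:
        return "sampling"
    # Both are feasible: assumed tie-break on completeness versus freshness.
    if s.rare_branch_accuracy_critical and not s.frequent_releases:
        return "instrumented"
    return "sampling"

if __name__ == "__main__":
    staging_only = Situation(True, True, True, False, False, False)
    print(suggest_profile_mode(staging_only))
```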

Redpanda's decision

Redpanda's 2026-04-02 PGO rollout (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization) describes both modes as available but does not explicitly state which one they chose. Their preference for PGO over BOLT on stability grounds, plus the two-phase-compile fit of their build pipeline, suggests instrumented Clang PGO. The post's framing of PGO as a "two-phase compilation process" is consistent with this inference, though it does not confirm it.
