

Feedback-directed optimization

Definition

Feedback-directed optimization (FDO) is the umbrella term for compiler and binary-optimization techniques that feed actual runtime execution data back into the compilation, linking, or post-link pipeline to drive optimization decisions that would otherwise rely on static heuristics.

The FDO family includes:

  • Profile-guided optimization (PGO) — compile-time FDO; the profile feeds the compiler.
  • BOLT / post-link binary optimizers — post-link FDO; the profile feeds a standalone tool that rewrites the linked binary.
  • AutoFDO — sampling-based PGO variant; the profile comes from Linux perf on unmodified production binaries.
  • CSSPGO — Context-Sensitive Sample-based PGO, Meta's canonical fleet-scale variant.
  • LBR-based FDO — uses the Last Branch Record CPU feature for low-overhead branch-frequency data.

FDO is distinguished from traditional optimization by its information source: measurement, not assumption.

The canonical FDO pipeline

A mature FDO deployment has four stages:

  1. Profile collection — either instrumented or sampling mode. Fleet-wide continuous sampling scales best (Meta's Strobelight); an instrumented run against a staging workload is easiest to set up (Redpanda's 26.1 approach).
  2. Profile aggregation / validation — merge profiles from many hosts, validate coverage, and age out stale data.
  3. Optimization pass — consume the profile at compile time (PGO / CSSPGO) or post-link time (BOLT).
  4. Deployment — ship the optimised binary; measure the win; close the loop with fresh profile collection.
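Stage 2 above is the least visible but easiest to get wrong. A minimal sketch of aggregation with coverage validation and age-out — the record shape, field names, and thresholds here are illustrative assumptions, not any real profile format:

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    """One host's profile. Fields are illustrative, not a real format."""
    age_days: int                                # how old this host's data is
    counts: dict = field(default_factory=dict)   # function name -> sample count

def aggregate(profiles, max_age_days=14, min_hosts=2):
    """Merge per-host profiles, dropping stale ones and checking coverage."""
    fresh = [p for p in profiles if p.age_days <= max_age_days]
    if len(fresh) < min_hosts:
        raise ValueError("too few fresh profiles to trust the merge")
    merged = {}
    for p in fresh:
        for fn, n in p.counts.items():
            merged[fn] = merged.get(fn, 0) + n
    return merged
```

Real deployments (e.g. llvm-profdata merge behind a fleet collector) do the same three things at scale: filter by freshness, refuse to proceed on thin coverage, and sum counts.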

For the fleet-scale composition of these stages, see patterns/feedback-directed-optimization-fleet-pipeline.

The pattern of wins

FDO's measured wins across different deployments (rough order of magnitude):

  • Redpanda Streaming 26.1 (C++, PGO, small-batch workload) — 47% p999 latency improvement, 15% reactor CPU utilization improvement, 10-15% overall efficiency gain (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization)
  • Meta fleet (CSSPGO + BOLT, top-200 services) — up to 20% CPU-cycle reduction, 10-20% server reduction (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)
  • Generic frontend-bound C++ service — 5-15% typical

Wins concentrate on frontend-bound workloads where the hot path has many functions, deep inlining choices, and complex control flow — where static heuristics are weakest and profile data is most valuable.

Why FDO pays for itself

FDO's engineering investment (build-pipeline changes, profile storage, cadence management) is offset by fleet-scale capacity savings:

  • At Meta scale, 10-20% server reduction on the top-200 services is "the economic datum that pays for Strobelight as a platform" (from systems/strobelight overview).
  • At Redpanda-Cloud scale, a 15% reactor CPU utilization improvement directly reduces the number of vCPU-hours billed per cluster — material to Redpanda's cell-based cost model.
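The back-of-envelope arithmetic is simple. In this sketch only the 15% CPU improvement comes from the Redpanda figures above; the fleet size and hourly rate are made-up illustration values:

```python
# Capacity-savings arithmetic for a CPU-utilization win.
vcpus = 1000                 # ASSUMED fleet-wide vCPUs for one cluster tier
hourly_rate = 0.05           # ASSUMED $ per vCPU-hour
cpu_improvement = 0.15       # measured reactor CPU utilization win (Redpanda)

vcpu_hours_saved_per_year = vcpus * cpu_improvement * 24 * 365
dollars_saved = vcpu_hours_saved_per_year * hourly_rate
print(round(vcpu_hours_saved_per_year), round(dollars_saved))
```

At these assumed rates the annual saving is comfortably larger than the one-time engineering cost of wiring profiles into the build, which is the general shape of the FDO business case.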

FDO fits the offensive performance engineering framing: rather than defending against a specific regression, FDO makes the hot binary systematically faster by extracting information the compiler doesn't have access to by default.

Trade-offs vs traditional optimisation

Axis             Static optimization          FDO
Input            Source + heuristics          Source + heuristics + runtime profile
Build-time cost  Baseline                     ~2× builds (instrumented PGO) or baseline + post-link pass (BOLT)
Infra cost       None                         Profile collection + storage
Stability        Deterministic from source    Profile-dependent
Maintenance      None                         Profile-freshness cadence
Typical win      0 (you already run this)     5-20% on hot paths
Coverage         Every binary                 Only profiled binaries

Getting started

A pragmatic FDO adoption path for a C++ codebase:

  1. Pick a single hot-path binary — the one where capacity savings matter most.
  2. Add TMA measurement — top-down microarchitecture analysis via Linux perf or equivalent. Confirm the workload is frontend-bound enough to reward FDO. See patterns/tma-guided-optimization-target-selection.
  3. Choose PGO or BOLT — PGO for stability; BOLT for build-time economy, provided LLVM tooling expertise is available.
  4. Set up a training workload — a representative production-like benchmark; this is the profile-collection input.
  5. Validate end-to-end — measure the same TMA categories before and after; look for the frontend-bound percentage to drop.
  6. Automate the build — ship the profile-collection → recompile cycle behind a CI flag that can be toggled per-release.
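Step 2's go/no-go decision can be encoded as a simple threshold on the TMA level-1 breakdown. The 30% cutoff below is an illustrative assumption, not a published rule:

```python
def looks_frontend_bound(tma, threshold=0.30):
    """Decide whether a workload looks frontend-bound enough to reward FDO.

    `tma` maps the four TMA level-1 categories to their cycle shares.
    The threshold is an illustrative cutoff, not a published rule.
    """
    total = sum(tma.values())
    if not 0.99 <= total <= 1.01:          # level-1 shares should sum to ~100%
        raise ValueError("TMA level-1 shares must sum to 1.0")
    return tma["frontend_bound"] >= threshold

# A workload spending 35% of its pipeline slots frontend-bound is a
# good FDO candidate under this cutoff:
print(looks_frontend_bound({
    "frontend_bound": 0.35, "backend_bound": 0.40,
    "bad_speculation": 0.05, "retiring": 0.20,
}))  # prints True
```

The same check repeated in step 5 closes the loop: after the FDO build, the frontend-bound share should drop, and the retiring share should rise.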
