
LLVM BOLT post-link binary optimizer

Definition

LLVM BOLT (Binary Optimization and Layout Tool) is a post-link binary optimiser — it consumes an already-compiled, already-linked executable plus a profile data file and rewrites it into a new, optimised binary: reordering functions, splitting hot and cold code, and reordering basic blocks to reduce taken branches. Unlike PGO, BOLT does not interact with the compiler and requires no recompilation.

Originally published as "BOLT: A practical binary optimizer for data centers and beyond" (CGO 2019) from Meta; now part of LLVM upstream. See systems/meta-bolt-binary-optimizer for the Meta-tooling perspective and systems/llvm-bolt for the LLVM perspective.

What BOLT changes (structurally)

BOLT operates on the linked binary's .text section, symbol table, and debug info, and produces a new binary with:

  • Function reordering — hot functions co-located in the binary; cold functions pushed to the end.
  • Hot-cold function splitting — each function's rarely-taken paths moved to a separate .text.cold section.
  • Basic-block reordering within functions — hot fall-through paths aligned to minimise taken branches.
  • Indirect call promotion — hot indirect-call sites converted to speculative direct calls with guards.

These transformations target the same frontend-bound CPU stalls that PGO attacks, via the same instruction-cache locality mechanism.
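The layout intuition behind the first two transformations can be sketched in a few lines. This is a toy illustration with made-up sample counts and a made-up threshold — real BOLT uses the hfsort/ext-tsp algorithms, not a greedy sort:

```python
# Toy sketch of BOLT-style layout decisions. Hypothetical function and
# block names; real BOLT derives these from perf/instrumentation profiles.

# Per-function sample counts (hypothetical profile).
samples = {"handle_request": 9000, "parse_header": 4000,
           "log_error": 3, "dump_state": 0}

# Function reordering: descending hotness co-locates hot code in .text,
# pushing cold functions toward the end.
layout = sorted(samples, key=samples.get, reverse=True)

# Hot-cold splitting within one function: blocks below a (made-up)
# execution-count threshold are exiled to .text.cold.
blocks = {"entry": 9000, "fast_path": 8900, "error_path": 2}
hot = [b for b, n in blocks.items() if n >= 10]
cold = [b for b, n in blocks.items() if n < 10]

print(layout)       # hottest functions lead the section
print(hot, cold)    # rarely-taken error path leaves the hot region
```

The effect in both cases is the same: the instruction bytes the CPU actually executes are packed densely together, so fewer i-cache lines and iTLB entries are wasted on cold code.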

Profile-collection modes

BOLT supports both shapes of profile collection:

  • Sampling mode — uses Linux perf LBR (Last Branch Record) data collected from the unchanged binary. Zero baseline overhead; statistical coverage. This is Meta's production choice (StrobelightFDO pipeline).
  • Instrumented mode — "BOLT doesn't require an extra compilation. It creates an instrumented binary by injecting instructions directly into the compiled executable" (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization). Deterministic coverage, at the cost of baseline runtime overhead plus the risk of the instruction injection itself destabilising the binary.

The instruction-injection approach is the source of much of BOLT's brittleness — it modifies a linked binary's control flow graph without the compiler's semantic invariants available for verification.
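The two collection modes look roughly like this on the command line. A sketch only — it assumes an LBR-capable x86 CPU and a binary linked with relocations preserved (--emit-relocs); paths and workload names are illustrative:

```shell
# Sampling mode: perf LBR on the unmodified binary, then convert the
# branch records into BOLT's profile format with perf2bolt.
perf record -e cycles:u -j any,u -o perf.data -- ./app --representative-workload
perf2bolt ./app -p perf.data -o perf.fdata

# Instrumented mode: BOLT injects counters directly into the linked
# executable; running the instrumented copy emits an .fdata profile
# (by default to /tmp/prof.fdata).
llvm-bolt ./app -instrument -o ./app.instrumented
./app.instrumented --representative-workload
```

Either way the result is an .fdata profile that feeds the optimisation pass; only the sampling path leaves the production binary untouched.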

BOLT vs PGO — the build-pipeline position

  Axis                           PGO (compile-time)        BOLT (post-link)
  Operates on                    Source → IR               Linked binary
  Requires recompilation         Yes (phase 2)             No
  Build-time overhead            ~2× compile               Small (seconds to minutes)
  Optimisation scope             Full LLVM pass pipeline   Code layout + splitting only
  Compiler-inliner coordination  Yes                       No
  Stability                      Decades of production     Brittle (per-binary bugs)
  Composable with other FDO      Yes (BOLT on top)         Yes (on PGO output)

BOLT's narrower scope is partly by design — it operates too late in the pipeline to redo inlining decisions or regenerate the compiler's intermediate representation. Its win is capturing layout optimisations that compiler heuristics miss, without paying the build-time cost of a second full compilation.
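The composability row above corresponds to a stacked pipeline: PGO at compile time, then BOLT on the PGO-optimised output. A sketch under the assumption that separate PGO and BOLT profiles already exist; file names are illustrative, the flags are real:

```shell
# Stage 1: PGO — recompile with the compile-time profile. --emit-relocs
# keeps relocations in the output so BOLT can rewrite its layout.
clang++ -O2 -fprofile-use=app.profdata -Wl,--emit-relocs -o app app.cpp

# Stage 2: BOLT — layout-only optimisation of the already-PGO'd binary.
llvm-bolt app -o app.bolt -data=perf.fdata \
  -reorder-blocks=ext-tsp -reorder-functions=hfsort \
  -split-functions -split-all-cold
```

The two stages don't conflict because they operate on disjoint levers: PGO drives inlining and IR-level decisions, BOLT only moves the resulting machine code around.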

The brittleness datum

Redpanda's 2026-04-02 disclosure is the first wiki-canonical non-Meta BOLT brittleness evidence (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization):

"BOLT's approach to operating on the binary directly avoids an extra compilation step, potentially saving significant build time. This can be especially important for larger projects like Redpanda Streaming. At the same time, its binary-modifying nature is quite brittle, and we ran into a few bugs (like this one)."

Redpanda cited this brittleness (plus the unresolved bug set) as the deciding factor in choosing PGO over BOLT for the 26.1 release. The bug reference links to an actual issue filed against the upstream LLVM project.

This canonicalises the asymmetry between Meta-scale and outside-Meta BOLT adoption:

  • At Meta scale: the fleet-wide CPU win (10-20% server reduction on top-200 services) pays for the engineering investment in working around BOLT bugs as they arise.
  • Outside Meta scale: the win is the same order of magnitude, but the tail-risk of a BOLT bug corrupting a production binary is a deal-breaker when the team doesn't have the LLVM-expert bandwidth to debug it.

Where it pays off

BOLT is the pragmatic choice when:

  • The codebase is too large for PGO's 2× build-time penalty to be tolerable (Meta's monorepo case).
  • Profile data is already being collected for other reasons (fleet-wide continuous profiling via Strobelight / Google-Wide Profiling).
  • The team has compiler / LLVM expertise to debug binary-layout-triggered crashes.

BOLT is not the pragmatic choice when:

  • A single build-pipeline change is the limiting factor — PGO is simpler to set up end-to-end.
  • Stability is mission-critical and debugging a BOLT-induced regression requires binary-level analysis — see the Redpanda 2026-04-02 case.
