CONCEPT Cited by 1 source

Frame-pointer unwinding

Frame-pointer unwinding is the stack-walking technique that relies on a dedicated CPU register (%rbp on x86-64, x29 on AArch64) holding the address of the current stack frame, where the function prologue has saved the caller's frame pointer next to the return address. The saved pointers form a singly-linked list of stack frames that a profiler can walk cheaply at sample time by chasing the chain.

It is the cheap unwinding primitive — far less expensive than CFI-based (DWARF) unwind at sample time — but it imposes a small register-pressure tax across the binary: the frame-pointer register can't be used as a general-purpose register on the function body's hot path.
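The chain walk can be illustrated with a small simulation. This is a minimal sketch, not a real unwinder: stack memory is modelled as a Python dict, and the function name `walk_frame_pointers`, the toy addresses, and the `max_depth` cap are all assumptions made for illustration. It relies only on the conventional layout in which the slot at the frame pointer holds the caller's saved frame pointer and the next slot up holds the return address.

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on a 64-bit target


def walk_frame_pointers(memory: Dict[int, int], fp: int, max_depth: int = 128) -> List[int]:
    """Collect return addresses by chasing the saved-frame-pointer chain.

    `memory` is a hypothetical stand-in for the sampled thread's stack:
    it maps a stack address to the 64-bit value stored there.
    """
    addresses: List[int] = []
    while fp != 0 and len(addresses) < max_depth:
        saved_fp = memory[fp]            # slot at [fp]: caller's frame pointer
        return_addr = memory[fp + WORD]  # slot at [fp + 8]: return address into the caller
        addresses.append(return_addr)
        # The stack grows down, so a caller's frame must sit at a higher
        # address. Anything else means a corrupt or looping chain: stop.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return addresses


# Toy stack for the call chain main -> parse -> lex (addresses are made up).
memory = {
    0x7000: 0x7040, 0x7008: 0x401234,  # lex frame: links to parse's frame
    0x7040: 0x7080, 0x7048: 0x401100,  # parse frame: links to main's frame
    0x7080: 0x0,    0x7088: 0x401000,  # main frame: zero terminates the chain
}
stack = walk_frame_pointers(memory, fp=0x7000)
print([hex(a) for a in stack])
```

Each frame costs two memory reads and no table lookups, which is why the walk is linear in stack depth with a small constant.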

The Meta bet

"All of this is made possible with the inclusion of frame pointers in all of Meta's user space binaries, otherwise we couldn't walk the stack to get all these addresses (or we'd have to do some other complicated/expensive thing which wouldn't be as efficient)."

— Meta Engineering, 2025-01-21 Strobelight post (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)

The post explicitly links to Brendan Gregg's 2024 "The Return of the Frame Pointers" (brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html), which is the canonical polemic for enabling frame pointers fleet-wide.

The trade-off axis

| Approach | Sample-time cost | Binary-time cost |
|---|---|---|
| Frame pointers (-fno-omit-frame-pointer) | O(stack depth) register-chase | ~1-2% register pressure, sometimes more on ARM |
| DWARF CFI (SHF_ALLOC .eh_frame_hdr) | per-frame table lookup + stateful unwind | none at runtime, but complex parse on every sample |
| LBR (Last Branch Record) sampling | hardware-captured, O(1) | none |

Frame pointers are the predictable-low-cost-at-sample-time choice — they make patterns/default-continuous-profiling feasible because the sample cost is bounded and small.
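For contrast, the "per-frame table lookup + stateful unwind" row can be sketched in the same toy model. This is a heavily simplified illustration, not the DWARF format: real CFI is encoded as a bytecode program with per-register rules, whereas the hypothetical `UnwindRow` below carries only a start address and a CFA offset. The names, addresses, and table contents are all assumptions.

```python
import bisect
from typing import Dict, List, NamedTuple


class UnwindRow(NamedTuple):
    start_pc: int    # first code address this row covers
    cfa_offset: int  # canonical frame address = SP + cfa_offset in this range


def unwind_with_table(rows: List[UnwindRow], memory: Dict[int, int],
                      pc: int, sp: int, max_depth: int = 128) -> List[int]:
    """Unwind by looking up each PC in a sorted table (simplified CFI)."""
    starts = [row.start_pc for row in rows]
    pcs = [pc]
    while len(pcs) < max_depth:
        # Per-frame cost: a binary search over the unwind table.
        i = bisect.bisect_right(starts, pc) - 1
        if i < 0:
            break  # PC is not covered by any row
        cfa = sp + rows[i].cfa_offset
        return_addr = memory.get(cfa - 8, 0)  # return address sits just below the CFA
        if return_addr == 0:
            break  # outermost frame reached
        pcs.append(return_addr)
        # Stateful step: the caller's SP is the callee's CFA.
        pc, sp = return_addr, cfa
    return pcs


# Toy table for three functions (made-up addresses and frame sizes).
rows = [UnwindRow(0x401000, 16), UnwindRow(0x401100, 32), UnwindRow(0x401200, 24)]
memory = {0x7010: 0x401150, 0x7030: 0x401020, 0x7040: 0x0}
pcs = unwind_with_table(rows, memory, pc=0x401234, sp=0x7000)
print([hex(a) for a in pcs])
```

Even in this reduced form, every frame needs a search plus register-state bookkeeping, whereas the frame-pointer walk needs two loads. A real unwinder must also locate and decode the tables for each loaded binary.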

Why Meta pays the tax

Three compounding reasons:

  1. 42+ profilers sampling stacks continuously. The per-sample CPU budget has to be tiny; CFI unwind on every sample is too expensive at Meta's cadence.
  2. Delayed symbolization decouples capture from resolution. Frame pointers produce cheap raw address lists; DWARF still gets consulted off-host via the centralised service to resolve inlines + line numbers.
  3. Fleet-wide cost amortises the tax. The 1-2% register-pressure tax buys Meta the "up to 20% CPU-cycles reduction" on top-200 services via the FDO pipeline — a large-margin positive-EV trade.
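The capture/resolution split in the second point can be sketched as two separate steps. This is an illustrative model only: the symbol table, the addresses, and the function names `symbolize` and `symbolize_profile` are hypothetical, and the sketch resolves addresses to function names alone, whereas the service described above also consults DWARF for inlines and line numbers.

```python
import bisect
from collections import Counter
from typing import List, Tuple

# Step 1 (on host, at sample time): keep only raw address tuples, leaf first.
samples = [
    (0x401234, 0x401150, 0x401020),
    (0x401234, 0x401150, 0x401020),
    (0x401234, 0x401150, 0x401020),
    (0x401150, 0x401020),
]
raw_counts = Counter(samples)

# Step 2 (off host, later): resolve addresses against a symbol table.
# Hypothetical table of (function start address, name), sorted by address.
SYMBOLS: List[Tuple[int, str]] = [(0x401000, "main"), (0x401100, "parse"), (0x401200, "lex")]


def symbolize(addr: int, symbols: List[Tuple[int, str]]) -> str:
    """Map an address to the function whose start is the closest one at or below it."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else f"0x{addr:x}"


def symbolize_profile(counts: Counter, symbols: List[Tuple[int, str]]) -> Counter:
    """Turn raw stacks into folded 'root;...;leaf' strings with sample counts."""
    folded: Counter = Counter()
    for stack, count in counts.items():
        names = [symbolize(addr, symbols) for addr in reversed(stack)]
        folded[";".join(names)] += count
    return folded


folded = symbolize_profile(raw_counts, SYMBOLS)
for stack_str, count in folded.items():
    print(stack_str, count)
```

Because step 1 stores nothing but integers, the expensive lookup work runs once per unique stack rather than once per sample, and it does not run on the profiled host.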

The historical context

Frame pointers were disabled by default in many toolchains (gcc -O2 omits them; glibc used to ship without them) to reclaim the register. Brendan Gregg's 2024 post argued this was a decision made in the 1990s that no longer pencils out: the per-binary tax is dominated by the fleet-profiling value, and widespread opt-in is the prerequisite for modern continuous-profiling ecosystems.

Meta's public disclosure is a canonical corroboration: the tax is paid fleet-wide because the returns compound.

Seen in
