CONCEPT Cited by 1 source
Frame-pointer unwinding¶
Frame-pointer unwinding is the stack-walking technique that
relies on a dedicated CPU register (%rbp on x86-64, x29 on
AArch64) pointing at the current frame's record, where the
caller's frame pointer is saved next to the return address. The
saved pointers form a singly-linked list of stack frames that the
profiler can walk cheaply at sample time by chasing the chain.
It is the cheap unwinding primitive, far less expensive than CFI-based (DWARF) unwinding at sample time, but it imposes a small register-pressure tax across the binary: the frame-pointer register is reserved for the chain and can't be allocated as a general-purpose register inside function bodies.
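A minimal sketch of the chain walk, simulated in Python rather than run against a real stack. It assumes the common 64-bit layout in which the word at the frame pointer holds the caller's saved frame pointer and the next word holds the return address. The `memory` dictionary, the addresses, and the function name `walk_frame_pointers` are hypothetical and exist only for illustration.

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on a 64-bit target


def walk_frame_pointers(memory: Dict[int, int], fp: int, max_depth: int = 128) -> List[int]:
    """Collect return addresses by chasing the saved-frame-pointer chain."""
    return_addresses: List[int] = []
    while fp != 0 and len(return_addresses) < max_depth:
        saved_fp = memory.get(fp)            # caller's frame pointer
        return_addr = memory.get(fp + WORD)  # address to resume in the caller
        if saved_fp is None or return_addr is None:
            break  # unreadable memory: stop rather than guess
        return_addresses.append(return_addr)
        # The stack grows down, so a caller's frame must sit at a higher address.
        # Anything else means a corrupt or cyclic chain.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return return_addresses


# Hypothetical stack for main -> parse -> lex, sampled while running in lex.
memory = {
    0x7000: 0x7020, 0x7008: 0x401100,  # lex's frame record, returns into parse
    0x7020: 0x7040, 0x7028: 0x400F80,  # parse's frame record, returns into main
    0x7040: 0x0,    0x7048: 0x400100,  # main's frame record, chain ends at 0
}
trace = walk_frame_pointers(memory, fp=0x7000)
print([hex(addr) for addr in trace])
```

Each step costs two memory reads and no table lookups, which is why the walk is linear in stack depth. The depth cap and the monotonicity check are defensive choices for this sketch; a production unwinder would apply its own validity rules.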
The Meta bet¶
"All of this is made possible with the inclusion of frame pointers in all of Meta's user space binaries, otherwise we couldn't walk the stack to get all these addresses (or we'd have to do some other complicated/expensive thing which wouldn't be as efficient)."
— Meta Engineering, 2025-01-21 Strobelight post (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)
The post explicitly links to Brendan Gregg's 2024 "The Return of the Frame Pointers" — brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html — which is the canonical polemic for enabling frame pointers fleet-wide.
The trade-off axis¶
| Approach | Sample-time cost | Binary-time cost |
|---|---|---|
| Frame pointers (-fno-omit-frame-pointer) | O(stack depth) register-chase | ~1-2% register pressure, sometimes more on ARM |
| DWARF CFI (SHF_ALLOC .eh_frame_hdr) | per-frame table lookup + stateful unwind | none at runtime, but complex parse on every sample |
| LBR (Last Branch Record) sampling | hardware-captured, O(1) | none |
Frame pointers are the predictable-low-cost-at-sample-time choice — they make patterns/default-continuous-profiling feasible because the sample cost is bounded and small.
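To make the "per-frame table lookup + stateful unwind" cost concrete, here is a deliberately simplified Python sketch of table-driven unwinding. It is not DWARF CFI: real CFI encodes per-instruction rules as a bytecode program, whereas this sketch assumes one fixed frame size per function and a return address stored in the word just below the canonical frame address (CFA). `UNWIND_TABLE`, `unwind_with_table`, and all addresses are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

WORD = 8  # bytes per stack slot on a 64-bit target

# Hypothetical unwind table: (start PC, end PC, distance from SP to the CFA).
UNWIND_TABLE: List[Tuple[int, int, int]] = [
    (0x400F00, 0x401000, 32),  # main
    (0x401000, 0x401200, 48),  # parse
    (0x401200, 0x401400, 16),  # lex
]
STARTS = [row[0] for row in UNWIND_TABLE]


def unwind_with_table(memory: Dict[int, int], pc: int, sp: int,
                      max_depth: int = 128) -> List[int]:
    """Unwind by looking up a table row for every frame and tracking (pc, sp)."""
    trace: List[int] = []
    while len(trace) < max_depth:
        # Binary search for the row whose PC range covers the current PC.
        i = bisect.bisect_right(STARTS, pc) - 1
        if i < 0 or pc >= UNWIND_TABLE[i][1]:
            break  # no unwind information for this PC
        cfa = sp + UNWIND_TABLE[i][2]
        return_addr = memory.get(cfa - WORD)
        if return_addr is None:
            break  # unreadable memory: stop rather than guess
        trace.append(return_addr)
        # State carried to the next frame: the caller's PC and its SP.
        pc, sp = return_addr, cfa
    return trace


# Same hypothetical main -> parse -> lex stack, described by SP instead of FP.
memory = {0x6FF8: 0x401100, 0x7028: 0x400F80, 0x7048: 0x400100}
trace = unwind_with_table(memory, pc=0x401250, sp=0x6FF0)
print([hex(addr) for addr in trace])
```

Even in this reduced form, every frame needs a search over the table plus bookkeeping of two registers, compared with the two memory reads per frame of a frame-pointer chase.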
Why Meta pays the tax¶
Three compounding reasons:
- 42+ profilers sampling stacks continuously. The per-sample CPU budget has to be tiny; CFI unwind on every sample is too expensive at Meta's cadence.
- Delayed symbolization decouples capture from resolution. Frame pointers produce cheap raw address lists; DWARF still gets consulted off-host via the centralised service to resolve inlines + line numbers.
- Fleet-wide cost amortises the tax. The 1-2% register-pressure tax buys Meta the "up to 20% CPU-cycles reduction" on top-200 services via the FDO pipeline — a large-margin positive-EV trade.
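The split between cheap on-host capture and off-host resolution can be sketched as follows. This is an illustration only, not Meta's symbolization service: the `SYMBOLS` table, the `symbolize` function, and the addresses are hypothetical, and the lookup resolves function names only, leaving out the inline and line-number detail that DWARF provides.

```python
import bisect
from typing import List, Tuple

# Hypothetical symbol table held off-host: (start address, function name), sorted.
SYMBOLS: List[Tuple[int, str]] = [
    (0x400F00, "main"),
    (0x401000, "parse"),
    (0x401200, "lex"),
]
SYMBOL_STARTS = [start for start, _ in SYMBOLS]


def symbolize(addresses: List[int]) -> List[str]:
    """Map raw addresses to the nearest preceding symbol.

    Simplification: the table stores no end addresses, so any address at or
    above the last symbol's start resolves to that last symbol.
    """
    names: List[str] = []
    for addr in addresses:
        i = bisect.bisect_right(SYMBOL_STARTS, addr) - 1
        # Addresses below the first symbol stay as raw hex.
        names.append(SYMBOLS[i][1] if i >= 0 else f"0x{addr:x}")
    return names


# On-host: the profiler records only raw return addresses.
raw_sample = [0x401100, 0x400F80]

# Off-host: a separate step turns them into names later.
resolved = symbolize(raw_sample)
print(resolved)
```

The on-host side never touches the symbol table, so the per-sample cost stays at whatever the unwinder itself costs.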
The historical context¶
Frame pointers were disabled by default in many toolchains
(gcc -O2 omits them; glibc used to ship without them) to
reclaim the register. Brendan Gregg's 2024 post argued this was
a decision made in the 1990s that no longer pencils out: the
per-binary tax is dominated by the fleet-profiling value, and
widespread opt-in is the prerequisite for modern
continuous-profiling ecosystems.
Meta's public disclosure is a canonical corroboration: the tax is paid fleet-wide because the returns compound.
Seen in¶
- sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology — canonical Meta disclosure that frame pointers are enabled on all Meta user-space binaries specifically to make fleet-wide profiling feasible; explicit Brendan-Gregg citation.
Related¶
- systems/strobelight — the canonical consumer.
- concepts/dwarf-debug-info — the alternative-cost unwind substrate.
- concepts/delayed-symbolization — the off-host step that complements cheap on-host unwind.
- concepts/ebpf-profiling — the kernel substrate that benefits most from frame pointers (sampling happens in kernel context where CFI would be expensive).
- concepts/stack-trace-sampling-profiling — the parent technique.
- patterns/delayed-symbolization-service — the system-shape sibling.