CONCEPT Cited by 1 source
DWARF debug info¶
DWARF (Debugging With Attributed Record Formats) is the standard debug-information format emitted by Unix toolchains (gcc, clang) into ELF binaries. It encodes — at roughly the precision of the original source — function names, parameter types, local variable locations, inline-expansion sites, line tables mapping instruction addresses to source file + line + column, and the CFI (Call Frame Information) needed to unwind a stack without frame pointers.
Why it's useful for profilers¶
A profile captures instruction addresses. To be read by a
human (or an LLM, or a query engine), those addresses must be
resolved to (function, file, line, type, inline-site) tuples
— a process called symbolization.
For production-profiler consumers:
| Payload | DWARF give you |
|---|---|
| Function name | yes |
| Source file + line | yes |
| Inlined-caller chain | yes |
| Type info for parameters / locals | yes |
| Frame-unwind info (if frame pointers off) | yes |
That richness is why DWARF is the default input to production symbolization.
The cost problem¶
"Most of the time getting the whole enchilada means using a binary's DWARF debug info. But this can be many megabytes (or even gigabytes) in size because DWARF debug data contains much more than the symbol information. This data needs to be downloaded then parsed. But attempting this while profiling, or even afterwards on the same host where the profile is gathered, is far too computationally expensive."
— Meta Engineering (2025-01-21 Strobelight post, Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)
The cost profile has three load-bearing consequences:
- On-host symbolization is not feasible at fleet scale. Parsing DWARF on the profiled host burns memory + CPU that belong to the workload being profiled — motivates concepts/delayed-symbolization.
- Symbolization must be a service, not a library call. Pre-download + pre-parse DWARF once per binary version, store the distilled facts in a backing database, serve symbolization over RPC — motivates patterns/delayed-symbolization-service.
- Compact distilled formats exist. gsym is the DWARF-derived format optimised for address-lookup; blazesym is the library that consumes DWARF + ELF + gsym behind a single API.
Seen in¶
- sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology — canonical production-scale DWARF-is-too-expensive-on-host disclosure; motivates Meta's centralised symbolization service and the DWARF → gsym + blazesym toolchain.
Related¶
- concepts/delayed-symbolization — the architectural response.
- concepts/frame-pointer-unwinding — the cheap alternative to CFI-based stack-unwind; Meta pays the 1-2% register-pressure tax so DWARF CFI isn't needed on the hot path.
- systems/gsym — DWARF-derived compact symbolization format.
- systems/blazesym — multi-format symbolization library.
- patterns/delayed-symbolization-service — the system shape.
- systems/strobelight — the canonical consumer at hyperscale.