
PATTERN Cited by 1 source

Delayed symbolization service

Delayed symbolization service = the system shape where production hosts ship raw instruction addresses + unwound stacks to a centralised off-host service that resolves them to (function, file, line, type, inline-sites) tuples using pre-indexed debug info from every production binary.

It is the canonical architectural answer to three problems: DWARF is too large to parse on the profiled host, symbolizing on-host during capture perturbs the workload being profiled, and per-host caches churn faster than fleet-wide binary releases.

The shape (Strobelight canonical form)

  1. Profiler captures raw stacks on-host.
     • Capture uses frame-pointer unwinding + eBPF. Output is (build_id, pc, caller_pcs[]), a compact record.
     • Raw records write to disk. On-host processing ends here; the host never parses DWARF.
  2. Symbolization service is central and pre-indexed.
     • Downloads DWARF + ELF for every production binary, parses once per build_id, stores the distilled facts in a backing database.
     • Uses gsym (compact DWARF-derived format) + blazesym (multi-language symbolization library) + raw DWARF + ELF as needed.
  3. Profilers RPC the service at the end of a profile.
     • Service returns (function, file, line, type, inlines) tuples. Inline sites that cheap runtime unwinds lose are reattached here.
  4. Symbolized stacks land in the UI + warm store, typically within seconds of capture.
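
As a rough illustration of how small the on-host record can be, here is a minimal sketch of a binary encoding for one raw sample. The layout (20-byte build-ID, 64-bit addresses, little-endian, one build-ID per stack) and the names RawSample, encode and decode are assumptions made for this example, not Strobelight's actual wire format.

```python
import struct
from dataclasses import dataclass

BUILD_ID_LEN = 20            # assumption: 20-byte (SHA-1 style) build-ID
HEADER_FMT = "<20sQH"        # build_id, sampled pc, number of caller addresses
HEADER_SIZE = struct.calcsize(HEADER_FMT)


@dataclass(frozen=True)
class RawSample:
    build_id: bytes               # identifies the binary the addresses belong to
    pc: int                       # sampled instruction address
    caller_pcs: tuple[int, ...]   # return addresses, innermost caller first


def encode(sample: RawSample) -> bytes:
    """Pack one sample into a compact record; no symbol names, no DWARF."""
    if len(sample.build_id) != BUILD_ID_LEN:
        raise ValueError("unexpected build-ID length")
    header = struct.pack(HEADER_FMT, sample.build_id, sample.pc, len(sample.caller_pcs))
    body = struct.pack(f"<{len(sample.caller_pcs)}Q", *sample.caller_pcs)
    return header + body


def decode(buf: bytes, offset: int = 0) -> tuple[RawSample, int]:
    """Unpack one record starting at offset; return it and the next offset."""
    build_id, pc, count = struct.unpack_from(HEADER_FMT, buf, offset)
    offset += HEADER_SIZE
    callers = struct.unpack_from(f"<{count}Q", buf, offset)
    return RawSample(build_id, pc, callers), offset + 8 * count
```

Under these assumptions a sample with three caller frames occupies 30 + 3 × 8 = 54 bytes, and everything that needs debug info is deferred to the service.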

Why the service shape wins

  • Single audit surface for DWARF parsing. One service, one parser, one codebase. Bugs in DWARF handling get fixed once.
  • Pre-indexed DB serves many consumers. N fleet-wide profilers × K profile requests per minute all hit the same pre-indexed facts.
  • Producer-consumer decoupling. On-host raw capture can't stall on DWARF parsing; samples can't be dropped by a slow symbolizer.
  • Cache cost amortised across the fleet. DWARF for a given binary version is parsed once per binary, not once per host.
  • Richness without cost at capture. Inline-site info from DWARF gets reattached off-host; the on-host path stays lean.
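
The amortisation argument can be made concrete with a small calculation. The fleet sizes below are hypothetical numbers chosen only for illustration; they are not figures from the Strobelight post.

```python
def parses_if_per_host(hosts: int, versions_per_host: int) -> int:
    """Each host parses DWARF for every binary version it profiles."""
    return hosts * versions_per_host


def parses_if_central(distinct_versions: int) -> int:
    """The service parses each distinct binary version exactly once."""
    return distinct_versions


# Hypothetical fleet: 10,000 hosts, each profiling 5 binary versions,
# drawn from 200 distinct versions fleet-wide.
per_host = parses_if_per_host(hosts=10_000, versions_per_host=5)
central = parses_if_central(distinct_versions=200)
reduction = per_host // central
```

With these made-up inputs the per-host design performs 50,000 parses against 200 for the central service, a 250-fold reduction. The ratio grows with fleet size because the central cost depends only on the number of distinct binary versions.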

The Strobelight canonical form

"Strobelight gets around this problem via a symbolization service that utilizes several open source technologies including DWARF, ELF, gsym, and blazesym. At the end of a profile Strobelight sends stacks of binary addresses to a service that sends back symbolized stacks with file, line, type info, and even inline information. It can do this because it has already done all the heavy lifting of downloading and parsing the DWARF data for each of Meta's binaries (specifically, production binaries) and stores what it needs in a database. Then it can serve multiple symbolization requests coming from different instances of Strobelight running throughout the fleet."

— Meta Engineering, 2025-01-21 Strobelight post (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)

Key design choices

  1. Pre-index, don't lazy-parse. The service pays the DWARF cost once per binary version, not per request. This choice sets the service's latency floor.
  2. Use compact derived formats where possible. gsym is DWARF-derived but tens of times smaller for address-lookup queries. Strobelight composes gsym + blazesym + raw DWARF across consumers.
  3. Frame-pointer unwinding at capture. Frame pointers give the service cheap raw addresses to work from — CFI unwind at capture would be an order of magnitude more expensive and would partly defeat the point of the off-host service.
  4. Build-ID as the primary key. All lookups route through (build_id, pc); build-ID is produced by the linker and carried in the binary + every profile record.
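
A minimal sketch of what a pre-indexed (build_id, pc) lookup could look like, assuming the index holds non-overlapping, half-open address ranges per binary and that each range carries its inline chain (innermost frame first). SymbolIndex, Frame and the range layout are hypothetical; this is not the gsym format or the blazesym API, only the general address-table idea.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    function: str
    file: str
    line: int


class SymbolIndex:
    """In-memory stand-in for the service's backing database."""

    def __init__(self) -> None:
        # build_id -> (sorted range starts, matching range ends, frames per range)
        self._tables: dict[str, tuple[list[int], list[int], list[list[Frame]]]] = {}

    def index_binary(self, build_id: str, ranges: list[tuple[int, int, list[Frame]]]) -> None:
        """Done once per binary version, ahead of any request."""
        ordered = sorted(ranges, key=lambda r: r[0])
        self._tables[build_id] = (
            [r[0] for r in ordered],
            [r[1] for r in ordered],
            [r[2] for r in ordered],
        )

    def lookup(self, build_id: str, pc: int) -> Optional[list[Frame]]:
        """Return the inline chain covering pc, or None if nothing is indexed."""
        table = self._tables.get(build_id)
        if table is None:
            return None                            # binary was never indexed
        starts, ends, frames = table
        i = bisect.bisect_right(starts, pc) - 1    # last range starting at or before pc
        if i < 0 or pc >= ends[i]:
            return None                            # pc falls in a gap between ranges
        return frames[i]
```

Each request is then a dictionary access plus a binary search, with no DWARF parsing on the request path. A single pc can return several frames, which is how inlined call sites reappear in the symbolized stack.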

Non-goals

  • Source-level debugging. The service isn't a debugger; it answers one question — "what function / file / line is this PC?"
  • Symbolizing arbitrary binaries. Scope is production binaries whose DWARF the service has already indexed. Non-production binaries aren't symbolized.

Load-bearing precondition

This pattern depends on frame pointers being enabled on every fleet binary. Without frame pointers, on-host stack unwind becomes expensive (CFI) or unreliable (heuristic). Meta's disclosed trade is to pay a 1-2% register-pressure tax fleet-wide to make this pattern feasible.
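
To show why frame pointers make the on-host walk cheap, here is a minimal sketch of frame-pointer unwinding over simulated memory. It assumes the common x86-64 layout, where the saved caller frame pointer sits at [fp] and the return address at [fp + 8]. The memory dictionary and the unwind function are illustrative stand-ins, since a real profiler does this walk in the kernel against the target's stack.

```python
def unwind(memory: dict[int, int], pc: int, fp: int, max_depth: int = 64) -> list[int]:
    """Collect raw addresses by following the chain of saved frame pointers.

    memory maps an address to the 8-byte value stored there (simulated).
    """
    stack = [pc]
    while fp != 0 and len(stack) < max_depth:
        saved_fp = memory.get(fp)        # caller's frame pointer
        ret_addr = memory.get(fp + 8)    # address the current function returns to
        if saved_fp is None or ret_addr is None:
            break                        # unreadable memory: stop with what we have
        stack.append(ret_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break                        # stack grows down, so caller frames must be higher
        fp = saved_fp
    return stack


# Three nested frames; the outermost frame stores a null saved frame pointer.
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,
    0x7020: 0x7040, 0x7028: 0x401100,
    0x7040: 0x0,    0x7048: 0x401000,
}
addresses = unwind(memory, pc=0x401234, fp=0x7000)
```

Each frame costs two memory reads and needs no debug info, which is what keeps the capture path lean. A CFI-based unwinder would instead have to consult per-binary unwind tables for every frame.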

Seen in
