DATADOG 2026-02-18

How we reduced the size of our Agent Go binaries by up to 77%

Datadog Engineering retrospective (2026-02-18) on how the Datadog Agent team cut Go-binary sizes by up to 77 % across a 6-month program (Dec 2024 → Jul 2025) spanning versions 7.60.0 → 7.68.0 — without removing a single feature. Linux amd64 compressed .deb package went from 265 MiB → 149 MiB (−44 %), uncompressed 1.22 GiB → 688 MiB (−44 %); individual binaries shrank by 56-77 %. Three ingredients: (1) systematic dependency auditing to find accidentally-pulled transitive deps and prune them at their import sites; (2) re-enabling the Go linker's method dead-code elimination by patching every use of reflect.MethodByName across the codebase + dependencies; (3) tracing a mysterious amd64-only regression to containerd's import of the stdlib plugin package, which silently puts the linker into dynamic-link mode and keeps every exported and unexported method reachable. Fixes are contributed upstream to kubernetes/kubernetes, uber-go/dig, google/go-cmp, containerd/containerd, and now benefit every large Go binary in the ecosystem (Kubernetes itself reports 16-37 % reductions).

Summary

The Agent is a family of binaries (Core Agent, Trace Agent, Process Agent, Security Agent, System Probe) built from a single codebase with Go build tags + dependency injection to select features per target (Linux / macOS / Windows × Docker / Kubernetes / Heroku / IoT / cloud distros). Over 5 years (2019 → 2024) the Linux amd64 .deb grew from 428 MiB → 1,248 MiB uncompressed (+192 %) and from 126 MiB → 265 MiB compressed (+110 %), reflecting years of new features + hundreds of dependencies (cloud SDKs, container runtimes, security scanners). The growth hit real constraints in serverless, IoT, and containerized workloads, raised network and memory costs, and hurt perception of the Agent.

Datadog set out to "bend the curve" — not remove features, just stop shipping what isn't used. The method was measurement-driven:

  1. Map what's included. GOOS=linux GOARCH=amd64 go list -f '{{ join .Deps "\n" }}' -tags t1,t2 ./pkg/main lists every package the compiler selects for a given OS × arch × build-tag combo. The output says what ends up in the binary, not why; for that, use goda, which takes the same tags + env and builds a dependency graph, supporting a reach(...) function that shows all paths from the main package to any target package.
  2. Measure what each package costs. go list says which packages are in the binary but not how much each one takes. go-size-analyzer reads the built binary and reports per-dependency byte cost, as text or in an interactive web UI. "Simply importing a package has side effects: init functions run and global variables are initialized, which can be enough to force the linker to keep many unnecessary symbols." That makes it the right tool for deciding "which dependencies are actually worth removing".
  3. Find pathological accidental inclusions. Concrete example: the trace-agent binary was supposed to be Kubernetes-free, but go list showed 526 packages from k8s.io/* included and go-size-analyzer attributed ≥30 MiB to them. goda reach traced the entire k8s import graph back to a single function in a single package of the Agent codebase that trace-agent imported for an unrelated reason; the function itself had no k8s dependency, but the package did. Fix: move the function into its own package, update callers. Result: 570 packages removed from the Linux trace-agent + ≥36 MiB size reduction, "more than half of the binary". Instance of patterns/single-function-forced-package-split.
  4. Re-enable method dead-code elimination. Go's linker can normally drop methods no code actually calls. But any use of reflect.MethodByName(name) with a non-constant name forces the linker to keep every exported method of every reachable type, because at build time it can't know which methods the reflection path will resolve (concepts/reflect-methodbyname-linker-pessimism). The canonical real-world offenders are the stdlib text/template and html/template packages: they execute template actions like {{.Error}} by reflecting on arbitrary values with dynamic method names. The linker's -dumpdep flag prints why each symbol is reachable; the whydeadcode tool parses that output and names the first culprit call-chain (only the first entry is guaranteed to be a true positive, so fix it and rerun). "We initially assumed patching every problematic use of reflect — both in our own codebase and external dependencies — would be too difficult, we gave it a try anyway." Around a dozen upstream patches later (kubernetes/kubernetes#132177, uber-go/dig#425, google/go-cmp#373), and a fork of text/template + html/template into pkg/template/ with method-calls statically disabled, Datadog enabled the optimization across every binary → 16-25 % size reduction per binary, ~100 MiB total. Kubernetes picked the same optimization up and reports 16-37 %.
  5. Trace amd64-only oddities. When Datadog first hacked through their code, commenting out every optimization-disabler to see what was possible, they saw 94 MiB savings on Linux arm64 but almost no change on amd64. whydeadcode said an unexported method of an ordinary type was reachable on amd64 but not on arm64, which shouldn't depend on architecture. Digging into the linker revealed the plugin build mode: importing the stdlib plugin package puts the linker into dynamically linked mode, which disables method dead-code elimination and keeps every unexported method reachable, because a dynamically loaded Go plugin could call anything. goda traced the plugin import to containerd/plugin/plugin_go18.go, a feature Datadog doesn't use. Datadog opened containerd/containerd#11203 to gate it behind a build tag and applied the tag in the Agent via #32538 + #32885. Result: 245 MiB reduction on main Linux amd64 artifacts, ~20 % of total size, benefiting ~75 % of users. Instance of concepts/go-plugin-dynamic-linking-implication + canonical datum for patterns/upstream-the-fix.
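
As a rough illustration of the mapping step, the sketch below counts how many entries in a go list dependency dump fall under a given import-path prefix, which is one way a figure like "526 packages from k8s.io/*" could be derived. The helper name countByPrefix and the sample package list are hypothetical and are not taken from the Agent codebase.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countByPrefix counts the import paths in a newline-separated dependency
// listing (the shape produced by go list -f '{{ join .Deps "\n" }}') that
// start with the given prefix. Hypothetical helper for illustration.
func countByPrefix(deps, prefix string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(deps))
	for sc.Scan() {
		if strings.HasPrefix(strings.TrimSpace(sc.Text()), prefix) {
			n++
		}
	}
	return n
}

func main() {
	// Made-up excerpt of a dependency listing; real output is far longer.
	deps := `fmt
k8s.io/api/core/v1
k8s.io/apimachinery/pkg/runtime
github.com/example/agent/pkg/trace
k8s.io/client-go/kubernetes`

	fmt.Println("k8s.io packages:", countByPrefix(deps, "k8s.io/"))
}
```

In practice the listing would be piped in from go list rather than embedded; a count like this only says that a dependency is present, so the "why" still needs goda reach.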

Overall result, Linux amd64:

  • Core Agent: 236 MiB → 103 MiB (−56 %)
  • Process Agent: 128 MiB → 34 MiB (−74 %)
  • Trace Agent: 90 MiB → 23 MiB (−74 %)
  • Security Agent: 152 MiB → 35 MiB (−77 %)
  • System Probe: 180 MiB → 54 MiB (−70 %)

Key takeaways

  1. Dependency audits need goda + go-size-analyzer, not just go list. go list tells you which packages are in the binary; goda tells you why (full import path + reach() to a target); go-size-analyzer tells you how much each one costs in bytes. The three together turn "we accidentally ship Kubernetes" into "there's one function in one package that pulls k8s, and moving it saves 36 MiB". (Source: sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent)
  2. A single function can pull half your binary. The Agent trace-agent's 570-package / 36-MiB k8s accidental inclusion collapsed to one function in one package. The fix was moving the function into its own package so the rest of the original package (and all its imports) no longer gets dragged in. "This reduction is an extreme example, but was not a unique one. We found many similar cases." Transitive-dependency reachability is often one edge wide. (concepts/transitive-dependency-reachability)
  3. reflect.MethodByName silently disables the linker's most powerful pruner. Method dead-code elimination — dropping methods no code calls — is only safe when the linker can statically prove every method lookup's target. Any use of reflect.MethodByName with a non-constant name (the normal way to use it) defeats that proof → every exported method of every reachable type stays. Real-world impact is dominated by text/template + html/template, both stdlib. Datadog's remedy — patch around a dozen deps + fork the stdlib templates into pkg/template/ with method calls disabled — yielded 16-25 % per binary / ~100 MiB total on Linux amd64. (concepts/reflect-methodbyname-linker-pessimism)
  4. Import the stdlib plugin package → 245 MiB cost. Simply importing plugin (even without using it) puts the Go linker in dynamically-linked mode, disabling method DCE and keeping all unexported methods. containerd/plugin/plugin_go18.go imported it for a feature Datadog didn't use; goda found it, an upstream PR gated it behind a build tag, Datadog applied the tag. 245 MiB / ~20 % / ~75 % of users benefited. Instance of concepts/go-plugin-dynamic-linking-implication. Tiny imports can have outsized costs — "simply importing a package has side effects".
  5. The fix belongs upstream. Every substantive dependency fix Datadog shipped (kubernetes/kubernetes, uber-go/dig, google/go-cmp, containerd/containerd) was a PR to the upstream, not a local fork or a vendor patch; the one exception is the stdlib text/template + html/template fork into pkg/template/. Kubernetes picked up the same method-DCE optimization once Datadog cleared the trail and reports 16-37 % reductions of its own. Canonical patterns/upstream-the-fix benefit: the ecosystem bill goes down, and Datadog doesn't carry maintenance forks of those dependencies.
  6. Hack-first measurement. The arm64 vs amd64 divergence was surfaced by a "hacked through our codebase and dependencies, commenting out every piece of code that disabled the optimization" run. The binaries didn't work in that state, but they linked — which was enough to measure the upper bound and localise the delta to a specific import. Instance of patterns/measurement-driven-micro-optimization for binary-size: break the binary on purpose to bound the possible win, then do the real work to reclaim it.
  7. Build the debugging kit first. The three tools that made this feasible — go-size-analyzer, goda, whydeadcode — are all OSS, pre-existing, and rely on public Go compiler / linker outputs (go list, -dumpdep). The work was applying them systematically, not building new infrastructure. Anyone with a large Go binary has the same kit available.
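
To make takeaway 3 concrete, here is a minimal sketch contrasting a reflective lookup by a run-time name with ordinary static dispatch. The Greeter type and both helper functions are invented for illustration; the actual patches Datadog sent upstream are not reproduced here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Greeter is a hypothetical type used only for this illustration.
type Greeter struct{}

func (Greeter) Hello() string   { return "hello" }
func (Greeter) Goodbye() string { return "goodbye" }

// callDynamic resolves the method from a name known only at run time.
// This is the pattern that, per the article, stops the linker from
// pruning unused exported methods.
func callDynamic(v any, name string) (string, bool) {
	rv := reflect.ValueOf(v)
	if !rv.IsValid() {
		return "", false
	}
	m := rv.MethodByName(name)
	if !m.IsValid() {
		return "", false
	}
	t := m.Type()
	if t.NumIn() != 0 || t.NumOut() != 1 || t.Out(0).Kind() != reflect.String {
		return "", false
	}
	return m.Call(nil)[0].String(), true
}

// callStatic offers the same behavior through ordinary call sites, so
// every method that can run is visible to the toolchain at build time.
func callStatic(g Greeter, name string) (string, bool) {
	switch name {
	case "Hello":
		return g.Hello(), true
	case "Goodbye":
		return g.Goodbye(), true
	}
	return "", false
}

func main() {
	fmt.Println(callDynamic(Greeter{}, "Hello"))
	fmt.Println(callStatic(Greeter{}, "Goodbye"))
}
```

The static version gives up open-ended lookup in exchange for call sites the linker can see, which is the trade the source-level patches make.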

Systems / tools surfaced

  • systems/datadog-agent — the umbrella product: Core, Trace, Process, Security Agents + System Probe. Built from a single codebase via build tags + DI; hundreds of dependencies.
  • systems/go-compiler — Go toolchain's compilation front-end; works at package granularity, selects files by build constraints, transitively adds every import it encounters.
  • systems/go-linker — joins compiled package artifacts, performs symbol reachability analysis and dead-code elimination. Controls method DCE + respects plugin build mode.
  • systems/goda — dependency-graph tool (github.com/loov/goda); takes GOOS/GOARCH/build-tags like go list, graphs imports, supports reach(all, target) for target-reachability queries.
  • systems/go-size-analyzer — binary-size inspector (github.com/Zxilly/go-size-analyzer); text + interactive web UI with per-dependency byte costs.
  • systems/whydeadcode — tool (github.com/aarzilli/whydeadcode) that consumes go build -ldflags=-dumpdep output and names the call-chain disabling method DCE. Iterate until clean.
  • systems/containerd — container runtime; imported plugin for user-loadable plugins (a feature Datadog didn't use). Root cause of a 245-MiB regression until upstream gated it behind a build tag.
  • systems/kubernetes — surfaced twice: (1) 526 k8s.io/* packages accidentally pulled into trace-agent; (2) later adopter of the method-DCE optimization, reporting 16-37 % of its own after the Datadog-authored kubernetes/kubernetes PR landed.
  • systems/text-template / systems/html-template — stdlib templating packages; canonical real-world users of reflect.MethodByName. Datadog forked both into pkg/template/ to statically disable the method-call code path.

Concepts introduced / extended

  • concepts/binary-size-bloat (new) — growth of a compiled artifact over time as features + dependencies accumulate, without a commensurate removal discipline. Canonical wiki instance: Datadog Agent .deb 428 MiB → 1,248 MiB over 2019-2024.
  • concepts/dead-code-elimination (new) — linker's ability to drop symbols no reachable code uses. Method DCE specifically: dropping methods no code calls via a typed call site. Disabled by non-constant reflect.MethodByName or plugin-mode linking.
  • concepts/go-build-tags (new) — Go's file-level compile guard (//go:build tag); the compiler skips tag-excluded files so their imports never pull into the binary.
  • concepts/transitive-dependency-reachability (new) — the graph-theoretic reason a package's cost depends on how it's imported: any import of a package transitively includes every package it imports (in files not excluded by build tags), plus runtime. A single function's transitive import closure can be huge.
  • concepts/reflect-methodbyname-linker-pessimism (new) — specific mechanism by which reflect.MethodByName(dynamic_name) forces the linker to keep every exported method of every reachable type. Canonical real-world offenders: text/template + html/template.
  • concepts/go-plugin-dynamic-linking-implication (new) — importing the stdlib plugin package at all puts the linker into dynamically-linked mode, disabling method DCE AND keeping every unexported method reachable. 245 MiB cost for the Agent.
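
The template case can be seen in a few lines. In this sketch, the Report type and the render helper are hypothetical; the point is only that the method name Summary appears in the template string and nowhere in a Go call site, so the template engine has to resolve it by name at run time.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Report is a hypothetical type for illustration.
type Report struct{ Name string }

// Summary is never called directly from Go code in this program;
// only the template text below refers to it.
func (r Report) Summary() string { return "report for " + r.Name }

// render parses and executes a template against data, returning the output.
func render(tmplText string, data any) (string, error) {
	t, err := template.New("demo").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render("{{.Summary}}", Report{Name: "agent"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the template text could come from anywhere, no build-time analysis of this program can tell which methods a template will name, which is why the stdlib template packages are the usual source of the pessimism described above.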

Patterns introduced / extended

  • patterns/build-tag-dependency-isolation (new) — mark a file with an unused build tag so its imports (+ their transitive deps) never end up in binaries without that tag. One of two canonical ways to prevent an unwanted dependency in Go (the other: package-split).
  • patterns/single-function-forced-package-split (new) — when a single function's imports drag a whole dep stack into binaries that don't need the rest of its package, move the function into its own package. Canonical wiki instance: the Agent trace-agent's one-function-pulls-k8s case (−570 packages, −36 MiB).
  • patterns/upstream-the-fix (extend) — Datadog's upstream PRs to four projects (kubernetes/kubernetes, uber-go/dig, google/go-cmp, containerd/containerd) as a canonical binary-size / ecosystem instance; Kubernetes inherits the method-DCE win for free.
  • patterns/measurement-driven-micro-optimization (extend) — binary-size variant: profile with go-size-analyzer → explain with goda reach → hack-first bounding (comment out DCE-disablers to measure upper bound) → real fix.
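
The package-split pattern can be modeled on a toy import graph. The sketch below is an assumption-laden miniature: the package names are invented, and closure is a hypothetical helper that computes the set of packages reachable from a root, loosely mirroring what the compiler pulls in. It is not how goda is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps a package to the packages it imports directly.
type graph map[string][]string

// closure returns every package reachable from root (root included), sorted.
func closure(g graph, root string) []string {
	seen := map[string]bool{}
	var visit func(p string)
	visit = func(p string) {
		if seen[p] {
			return
		}
		seen[p] = true
		for _, dep := range g[p] {
			visit(dep)
		}
	}
	visit(root)

	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Before: the binary imports pkg/util for one helper, but pkg/util
	// also imports a Kubernetes client for its other functions.
	before := graph{
		"cmd/trace-agent":  {"pkg/util"},
		"pkg/util":         {"k8s.io/client-go", "strings"},
		"k8s.io/client-go": {"k8s.io/api", "k8s.io/apimachinery"},
	}
	// After: the helper lives in its own package with no k8s import.
	after := graph{
		"cmd/trace-agent":  {"pkg/util/helper"},
		"pkg/util/helper":  {"strings"},
		"pkg/util":         {"k8s.io/client-go", "pkg/util/helper"},
		"k8s.io/client-go": {"k8s.io/api", "k8s.io/apimachinery"},
	}

	fmt.Println("before:", len(closure(before, "cmd/trace-agent")), "packages")
	fmt.Println("after: ", len(closure(after, "cmd/trace-agent")), "packages")
}
```

In this toy graph the split drops the binary's closure from six packages to three, because the only edge leading to the Kubernetes subtree no longer starts from anything the binary imports.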

Operational numbers

  • 5-year growth: 428 → 1,248 MiB uncompressed (+192 %), 126 → 265 MiB compressed (+110 %).
  • 6-month reduction program (v7.60.0 → v7.68.0, Dec 2024 → Jul 2025):
      • Core Agent 236 → 103 MiB (−56 %)
      • Process Agent 128 → 34 MiB (−74 %)
      • Trace Agent 90 → 23 MiB (−74 %)
      • Security Agent 152 → 35 MiB (−77 %)
      • System Probe 180 → 54 MiB (−70 %)
      • Compressed .deb 265 → 149 MiB (−44 %), uncompressed 1.22 GiB → 688 MiB (−44 %).
  • Single-function k8s fix: −570 packages, −36 MiB on trace-agent.
  • Method-DCE enable: 16-25 % per binary / ~100 MiB total; Kubernetes reports 16-37 % on its own.
  • containerd plugin unimport: −245 MiB (~20 %) on main Linux amd64 artifacts, benefiting ~75 % of users.
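
The per-binary percentages can be rechecked from the MiB figures listed above. This is a small sketch with a hypothetical helper, reductionPct; because the inputs are already rounded to whole MiB, a recomputed value can land a point away from the published one (Process Agent comes out at 73 rather than 74).

```go
package main

import (
	"fmt"
	"math"
)

// reductionPct returns the size reduction from before to after as a
// percentage rounded to the nearest whole number.
func reductionPct(before, after float64) int {
	return int(math.Round((before - after) / before * 100))
}

func main() {
	// Sizes in MiB, copied from the list above.
	sizes := []struct {
		name          string
		before, after float64
	}{
		{"Core Agent", 236, 103},
		{"Process Agent", 128, 34},
		{"Trace Agent", 90, 23},
		{"Security Agent", 152, 35},
		{"System Probe", 180, 54},
	}
	for _, s := range sizes {
		fmt.Printf("%-15s -%d%%\n", s.name, reductionPct(s.before, s.after))
	}
}
```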

Caveats

  • All numbers are Linux amd64 unless otherwise noted — the amd64-vs-arm64 divergence was the whole story behind the plugin-package investigation, so per-arch results vary.
  • Forking stdlib text/template / html/template is a maintenance burden — Datadog accepted it because open Go issue #72895 to statically disable method calls is still unresolved. Not every team should fork stdlib; most can wait for the upstream fix or scope the method-DCE enable to binaries that don't use templates.
  • Method-DCE is safe when the full reflect.MethodByName call graph is audited; accidentally dropping a method that's actually called at runtime → panic. Datadog's initial idea ("instrument binaries to emit a method-use list, edit compiler artifacts to force removal of others") would have had this exact failure mode; they discarded it in favour of source-level patches that the compiler/linker can reason about statically.
  • Not all dependency bloat is accidental. Some packages genuinely need their imports. The audit's job is to separate accidental inclusions (collapse to one function) from legitimate ones (don't touch); goda reach is what makes the distinction.
  • Applies to Go binaries only. C/C++ / Rust / JVM toolchains have their own dead-code elimination stories (LTO, Proguard, panic=abort, etc.).

Source
