
Go runtime memory model (virtual vs resident)

Definition

In Go (and in most managed-memory runtimes) the virtual memory the runtime has reserved from the OS and the resident set size (RSS) — the physical RAM actually committed to back it — are two different numbers that can diverge substantially. Go's runtime/metrics package reports the virtual view: heap-in-use, heap-released, GC-cycle counters, etc. The OS and Kubernetes's memory limits / OOM killer care about RSS. A regression that converts previously-uncommitted virtual pages into committed physical pages will show up in RSS but not in runtime/metrics at all.
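
The runtime-side counters are easy to sample. A minimal sketch, using metric names from the runtime/metrics documentation:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// heapMetric reads one of the runtime's own (virtual-side) memory counters.
func heapMetric(name string) uint64 {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	return s[0].Value.Uint64()
}

func main() {
	// Bytes of live objects the allocator is tracking.
	fmt.Println("heap objects:", heapMetric("/memory/classes/heap/objects:bytes"))
	// Bytes returned to the OS but still counted in the mapping.
	fmt.Println("heap released:", heapMetric("/memory/classes/heap/released:bytes"))
	// Neither number is RSS; the kernel's view lives in /proc/self/status and smaps.
}
```

Nothing these counters report changes when a previously-uncommitted page gets committed, which is the blind spot this note is about.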

The three-layer memory picture

┌──────────────────────────────────────────────────────┐
│  Go runtime counters (runtime/metrics, pprof heap)    │  ← what Go thinks it's using
│  "heap in use" = live objects managed by the allocator│
├──────────────────────────────────────────────────────┤
│  Virtual memory (mmap'd regions, /proc/[pid]/maps)    │  ← what Go has reserved
│  "Size" of each VMA; can exceed RSS                   │
├──────────────────────────────────────────────────────┤
│  Resident physical memory (/proc/[pid]/smaps "Rss")   │  ← what RAM is actually holding
│  Driven by first-touch page faults                    │
└──────────────────────────────────────────────────────┘

A page is in the virtual-but-not-resident zone when Go (or its allocator) has reserved an mmap region but the program has never stored to those pages. Linux serves them lazily via the page-fault path on first write, committing them to RAM at that moment.
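
The gap between the two lower layers is visible in any Go process. A Linux-only sketch that reads the kernel's process totals from /proc/self/status:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statusKiB returns a numeric field (in KiB) such as "VmSize" or "VmRSS"
// from /proc/self/status. Linux-only; returns -1 if unavailable.
func statusKiB(field string) int {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return -1
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), field+":") {
			n, err := strconv.Atoi(strings.Fields(sc.Text())[1])
			if err != nil {
				return -1
			}
			return n
		}
	}
	return -1
}

func main() {
	// Even a trivial Go program reserves far more address space than it commits:
	// VmSize (virtual) is typically much larger than VmRSS (resident).
	fmt.Printf("VmSize=%d kB VmRSS=%d kB\n", statusKiB("VmSize"), statusKiB("VmRSS"))
}
```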

Why the divergence matters operationally

Kubernetes pod memory limits, the Linux OOM killer, cgroup accounting, and most cloud monitoring use RSS (or close cousins like WSS / PSS). Go's own heap profiles and runtime/metrics use the virtual accounting. Consequences:

  • A service can be OOM-killed on Kubernetes while runtime/metrics shows plenty of heap headroom.
  • A process can show stable Go heap numbers while system dashboards show memory climbing — the program is not leaking, but pages are being committed that previously were not.
  • Rolling a runtime upgrade that changes when pages get committed (without changing what the program holds live) looks invisible from inside the program.

This is the operational instantiation of the concepts/monitoring-paradox mental shape: the layer you trust to tell you about memory is not the layer that gets you OOM-killed.

Diagnosing via /proc/[pid]/smaps

smaps breaks each VMA into Size (virtual allocation) and Rss (committed). For Go services, the first large r/w anonymous mapping near the executable base is typically the Go heap. Comparing Size and Rss for that region isolates "did a specific region start committing more pages?" from "did the program hold more live data?"
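
The same Size-vs-Rss comparison can be scripted. A Linux-only sketch that sums both fields across all VMAs (the per-region version is the same loop keyed on the VMA header lines):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// smapsTotals sums the Size and Rss fields over every VMA in
// /proc/self/smaps. Size-Rss is the process's uncommitted reservation.
func smapsTotals() (sizeKiB, rssKiB int) {
	f, err := os.Open("/proc/self/smaps")
	if err != nil {
		return -1, -1
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		n, err := strconv.Atoi(fields[1])
		if err != nil {
			continue // VMA header lines and VmFlags are not numeric
		}
		switch fields[0] {
		case "Size:":
			sizeKiB += n
		case "Rss:":
			rssKiB += n
		}
	}
	return sizeKiB, rssKiB
}

func main() {
	size, rss := smapsTotals()
	fmt.Printf("total Size=%d kB, total Rss=%d kB, uncommitted≈%d kB\n", size, rss, size-rss)
}
```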

Worked example (Datadog, Go 1.24 rollout):

  • Go 1.23: Go-heap VMA Size: 1.33 GiB, Rss: 1.04 GiB (~300 MiB uncommitted)
  • Go 1.24: Go-heap VMA Size: 1.28 GiB, Rss: 1.26 GiB (near-full commit)
  • Other VMAs (stacks, mmap'd files, cgo): unchanged

Upstream change: Lénaïc Huard's CL 646095 labels Go-allocated regions in maps/smaps, so "which VMA is the Go heap?" becomes a lookup instead of a heuristic.

The zeroing / commit relationship

The mechanism behind "Go runtime says stable, RSS says up" is almost always unnecessary page touches forcing commits:

  • Go requests a large span from the OS via mmap. The kernel reserves virtual address space; no physical pages yet.
  • Go hands out an allocation carved from that span.
  • If Go writes zeros over those pages (e.g., memclr on alloc), each write first-touches a page, triggering a fault → physical page assigned → RSS up.
  • If Go skips zeroing because the OS-fresh pages are already zero (the pre-1.24 optimization for large pointer-bearing allocations), those pages stay virtual until the program itself writes to them.
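
Either branch produces the same program-visible result, which is why the change is invisible from inside the process: make is guaranteed to return zeroed memory whether the runtime wrote the zeros (memclr) or relied on OS-fresh pages already being zero. A quick check:

```go
package main

import "fmt"

// allZero verifies the guarantee that holds on every allocation path:
// memory from make is zeroed, regardless of whether the runtime memclr'd
// it or skipped zeroing because the pages came fresh from the OS.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	big := make([]byte, 1<<20) // large allocation, served from heap spans
	fmt.Println("zeroed:", allZero(big))
	// Only RSS can tell the two zeroing strategies apart.
}
```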

The Go 1.24 regression was precisely the loss of that skip. The cause was a runtime refactor, not a change in what the program allocated. The allocation counters (virtual) stay flat; the RSS counter rises as "never-touched" pages become "touched-to-zero" pages.
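
The first-touch commit mechanism can be observed directly. A Linux-only sketch, assuming 4 KiB pages, that allocates a large pointer-free buffer and measures how much RSS grows purely from touching its pages:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// vmRSSKiB reads the resident set size from /proc/self/status (Linux-only).
func vmRSSKiB() int {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return -1
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "VmRSS:") {
			n, _ := strconv.Atoi(strings.Fields(sc.Text())[1])
			return n
		}
	}
	return -1
}

// firstTouchGrowthKiB allocates n bytes, then writes one byte per 4 KiB page,
// returning how much RSS grew from the first-touch faults alone.
func firstTouchGrowthKiB(n int) int {
	buf := make([]byte, n) // large pointer-free allocation: fresh pages
	before := vmRSSKiB()
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1 // first touch: the kernel commits a physical page here
	}
	after := vmRSSKiB()
	runtime.KeepAlive(buf)
	return after - before
}

func main() {
	// Touching 256 MiB of never-written pages raises RSS by roughly 256 MiB,
	// even though the virtual reservation existed before the loop ran.
	fmt.Printf("RSS grew by ~%d MiB\n", firstTouchGrowthKiB(1<<28)/1024)
}
```

This is the difference the Go 1.24 regression exposed: the touching moved from "when the program writes" to "when the runtime zeroes at allocation."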

This is why the same allocation pattern under the same workload can produce materially different RSS before and after a runtime upgrade.

Other sources of virtual/resident divergence in Go

  • Returned-but-not-unmapped memory: Go's allocator uses madvise (MADV_FREE or MADV_DONTNEED) to tell the kernel that pages can be reclaimed while keeping the VMA mapped. With MADV_DONTNEED the RSS drop is immediate; with MADV_FREE it is lazy, happening only when the kernel reclaims under memory pressure.
  • GC pacer choices: Go's GC sometimes holds onto heap space rather than returning it, so a GC cycle that frees objects reduces "heap in use" but does not necessarily reduce RSS.
  • Goroutine stacks: stack memory is mmap'd per-goroutine; touched pages are resident until goroutines exit and stacks are reaped.
  • Cgo / C allocator: malloc() from cgo bypasses Go's allocator and runtime/metrics; its footprint shows in RSS only.
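
The returned-but-not-unmapped case is also observable from inside the process: the "/memory/classes/heap/released:bytes" metric counts pages madvised back to the OS while the VMA (and thus virtual size) stays put. A sketch that forces the release with debug.FreeOSMemory rather than waiting for the background scavenger:

```go
package main

import (
	"fmt"
	"runtime/debug"
	"runtime/metrics"
)

// releasedBytes reads how much heap Go has handed back to the OS via madvise.
func releasedBytes() uint64 {
	s := []metrics.Sample{{Name: "/memory/classes/heap/released:bytes"}}
	metrics.Read(s)
	return s[0].Value.Uint64()
}

// releasedGrowth commits 64 MiB, drops it, and forces a GC plus scavenge,
// returning how much "heap released" grew as a result.
func releasedGrowth() uint64 {
	before := releasedBytes()
	waste := make([]byte, 64<<20)
	for i := range waste {
		waste[i] = 1 // commit the pages so there is something to give back
	}
	waste = nil
	_ = waste
	debug.FreeOSMemory() // GC + return as much memory to the OS as possible
	after := releasedBytes()
	if after < before {
		return 0 // released can shrink if pages were re-used meanwhile
	}
	return after - before
}

func main() {
	fmt.Printf("released grew by %d bytes\n", releasedGrowth())
}
```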

Implications for production services

  • Don't trust runtime/metrics alone for capacity planning. Set Kubernetes memory limits with RSS headroom, not "heap in use" headroom.
  • Keep smaps in the debugging toolkit, not just pprof. VMA-level deltas localize regressions that runtime/metrics cannot see.
  • Treat toolchain upgrades as workload-shape-dependent changes, not "upgrade and forget." The same Go version can improve one env and regress another depending on allocation shape — see Datadog's case where the highest-traffic env gained ~20% virtual / ~12% RSS (Swiss Tables win) while lower-traffic envs lost ~20% RSS (mallocgc regression).
  • Bisect upgrades on staging before production. A runtime/metrics-invisible regression can only be caught by system-level observability (RSS, OOM rate) at a scale where the problem actually manifests.

Seen in

  • sources/2025-07-17-datadog-go-124-memory-regression — Go 1.24 mallocgc refactor silently removed a "skip zeroing for OS-fresh large pointer-bearing allocations" optimization; regression invisible to runtime/metrics, visible in /proc/[pid]/smaps as the Go-heap VMA's Rss converging toward its Size; ~20% RSS increase across a Datadog data-processing fleet; fix (CL 659956) ships in Go 1.25.