tracemalloc

Summary

tracemalloc is a CPython standard library module (since Python 3.4) that tracks memory allocations made by the Python runtime and attributes each allocation to a Python traceback. It is the canonical tool for finding the Python-level origin of a memory leak without attaching an external profiler or recompiling CPython.

Mechanism

  • Hooks into CPython's PyMem_* allocators when tracemalloc.start() is called.
  • For every allocation, records the allocating traceback at configurable depth (default 1 frame, tunable via tracemalloc.start(nframes) or the PYTHONTRACEMALLOC environment variable).
  • Keeps a per-traceback size + count map in memory; the runtime cost of tracing is paid only while tracemalloc.is_tracing().
  • Does not see allocations that bypass CPython's allocator (raw malloc inside C extensions that do not go through PyMem_*) — NumPy buffers, for example, are a partial blind spot unless the extension opts in.
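The tracing lifecycle above can be exercised directly; a minimal sketch using the documented tracemalloc toggles (the byte counts are illustrative):

```python
import tracemalloc

# Trace with deeper tracebacks than the default single frame
# (equivalent to launching with PYTHONTRACEMALLOC=10).
tracemalloc.start(10)
assert tracemalloc.is_tracing()
assert tracemalloc.get_traceback_limit() == 10

data = [bytes(1000) for _ in range(100)]  # ~100 KB of traced allocations

# (current, peak) bytes traced since start(); cost is paid only while tracing.
current, peak = tracemalloc.get_traced_memory()
assert peak >= current > 0

tracemalloc.stop()
assert not tracemalloc.is_tracing()
```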

Snapshot + diff model

The primary API shape is:

import tracemalloc
tracemalloc.start()

# … run some workload …
snap1 = tracemalloc.take_snapshot()

# … run some more workload …
snap2 = tracemalloc.take_snapshot()

top = snap2.compare_to(snap1, 'lineno')
for stat in top[:25]:
    print(stat)

tracemalloc.stop()

compare_to returns a ranked list of per-file / per-line allocation deltas — "the top 25 lines of Python code that grew between snap1 and snap2". In a leak hunt this is usually enough to point at the culprit.
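Each entry in that ranked list is a StatisticDiff carrying the byte and block-count deltas plus the allocating traceback; a short sketch of drilling into one (the simulated leak is illustrative):

```python
import tracemalloc

tracemalloc.start(25)  # deeper frames make the tracebacks more actionable
before = tracemalloc.take_snapshot()

leak = [dict(x=i) for i in range(10_000)]  # simulated growth between snapshots

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, 'lineno')[:5]:
    # size_diff / count_diff are the deltas; traceback.format() renders
    # the allocating Python frames as printable lines.
    print(f"{stat.size_diff / 1024:.1f} KiB in {stat.count_diff} blocks")
    for line in stat.traceback.format():
        print(line)

tracemalloc.stop()
```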

Why it's production-useful

  • Stdlib-only — no dependency to vendor into the image.
  • No process restart — can be toggled on and off in a running server to capture a window.
  • Python-level attribution — tracebacks are to user code, not to PyObject_Malloc addresses, so the output is directly actionable.
  • Signal-driven — fits naturally into a signal-triggered snapshot-diff pattern where a worker stays in no-tracing mode by default and only pays the cost during the capture window.

Not-a-replacement-for

  • Sampling profilers like eBPF / py-spy / Datadog Continuous Profiler capture CPU with near-zero overhead across the whole fleet; tracemalloc captures heap for a targeted window on a single process. Different axis, complementary role. See concepts/stack-trace-sampling-profiling for the CPU analogue.
  • Native-heap profilers (jemalloc's heap profiling, memray, gdb heap dumps) see C-level allocations tracemalloc cannot. tracemalloc is the right first tool for Python-level leaks and the wrong tool for NumPy / PyTorch / C-extension leaks.

Lyft's signal-driven wrapper

Lyft built an on-demand profiler on top of tracemalloc for a gunicorn service:

  • A custom MemoryProfiler class wraps a generator-based state machine.
  • SIGUSR2 (first) → tracemalloc.start() + snap1 = take_snapshot() + yield.
  • SIGUSR2 (second) → snap2 = take_snapshot(), snap2.compare_to(snap1), dump the top deltas to a file, tracemalloc.stop().
  • Operator workflow: kubectl exec into the pod, ps aux to find the worker PID, then kill -USR2 <pid> at interval T1 and again at interval T2; retrieve the dumped file for analysis.

The wrapper is simple enough to vendor into a service in <100 lines; the interesting engineering is not in the wrapper but in getting it to run at all under gunicorn's preload=True (concepts/signal-handler-fork-inheritance).
