tracemalloc

Summary

tracemalloc is a CPython standard library module (since Python 3.4) that tracks memory allocations made by the Python runtime and attributes each allocation to a Python traceback. It is the canonical tool for finding the Python-level origin of a memory leak without attaching an external profiler or recompiling CPython.

Mechanism

  • Hooks into CPython's PyMem_* allocators when tracemalloc.start() is called.
  • For every allocation, records the allocating traceback at configurable depth (default 1 frame, tunable via tracemalloc.start(nframes) or the PYTHONTRACEMALLOC environment variable).
  • Keeps a per-traceback size + count map in memory; the runtime cost of tracing is paid only while tracemalloc.is_tracing().
  • Does not see allocations that bypass CPython's allocator (raw malloc inside C extensions that do not go through PyMem_*) — NumPy buffers, for example, are a partial blind spot unless the extension opts in.
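The tracing lifecycle above can be exercised directly; a minimal sketch using the documented tracemalloc toggles (the byte counts are illustrative):

```python
import tracemalloc

# Trace with deeper tracebacks than the default single frame
# (equivalent to launching with PYTHONTRACEMALLOC=10).
tracemalloc.start(10)
assert tracemalloc.is_tracing()
assert tracemalloc.get_traceback_limit() == 10

data = [bytes(1000) for _ in range(100)]  # ~100 KB of traced allocations

# (current, peak) bytes traced since start(); cost is paid only while tracing.
current, peak = tracemalloc.get_traced_memory()
assert peak >= current > 0

tracemalloc.stop()
assert not tracemalloc.is_tracing()
```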

Snapshot + diff model

The primary API shape is:

import tracemalloc
tracemalloc.start()

# … run some workload …
snap1 = tracemalloc.take_snapshot()

# … run some more workload …
snap2 = tracemalloc.take_snapshot()

top = snap2.compare_to(snap1, 'lineno')
for stat in top[:25]:
    print(stat)

tracemalloc.stop()

compare_to returns a ranked list of per-file / per-line allocation deltas — "the top 25 lines of Python code that grew between snap1 and snap2". In a leak hunt this is usually enough to point at the culprit.
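Each entry in that ranked list is a StatisticDiff carrying the byte and block-count deltas plus the allocating traceback; a short sketch of drilling into one (the simulated leak is illustrative):

```python
import tracemalloc

tracemalloc.start(25)  # deeper frames make the tracebacks more actionable
before = tracemalloc.take_snapshot()

leak = [dict(x=i) for i in range(10_000)]  # simulated growth between snapshots

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, 'lineno')[:5]:
    # size_diff / count_diff are the deltas; traceback.format() renders
    # the allocating Python frames as printable lines.
    print(f"{stat.size_diff / 1024:.1f} KiB in {stat.count_diff} blocks")
    for line in stat.traceback.format():
        print(line)

tracemalloc.stop()
```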

Why it's production-useful

  • Stdlib-only — no dependency to vendor into the image.
  • No process restart — can be toggled on and off in a running server to capture a window.
  • Python-level attribution — tracebacks are to user code, not to PyObject_Malloc addresses, so the output is directly actionable.
  • Signal-driven — fits naturally into a signal-triggered snapshot-diff pattern where a worker stays in no-tracing mode by default and only pays the cost during the capture window.

Not-a-replacement-for

  • Sampling profilers like eBPF / py-spy / Datadog Continuous Profiler capture CPU with near-zero overhead across the whole fleet; tracemalloc captures heap for a targeted window on a single process. Different axis, complementary role. See concepts/stack-trace-sampling-profiling for the CPU analogue.
  • Native-heap profilers (jemalloc's heap profiling, memray, gdb heap dumps) see C-level allocations tracemalloc cannot. tracemalloc is the right first tool for Python-level leaks and the wrong tool for NumPy / PyTorch / C-extension leaks.

Lyft's signal-driven wrapper

Lyft built an on-demand profiler on top of tracemalloc for a gunicorn service:

  • A custom MemoryProfiler class wraps a generator-based state machine.
  • SIGUSR2 (first) → tracemalloc.start() + snap1 = take_snapshot() + yield.
  • SIGUSR2 (second) → snap2 = take_snapshot(), snap2.compare_to(snap1), dump the top deltas to a file, tracemalloc.stop().
  • Operator workflow: kubectl exec into the pod, ps aux to find the worker PID, then kill -USR2 <pid> at interval T1 and again at interval T2; retrieve the dumped file for analysis.

The wrapper is simple enough to vendor into a service in <100 lines; the interesting engineering is not in the wrapper but in getting it to run at all under gunicorn's preload=True (concepts/signal-handler-fork-inheritance).
