tracemalloc¶
Summary¶
tracemalloc is a CPython standard library
module (since Python 3.4) that tracks memory allocations made by
the Python runtime and attributes each allocation to a Python
traceback. It is the canonical tool for finding the Python-level
origin of a memory leak without attaching an external profiler or
recompiling CPython.
Mechanism¶
- Hooks into CPython's PyMem_* allocators when tracemalloc.start() is called.
- For every allocation, records the allocating traceback at a configurable depth (default 1 frame, tunable upward).
- Keeps a per-traceback size + count map in memory; the runtime cost of tracing is paid only while tracemalloc.is_tracing() is true.
- Does not see allocations that bypass CPython's allocators (raw malloc inside C extensions that do not go through PyMem_*); NumPy buffers, for example, are a partial blind spot unless the extension opts in.
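The hooks above can be exercised directly. A minimal sketch of the start/trace/stop lifecycle, passing a non-default frame depth to tracemalloc.start() (the workload list here is just illustrative ballast):

```python
import tracemalloc

# Record up to 25 frames per allocating traceback instead of the
# default single frame (deeper tracebacks cost more memory and CPU).
tracemalloc.start(25)

workload = [dict.fromkeys(range(100)) for _ in range(100)]  # some allocations

snap = tracemalloc.take_snapshot()
trace = snap.traces[0]              # one recorded allocation
frames = trace.traceback.format()   # human-readable traceback lines

current, peak = tracemalloc.get_traced_memory()  # bytes traced now / at peak
tracemalloc.stop()
```

Once tracemalloc.stop() runs, the per-traceback map is discarded and the cost drops to zero; the snapshot object remains usable.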
Snapshot + diff model¶
The primary API shape is:
import tracemalloc
tracemalloc.start()
# … run some workload …
snap1 = tracemalloc.take_snapshot()
# … run some more workload …
snap2 = tracemalloc.take_snapshot()
top = snap2.compare_to(snap1, 'lineno')
for stat in top[:25]:
    print(stat)
tracemalloc.stop()
compare_to returns a ranked list of per-file / per-line allocation
deltas — "the top 25 lines of Python code that grew between snap1
and snap2". In a leak hunt this is usually enough to point at the
culprit.
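Besides the two-snapshot diff, a single snapshot can be ranked absolutely with Snapshot.statistics(), and Snapshot.filter_traces() can drop tracemalloc's own bookkeeping frames from the output. A minimal sketch (the workload list is illustrative):

```python
import tracemalloc

tracemalloc.start()
leaky = [bytes(1024) for _ in range(1000)]  # ~1 MB of tracked allocations
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclude tracemalloc's own frames and the import machinery so the
# ranking only shows workload code.
snap = snap.filter_traces([
    tracemalloc.Filter(False, tracemalloc.__file__),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
])
top = snap.statistics("lineno")  # absolute ranking, no baseline needed
for stat in top[:5]:
    print(stat)
```

statistics('lineno') answers "which lines hold the most memory right now"; compare_to answers "which lines grew between two points in time" — the latter is what a leak hunt usually needs.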
Why it's production-useful¶
- Stdlib-only — no dependency to vendor into the image.
- No process restart — can be toggled on and off in a running server to capture a window.
- Python-level attribution — tracebacks point at user code, not at PyObject_Malloc addresses, so the output is directly actionable.
- Signal-driven — fits naturally into a signal-triggered snapshot-diff pattern where a worker stays in no-tracing mode by default and only pays the cost during the capture window.
Not-a-replacement-for¶
- Sampling profilers like eBPF / py-spy / Datadog Continuous Profiler capture CPU with near-zero overhead across the whole fleet; tracemalloc captures heap for a targeted window on a single process. Different axis, complementary role. See concepts/stack-trace-sampling-profiling for the CPU analogue.
- Native-heap profilers (jemalloc's heap profiling, memray, gdb heap dumps) see C-level allocations tracemalloc cannot. tracemalloc is the right first tool for Python-level leaks and the wrong tool for NumPy / PyTorch / C-extension leaks.
Lyft's signal-driven wrapper¶
Lyft built an on-demand profiler on top of
tracemalloc for a gunicorn service:
- A custom MemoryProfiler class wraps a generator-based state machine.
- SIGUSR2 (first) → tracemalloc.start() + snap1 = take_snapshot() + yield.
- SIGUSR2 (second) → snap2 = take_snapshot(), snap2.compare_to(snap1), dump the top deltas to a file, tracemalloc.stop().
- Operator workflow: kubectl exec → ps aux → kill -USR2 <pid> at interval T1 and again at interval T2; retrieve the dumped file for analysis.
The wrapper is simple enough to vendor into a service in <100 lines;
the interesting engineering is not in the wrapper but in getting it
to run at all under gunicorn's preload=True
(concepts/signal-handler-fork-inheritance).
Related¶
- systems/gunicorn — the pre-fork server tracemalloc is most often embedded in for Python services.
- patterns/signal-triggered-heap-snapshot-diff — the idiomatic way to wire tracemalloc into a long-running HTTP worker.
- concepts/stack-trace-sampling-profiling — the complementary CPU-side technique; same "sample in production" ethos, different axis.
Seen in¶
- sources/2025-12-15-lyft-from-python38-to-python310-memory-leak — Lyft's wrapper is a canonical instance of signal-driven tracemalloc on a production gunicorn worker.