
PATTERN

Signal-triggered heap snapshot-diff

Problem

You suspect a memory leak in a long-running server process in production. Restarting the process loses the leaking state. Running a full heap profiler continuously is too expensive. External tools (core dumps, gdb) are heavy and disruptive, and capture the heap at native-memory granularity — not at the Python / Ruby / language level where the leak actually lives.

Solution

Install a custom signal handler in the running server, then drive an on-demand profiler from a state machine across two signal deliveries:

  1. Signal #1 → start tracing; take baseline snapshot; yield (pause the state machine until the next signal).
  2. Signal #2 → take second snapshot; diff against baseline; dump the ranked allocation-growth list to a file; stop tracing.

Between signal #1 and signal #2 the process continues to serve normal production traffic, only paying the tracing overhead for the capture window. The diff output shows the growth between the two points in time — if the service is leaking, the leaking allocation site floats to the top of the list.

# After Lyft's MemoryProfiler (simplified; dump helper and path are illustrative)
import signal
import tracemalloc

def dump_top_diff(stats, path="/tmp/heap_diff.txt", top=25):
    # Persist the ranked allocation-growth list for later retrieval.
    with open(path, "w") as f:
        for stat in stats[:top]:
            f.write(f"{stat}\n")

class MemoryProfiler:
    def __init__(self):
        self._state_machine = self._profiling_state_machine()
        next(self._state_machine)  # prime: park the generator at the idle yield

    def register_handlers(self):
        # In pre-fork servers this must run post-fork (see Gotchas).
        signal.signal(signal.SIGUSR2, self.handle_signal)

    def handle_signal(self, signum, frame):
        # Trivial handler body: just step the state machine.
        next(self._state_machine)

    def _profiling_state_machine(self):
        while True:
            yield  # idle: no tracing, no overhead
            tracemalloc.start()
            snap1 = tracemalloc.take_snapshot()  # baseline
            try:
                yield  # capture window: serve traffic under tracing
                snap2 = tracemalloc.take_snapshot()
                dump_top_diff(snap2.compare_to(snap1, "lineno"))
            finally:
                if tracemalloc.is_tracing():
                    tracemalloc.stop()

The generator idiom is load-bearing: the signal handler body stays trivial (just next(...) on the iterator), while the "started tracing and took snap1" state stays alive across signal deliveries inside the generator's suspended frame.
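The idiom can be exercised without signals by stepping the generator directly. A minimal, self-contained sketch (the helper names and the `results` list are illustrative, not Lyft's actual code):

```python
import tracemalloc

def profiling_state_machine(results):
    """One capture per pair of next() calls: arm, then diff."""
    while True:
        yield                                   # idle: tracing is off
        tracemalloc.start()
        baseline = tracemalloc.take_snapshot()  # "signal #1" work
        yield                                   # capture window
        snap2 = tracemalloc.take_snapshot()     # "signal #2" work
        tracemalloc.stop()
        results.append(snap2.compare_to(baseline, "lineno"))

results = []
sm = profiling_state_machine(results)
next(sm)                                        # prime to the idle yield

next(sm)                                        # "signal #1": baseline taken
leak = [bytearray(1024) for _ in range(100)]    # simulated leak, ~100 KiB
next(sm)                                        # "signal #2": diff recorded

top = results[0][0]   # StatisticDiff entries come sorted, largest growth first
print(top)
```

Because `compare_to` sorts by absolute growth, the line allocating the simulated leak surfaces at or near the top of the diff, which is exactly the property the pattern relies on in production.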

Operational workflow

  1. kubectl exec (or ssh) into the target pod / host.
  2. ps aux | grep <svc> → identify the worker PID of interest.
  3. kill -USR2 <pid> → first signal, tracing begins.
  4. Let the process serve traffic for T seconds (however long it takes for the leak to manifest).
  5. kill -USR2 <pid> → second signal, diff is captured and dumped.
  6. Retrieve the dump file; rank by allocation-growth; investigate the top N lines of Python code.
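Steps 3 and 5 can be rehearsed in-process on a POSIX system: `os.kill` delivers the same SIGUSR2 that `kill -USR2` would, and CPython runs the Python-level handler at the next bytecode boundary. A sketch (the `received` list stands in for the real state machine):

```python
import os
import signal

received = []

def handle_signal(signum, frame):
    # Keep the body trivial; in the real pattern this is next(state_machine).
    received.append(signum)

signal.signal(signal.SIGUSR2, handle_signal)   # POSIX only: no SIGUSR2 on Windows

os.kill(os.getpid(), signal.SIGUSR2)           # step 3: first signal, tracing begins
assert received == [signal.SIGUSR2]            # handler has already run
# ... the real process serves traffic here for T seconds ...
os.kill(os.getpid(), signal.SIGUSR2)           # step 5: second signal, diff captured
assert received == [signal.SIGUSR2] * 2
```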

Why this shape

  • No process restart. The process is the subject of the investigation; restarting it destroys the state you want to profile.
  • Zero-cost when idle. Between capture windows the profiler is just a signal handler — a pointer in the kernel-managed signal-disposition table. No allocator hooks, no measurement overhead.
  • Targeted capture window. You pay the tracing cost only for the time you're willing to pay it; production SLOs survive.
  • Language-level attribution. The output is "line X of file Y grew by N bytes" — directly actionable, unlike gdb-level output.
  • Scales down trivially. A single engineer with kubectl exec can do this. No fleet-wide profiling infrastructure required.

Gotchas

  • Signal handlers must be registered post-fork in pre-fork servers like gunicorn (systems/gunicorn). Registering at import time under preload=True installs the handler in the master process, and (depending on how the supervisor resets signal dispositions in its workers) it may not be active in the worker that receives kill -USR2. If the handler is absent, the default disposition for SIGUSR1 / SIGUSR2 is to terminate the process, so the "profiling signal" kills the worker. Lyft hit this exactly. See concepts/signal-handler-fork-inheritance and concepts/pre-fork-copy-on-write.
  • Signal-safety rules for the handler body. POSIX restricts what may run inside a signal handler (async-signal-safe functions only). CPython papers over this by deferring: the C-level handler just sets a flag, and the Python-level handler runs at the next bytecode boundary, not inside the signal's own execution context. Even so, the handler body should stay trivial. The Lyft pattern keeps it to next(state_machine), which is effectively free.
  • Don't accidentally nest captures. If the state machine yields and a second "start" signal arrives before the "capture" signal, you leak state. The Lyft generator simply re-loops, which restarts tracing and discards snap1; alternative implementations might latch into a state variable and reject re-entry.
  • Allocator coverage gaps. Language-level tracers (tracemalloc in Python, ObjectSpace allocation-tracing in Ruby) see only allocations that go through the language runtime's allocator. C-extension heap (NumPy arrays, PyTorch tensors) is invisible. Pair with a native-heap tool if the leak might be below the language line.
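The "don't accidentally nest captures" gotcha above can be addressed by latching the state in an explicit variable instead of a generator. A sketch assuming separate "start" and "capture" signals (the SIGUSR1/SIGUSR2 split is an illustrative choice, not Lyft's design):

```python
import tracemalloc

class LatchingProfiler:
    """Rejects a redundant 'start' instead of silently restarting the capture."""

    def __init__(self):
        self._baseline = None           # None means idle

    def start_capture(self):            # e.g. bound to SIGUSR1
        if self._baseline is not None:
            return False                # already armed: reject re-entry
        tracemalloc.start()
        self._baseline = tracemalloc.take_snapshot()
        return True

    def finish_capture(self):           # e.g. bound to SIGUSR2
        if self._baseline is None:
            return None                 # capture without start: ignore
        snap2 = tracemalloc.take_snapshot()
        diff = snap2.compare_to(self._baseline, "lineno")
        tracemalloc.stop()
        self._baseline = None           # back to idle
        return diff

p = LatchingProfiler()
assert p.start_capture() is True
assert p.start_capture() is False       # second "start" rejected, baseline kept
leak = [bytearray(1024) for _ in range(50)]
assert p.finish_capture() is not None   # diff produced
assert p.finish_capture() is None       # idle again: stray signal is a no-op
```

The trade-off versus the generator: the latch makes the re-entry policy explicit and auditable, at the cost of spreading the capture state across methods instead of keeping it in one suspended frame.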

Contrast with

  • Stack-trace sampling profiling — same philosophy ("capture in production, pay only for the capture window"), different axis (CPU, not heap). Both patterns work well side by side.
  • Continuous allocator hooks (jemalloc's always-on profiling) — trades zero-cost idle for zero-cost capture; useful in different operational postures.
  • Core-dump analysis — captures everything at one instant but at higher operational cost and without Python-level attribution.
