CONCEPT Cited by 1 source
strace debugging¶
strace debugging is the practice of attaching the strace utility to a running process to record its syscall interactions with the kernel. It reveals what the process actually did at the OS boundary, as opposed to what application-level logging and tracing think happened.
When Application Observability Fails¶
Application-level observability (structured logging, distributed tracing, metrics) operates within the application's model of the world. When a bug lives below that model — in a library's I/O handling, in kernel buffer behavior, or in timing between syscalls — application instrumentation reports "everything is fine" while data is silently lost.
strace bypasses the application entirely, recording the raw syscall sequence: sendto, recvfrom, shutdown, close, and their return values.
The Heisenbug Problem¶
strace intercepts each syscall, adding timing overhead. For timing-sensitive bugs, this overhead can:
- Mask the bug if the filter is too broad (slowing the process enough to shift the race window).
- Preserve the bug if the filter is narrow (only tracing the specific syscalls of interest).
Cloudflare found that broadening their strace filter made the hyper race condition disappear. Only a narrow filter (sendto, shutdown on the specific fd) kept overhead low enough to trigger the failure (Source: sources/2026-06-22-cloudflare-how-we-found-a-bug-in-the-hyper-http-library).
Key Signals from strace¶
In Cloudflare's investigation, strace revealed:
- Successful request: multiple sendto calls draining the buffer, followed by shutdown.
- Failing request: one sendto (219 KB), immediately followed by shutdown — no further writes.
This was the definitive evidence that the Images service was shutting down the connection prematurely, not that the client was disconnecting.
Seen In¶
- sources/2026-06-22-cloudflare-how-we-found-a-bug-in-the-hyper-http-library — strace was the breakthrough tool that revealed premature shutdown in hyper