AI thinking heartbeat

AI thinking heartbeat is the operational discipline of emitting a periodic "the model is still working, here is how long it's been since any output" log line while a long-running LLM request is in flight. It exists to prevent users from misreading extended-thinking latency as a hung job.

Why it matters

Frontier models (Claude Opus 4.7, GPT-5.4) can silently reason for minutes on hard problems before emitting the first user-visible token. In a CI pipeline or agent harness this looks indistinguishable from a hung process. Users cancel, retry, file support tickets, or assume the system is broken.

Cloudflare's framing in its AI Code Review system:

"One of the operational headaches we didn't predict was that large, advanced models like Claude Opus 4.7 or GPT-5.4 can sometimes spend quite a while thinking through a problem, and to our users this can make it look exactly like a hung job. We found that users would frequently cancel jobs and complain that the reviewer wasn't working as intended, when in reality it was working away in the background. To counter this, we added an extremely simple heartbeat log that prints 'Model is thinking... (Ns since last output)' every 30 seconds which almost entirely eliminated the problem."

Implementation shape

Nothing sophisticated is needed. Every N seconds (30 s in Cloudflare's case), if no new output event has arrived from the child process since the last heartbeat, emit:

Model is thinking... (Ns since last output)

The counter resets whenever the JSONL stream emits any new event. The message is purely informational — no retry, no fallback, no intervention.
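A minimal sketch of this loop in Python. The `Heartbeat` class, its names, and the background-thread approach are illustrative; the source describes only the interval (30 s) and the message text. The orchestrator would call `notify_output()` on every event it reads from the child's JSONL stream:

```python
import threading
import time


class Heartbeat:
    """Periodically log 'Model is thinking...' while no output arrives.

    Purely informational: no retry, no fallback, no intervention.
    """

    def __init__(self, interval: float = 30.0, log=print):
        self.interval = interval
        self.log = log
        self._last_output = time.monotonic()
        self._stop = threading.Event()
        self._lock = threading.Lock()

    def notify_output(self) -> None:
        # Called whenever the JSONL stream emits any new event;
        # resets the elapsed-time counter.
        with self._lock:
            self._last_output = time.monotonic()

    def _run(self) -> None:
        # wait() returns True once stop() is called, ending the loop.
        while not self._stop.wait(self.interval):
            with self._lock:
                elapsed = time.monotonic() - self._last_output
            # Only log if nothing has arrived for a full interval.
            if elapsed >= self.interval:
                self.log(f"Model is thinking... ({int(elapsed)}s since last output)")

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

In use, the parent process would `start()` the heartbeat when it launches the model request, call `notify_output()` from its stream-reading loop, and `stop()` when the request completes or fails.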

Why it works

The heartbeat is a mental-model affordance, not a performance fix. It tells the user three things at once:

  1. The process is alive — the parent orchestrator is running and counting.
  2. The model is the bottleneck — the delay is the LLM call itself, not any other stage of the pipeline.
  3. Nothing is wrong yet — no error, just thinking.

This is the same principle as a loading spinner, but instrumented with actual information (elapsed time since last output) so users can self-calibrate against past runs.

Generalisation

Any long-tail LLM workload benefits from a heartbeat:

  • Interactive coding agents during extended reasoning.
  • Batch inference jobs consumed by humans.
  • Judge-mode LLM evaluation scoring long trajectories.
  • Multi-agent orchestration where one sub-agent goes deep.

The only cost is one log line per interval. The benefit is every cancelled-then-re-run job that doesn't happen.
