AI thinking heartbeat

AI thinking heartbeat is the operational discipline of emitting a periodic "the model is still working, here is how long it's been since any output" log line while a long-running LLM request is in flight. It exists to prevent users from misreading extended-thinking latency as a hung job.

Why it matters

Frontier models (Claude Opus 4.7, GPT-5.4) can silently reason for minutes on hard problems before emitting the first user-visible token. In a CI pipeline or agent harness this looks indistinguishable from a hung process. Users cancel, retry, file support tickets, or assume the system is broken.

Cloudflare's framing in its AI Code Review system:

"One of the operational headaches we didn't predict was that large, advanced models like Claude Opus 4.7 or GPT-5.4 can sometimes spend quite a while thinking through a problem, and to our users this can make it look exactly like a hung job. We found that users would frequently cancel jobs and complain that the reviewer wasn't working as intended, when in reality it was working away in the background. To counter this, we added an extremely simple heartbeat log that prints 'Model is thinking... (Ns since last output)' every 30 seconds which almost entirely eliminated the problem."

Implementation shape

Nothing sophisticated is needed. Every N seconds (30 s in Cloudflare's case), if no new output event has arrived from the child process since the last heartbeat, emit:

Model is thinking... (Ns since last output)

The counter resets whenever the JSONL stream emits any new event. The message is purely informational — no retry, no fallback, no intervention.
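A minimal sketch of this loop in Python. The `Heartbeat` class, its names, and the background-thread approach are illustrative; the source describes only the interval (30 s) and the message text. The orchestrator would call `notify_output()` on every event it reads from the child's JSONL stream:

```python
import threading
import time


class Heartbeat:
    """Periodically log 'Model is thinking...' while no output arrives.

    Purely informational: no retry, no fallback, no intervention.
    """

    def __init__(self, interval: float = 30.0, log=print):
        self.interval = interval
        self.log = log
        self._last_output = time.monotonic()
        self._stop = threading.Event()
        self._lock = threading.Lock()

    def notify_output(self) -> None:
        # Called whenever the JSONL stream emits any new event;
        # resets the elapsed-time counter.
        with self._lock:
            self._last_output = time.monotonic()

    def _run(self) -> None:
        # wait() returns True once stop() is called, ending the loop.
        while not self._stop.wait(self.interval):
            with self._lock:
                elapsed = time.monotonic() - self._last_output
            # Only log if nothing has arrived for a full interval.
            if elapsed >= self.interval:
                self.log(f"Model is thinking... ({int(elapsed)}s since last output)")

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

In use, the parent process would `start()` the heartbeat when it launches the model request, call `notify_output()` from its stream-reading loop, and `stop()` when the request completes or fails.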

Why it works

The heartbeat is a mental-model affordance, not a performance fix. It tells the user three things at once:

  1. The process is alive — the parent orchestrator is running and counting.
  2. The model is the bottleneck — the delay is the LLM call itself, not any other stage of the pipeline.
  3. Nothing is wrong yet — no error, just thinking.

This is the same principle as a loading spinner, but instrumented with actual information (elapsed time since last output) so users can self-calibrate against past runs.

Generalisation

Any long-tail LLM workload benefits from a heartbeat:

  • Interactive coding agents during extended reasoning.
  • Batch inference jobs consumed by humans.
  • Judge-mode LLM evaluation scoring long trajectories.
  • Multi-agent orchestration where one sub-agent goes deep.

The only cost is one log line per interval. The benefit is every cancelled-then-re-run job that doesn't happen.
