CONCEPT
Agent hyperfixation failure mode¶
Definition¶
Agent hyperfixation is the failure mode where an LLM-based coding agent commits to the first hypothesis it generates and works to force that hypothesis to succeed, rather than stepping back and reconsidering the problem abstractly — even when the agent's own chat log shows it verbalising the need to reconsider.
Observable anecdotally across unsupervised agent sessions; distinct from hallucination in that hyperfixation is about reasoning-path commitment rather than factual fabrication.
Canonical framing¶
Verbatim from Anthony Shew's 2026-04-21 Turborepo performance post, reviewing the output of 8 unattended overnight coding agents on Rust-performance tasks:
"The agent would hyperfixate on the first idea that it came up with and force it to work, rather than backing up and thinking abstractly about the problem (even though the chat logs showed it trying to do so)."
The parenthetical is the load-bearing observation: the agent knows it should reconsider, verbalises that need in the log, and doesn't actually reconsider. The chat log and the action sequence diverge.
Related failure modes (same 2026-04-21 post)¶
Shew's review of the 8-agent phone-spawn experiment named five agent failure modes. Hyperfixation is #2; the five in order:
- No dogfood-loop awareness — Turborepo builds Turborepo; the agent could have built a binary and run it on the source code for end-to-end validation, but none did.
- Hyperfixation on first hypothesis — the subject of this concept.
- Microbenchmark-vs-end-to-end gap — the agent chases the biggest available microbenchmark number, crafting a 97% microbench win that amounts to 0.02% end-to-end.
- No regression tests written.
- No `--profile` flag usage — the agent proposed hashing optimisations without ever profiling the actual binary.
Together the five form a profile of unattended-agent pathology at current levels of model and harness dependability.
Mitigation: Plan Mode¶
The post's mitigation is separating proposal from execution. In Plan-Mode-then-implement:
- Agent runs in Plan Mode, producing a hypothesis about what to optimise from the Markdown profile.
- Human reviews the proposal and decides whether the hypothesis is worth pursuing, whether a different proposal would be better, and whether the agent is hyperfixating on the wrong hot path.
- Only approved proposals go to implementation.
Hyperfixation is filtered at the human-gate step — an engineer reading the proposal notices "you're proposing to micro-optimise X but the profile shows Y is the actual hot path."
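The gate above can be sketched as a tiny control loop. This is a hypothetical illustration, not code from the post; the `Proposal` shape and function names are invented:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    """A Plan-Mode output: a hypothesis to review, not an implementation."""
    hypothesis: str  # e.g. "micro-optimise task hashing"
    evidence: str    # e.g. the profile line the agent cites

def plan_then_implement(
    plan: Callable[[], Proposal],
    approve: Callable[[Proposal], bool],
    implement: Callable[[Proposal], None],
) -> bool:
    """Separate proposal from execution: only approved hypotheses
    reach the implementation phase."""
    proposal = plan()
    if not approve(proposal):
        # Hyperfixation is filtered here: the reviewer rejects a
        # plan that targets the wrong hot path.
        return False
    implement(proposal)
    return True
```

In a real harness, `approve` is the human reviewer reading the Markdown proposal; the point of the structure is only that `implement` is unreachable without that check.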
Distinct from confirmation bias¶
Confirmation bias is selecting evidence to support a pre-existing conclusion. Hyperfixation is different: the agent's conclusion is the first one it generated (often in response to recent tokens in the prompt), and it doesn't re-examine the conclusion even when presented with evidence against it. The mechanism is closer to anchoring on first-generated token sequences than to motivated reasoning.
Observable signal¶
- Chat log verbalises doubt ("let me step back and think about this differently") but the next tool call re-applies the same pattern.
- Repeated attempts to make one approach work across many tool calls, even after early failures.
- No exploration of alternative approaches despite the prompt or evidence suggesting alternatives exist.
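The first signal can be checked mechanically. A toy heuristic (invented transcript format and marker list, not from the post) that flags a verbalised doubt followed by a repeat of the same tool call:

```python
DOUBT_MARKERS = ("step back", "reconsider", "think about this differently")

def flags_hyperfixation(events: list[dict]) -> bool:
    """Flag a transcript where the agent verbalises doubt but the next
    tool call repeats the same (tool, args) pair as the previous one.

    `events` is a toy transcript: {"kind": "message"|"tool",
    "text": str, "call": (tool_name, args) | None}.
    """
    last_call = None
    doubting = False
    for ev in events:
        if ev["kind"] == "message":
            if any(m in ev["text"].lower() for m in DOUBT_MARKERS):
                doubting = True
        elif ev["kind"] == "tool":
            if doubting and ev["call"] == last_call:
                return True  # said "reconsider", then did the same thing
            last_call = ev["call"]
            doubting = False
    return False
```

A production detector would need fuzzier matching on both the doubt phrasing and the "same pattern" comparison; the sketch only shows that the signal is log-plus-action divergence, detectable without model internals.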
Limits of the observation¶
- Anecdotal. 8-agent sample size; one engineer's review; no control comparison with human-engineer first-hypothesis-commitment rates.
- Harness-dependent. Different agent harnesses (Claude Code vs Cursor vs Codex vs Aider) may have different propensities for hyperfixation depending on context-window strategy, tool-call policy, and retry budgets.
- Model-dependent. Different models have different propensities; the post doesn't name which model exhibited the behaviour.
- May be partly a prompt-ambiguity problem. Shew's own retrospective: "my prompts didn't specify which hot path, did they?" — agents without clearly specified targets have more room to hyperfixate on wrong ones.
Seen in¶
- sources/2026-04-21-vercel-making-turborepo-96-faster-with-agents-sandboxes-and-humans — definitional source; one of five named agent failure modes observed across the 8-agent unattended performance-engineering experiment. Motivates the supervised Plan-Mode-then-implement loop.
Related¶
- concepts/microbenchmark-vs-end-to-end-gap — sibling failure mode from the same 5-pathology enumeration.
- patterns/plan-mode-then-implement-agent-loop — canonical mitigation pattern.
- patterns/agent-spawn-parallel-exploration — alternative mitigation: fan out N agents with prompt variations so different first-hypotheses get tried in parallel.
- concepts/llm-hallucination — distinct failure mode; hyperfixation is about reasoning-path commitment, not factual fabrication.
- concepts/agent-identity-resolution-gap — sibling named-agent-failure-mode canonicalisation from Stripe's 2026-03-12 agentic-commerce post.