CONCEPT

Context rot

Context rot is the empirically observed degradation of LLM agent accuracy as the context window fills up over a long-running investigation — especially when much of the added content is tool-call noise rather than task-relevant signal. The term was popularized by TryChroma's research and cited in Dropbox's context-engineering post as the forcing function that motivated Dash's architectural redesign (Source: sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai).

Concrete observation (Dash)

"We noticed that the overall accuracy of Dash degraded for longer-running jobs. The tool calls were adding a lot of extra context. We were seeing similar patterns of what's been popularized as context rot."

The failure mode isn't a single hard edge (like exceeding the window). It's a gradient: accuracy slips as accumulated context grows, well before the token limit is hit.
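The gradient can be probed with a needle-in-a-haystack style sweep: hold the question and the one relevant fact fixed, vary only the volume of distractor tool output, and score the model at each context size. A minimal harness sketch, where `build_context`, the fact string, and the fake tool outputs are all hypothetical (not from Dash):

```python
import random

def build_context(signal: str, n_distractors: int) -> str:
    """Assemble a synthetic context: one relevant fact buried among
    n_distractors fake tool-call outputs (illustrative harness only)."""
    distractors = [
        f'{{"tool": "search", "call_id": {i}, "result": "irrelevant row {i}"}}'
        for i in range(n_distractors)
    ]
    # Bury the signal at a random position among the distractors.
    pos = random.randrange(len(distractors) + 1)
    distractors.insert(pos, signal)
    return "\n".join(distractors)

# Sweep sizes well below any hard token limit; the claim is that accuracy
# falls *gradually* with size, so you want a curve, not a pass/fail point.
contexts = {
    n: build_context("FACT: invoice 4521 is overdue", n)
    for n in (10, 100, 1000)
}
```

Each assembled context plus the fixed question would then be sent to the model under test, and accuracy plotted against context size; a downward slope before the window is full is the signature of the failure mode.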

Why it happens (hypotheses)

  • Attention dilution. More tokens → each individual relevant token gets less attention weight.
  • Distractor accumulation. Each extraneous tool output is a decoy the model can latch onto.
  • Plan fragmentation. Long chains of tool calls mean the original intent is many turns back in the context; the model drifts.
  • Format noise. Verbose JSON tool output consumes token budget and confuses the model more than an equivalent CSV or YAML encoding.
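The format-noise point is easy to see by serializing the same records both ways: JSON repeats every key on every record, while CSV states the header once. A minimal standard-library sketch (the example rows are hypothetical):

```python
import csv
import io
import json

rows = [{"id": i, "status": "open", "owner": f"user{i}"} for i in range(3)]

# JSON: every record carries the full set of keys.
as_json = json.dumps(rows)

# CSV: the header row names the fields once, then only values follow.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "status", "owner"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

print(len(as_json), len(as_csv))  # CSV comes out smaller for these rows
```

The gap widens with the number of records, since the per-record key overhead in JSON is constant while CSV pays it only once.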

Mitigations observed in this wiki

The antidote is an architectural discipline (concepts/context-engineering) applied across several axes.

Decaying constraint?

Some clients are experimenting with writing long tool results to disk rather than inlining them into the window (Cursor's dynamic context discovery, Claude Code's recent changes cited in the Datadog post). If that becomes widespread, format-level token efficiency matters less, but the structural levers (fewer tools, better ranking, specialized agents) don't decay; an agent still has to reason over the token-bearing artifacts.
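The disk-handle pattern can be sketched in a few lines: persist the raw result, and hand the agent only a small reference object (path, size, short preview) for the context window. This is an illustrative sketch, not Cursor's or Claude Code's actual mechanism; `stash` and the store layout are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical on-disk store for oversized tool results.
STORE = Path(tempfile.mkdtemp(prefix="tool-results-"))

def stash(result: str, preview_chars: int = 200) -> dict:
    """Write a large tool result to disk; return a compact reference
    that is cheap to keep in the context window."""
    digest = hashlib.sha256(result.encode()).hexdigest()[:12]
    path = STORE / f"{digest}.txt"
    path.write_text(result)
    return {
        "handle": str(path),             # agent opens this only on demand
        "bytes": len(result),
        "preview": result[:preview_chars],
    }

# A large (synthetic) tool result: ~150 KB of rows.
ref = stash("row1,row2,row3\n" * 10_000)
```

Only `handle`, `bytes`, and `preview` enter the window; the bulk stays on disk until the agent decides the result is worth reading, which is what keeps the token budget from filling with tool-call noise.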

Seen in

  • sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai — named as the observed failure mode for Dash on long-running jobs; forcing function for context-engineering adoption.
  • sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash — Clemm's companion talk quantifies and reframes the same failure mode on the MCP side: "you're getting a lot of content back, and you're immediately going to fill up that context window. It's going to be very problematic. It's also incredibly slow. So, if you're using MCP with some agents today, even a simple query can take up to 45 seconds." The post also adds a new mitigation lever not previously in this wiki: store tool results locally, not in the context window (let the agent reference them by handle rather than inline them into the budget).
  • TryChroma research — source of the term as used in the industry.