CONCEPT Cited by 3 sources

Agent context window

The agent context window is the fixed-size LLM working set into which an agent session's system prompt, tool descriptions, per-turn messages, and all tool-call results are concatenated. For tool-server designers, it is the dominant scarce resource: the server owns how much of it each tool call consumes (Source: sources/2026-03-04-datadog-mcp-server-agent-tools).
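The budget accounting this definition implies can be made concrete. A minimal sketch — the 4-characters-per-token heuristic and all numbers are illustrative assumptions, not measurements from any real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English-like text.
    return max(1, len(text) // 4)

def remaining_budget(window_size: int, system_prompt: str,
                     tool_descriptions: list[str],
                     transcript: list[str]) -> int:
    """Tokens left for the next tool result after everything resident in context."""
    used = estimate_tokens(system_prompt)
    # Every exposed tool's description is resident on every turn.
    used += sum(estimate_tokens(d) for d in tool_descriptions)
    # Per-turn messages and all prior tool-call results accumulate.
    used += sum(estimate_tokens(m) for m in transcript)
    return window_size - used

left = remaining_budget(
    window_size=100_000,  # e.g. Dash's ~100k-token cap
    system_prompt="You are an observability agent.",
    tool_descriptions=["search_logs: ...", "query_metrics: ..."],
    transcript=["user: which services error most?"],
)
```

The point of the sketch: the server controls only the tool-description and tool-result terms, which is exactly where the patterns below intervene.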

Why this is load-bearing, not an ergonomic footnote

Failure modes observed when the budget is ignored (Datadog MCP V1, which was a thin API wrapper):

  1. Context fills with raw data. Agents retrieve logs or traces and lose track of the task they were solving.
  2. Variable-size records blow the budget unpredictably. Log messages range from 100 B to 1 MB at Datadog; a fixed record-count page is effectively unbounded in tokens. Paginate by token budget instead (patterns/token-budget-pagination).
  3. Trend inference from raw samples. Agents asked "which services log the most errors?" pull samples and guess; some brute-force until the window is full. Expose a query language so aggregation happens server-side (patterns/query-language-as-agent-tool).
  4. Tool descriptions themselves consume the budget. Every exposed tool's description is resident in context for every turn; this bounds tool inventory as tightly as tool-calling accuracy does (patterns/tool-surface-minimization).
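Failure mode 2 above can be sketched as a pagination loop that closes a page by token cost rather than record count (a minimal sketch; `estimate_tokens` and the record shape are assumptions, not Datadog's implementation):

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~4 characters per token.
    return max(1, len(text) // 4)

def paginate_by_token_budget(records: list[str],
                             budget: int) -> tuple[list[str], int]:
    """Return the largest prefix of `records` fitting in `budget` tokens,
    plus the index to resume from. A fixed record *count* is unbounded in
    tokens when individual records range from 100 B to 1 MB."""
    page: list[str] = []
    used = 0
    for i, rec in enumerate(records):
        cost = estimate_tokens(rec)
        # Stop before overflowing — but always emit at least one record,
        # even an oversized one, so the cursor makes progress.
        if used + cost > budget and page:
            return page, i  # i is the cursor for the next call
        page.append(rec)
        used += cost
    return page, len(records)

logs = ["short error", "x" * 4000, "another short line"]
page, cursor = paginate_by_token_budget(logs, budget=500)
```

The cursor-plus-budget contract means every page costs roughly the same number of tokens regardless of how skewed the record sizes are.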

Format-level levers (same semantics, fewer tokens)

Applied once, these compound:

  • CSV over JSON for tabular data: Datadog reports ~half the tokens per record for tabular shapes with no nesting. For nested data, YAML over JSON saves ~20%. Same data, different tokenizer surface.
  • Trim rarely-used fields by default and let the agent re-request them. The post quotes fitting "~5× more records in the same number of tokens" on some tools after combining format and trim.
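The format lever is easy to see with the same rows serialized both ways (a rough illustration of the character-level difference; actual token savings depend on the tokenizer):

```python
import csv
import io
import json

rows = [
    {"service": "checkout", "status": "error", "count": 42},
    {"service": "billing", "status": "warn", "count": 7},
]

as_json = json.dumps(rows)  # keys and punctuation repeated per record

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["service", "status", "count"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()  # keys appear once, in the header row

# Same data, smaller surface: CSV drops the per-record key repetition and
# JSON punctuation, which is where the ~2x saving for flat tables comes from.
assert len(as_csv) < len(as_json)
```

The saving grows with row count, since the JSON overhead is per record while the CSV header is paid once.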

Adjacent levers (not format)

Structural rather than tokenizer-level: server-side aggregation via a query language (patterns/query-language-as-agent-tool), tool-surface minimization (patterns/tool-surface-minimization), and actionable errors.

Decaying constraint?

Some clients are experimenting with writing long tool results to disk rather than inlining them into the window (Cursor's dynamic context discovery; Claude Code). If that lands in the MCP spec and becomes widespread, format-level token efficiency will matter less. Structural levers (query language, tool-surface minimization, actionable errors) do not decay — an agent still has to reason over the token-bearing artifacts.
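The disk-spill idea can be sketched as a handle-returning wrapper — hypothetical names throughout; this is neither Cursor's nor Claude Code's actual mechanism:

```python
import hashlib
import pathlib
import tempfile

RESULT_DIR = pathlib.Path(tempfile.gettempdir()) / "tool-results"

def store_result(raw: str, preview_chars: int = 200) -> dict:
    """Write a large tool result to disk; inline only a handle and a preview.
    The agent reads the file (or a slice of it) later, on demand."""
    RESULT_DIR.mkdir(parents=True, exist_ok=True)
    handle = hashlib.sha256(raw.encode()).hexdigest()[:12]
    path = RESULT_DIR / f"{handle}.txt"
    path.write_text(raw)
    return {
        "handle": handle,  # what the model references in later turns
        "path": str(path),
        "bytes": len(raw),
        "preview": raw[:preview_chars],  # the only token-bearing part in context
    }

ref = store_result("line\n" * 100_000)  # ~500 KB stays on disk, not in context
```

Only the preview and metadata enter the window; the reasoning the structural levers demand still happens over whatever the agent chooses to read back in.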

Seen in

  • sources/2026-03-04-datadog-mcp-server-agent-tools — named as the design-organizing resource for Datadog's MCP server redesign; five production patterns derived from treating it as a budget.
  • sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai — Dropbox Dash's context-engineering architecture is framed around the same budget concern, with three additional observed symptoms beyond the Datadog list: (1) tool-count growth → "analysis paralysis" (concepts/tool-selection-accuracy); (2) long-running jobs degrade (explicit concepts/context-rot observation); (3) per-tool description + parameter schema are a resident cost that scales linearly in integration count. Dash's answer is structural: patterns/unified-retrieval-tool collapses N per-app retrieval tools into one, and patterns/specialized-agent-decomposition hides a complex tool's own instructions behind a sub-agent so the parent's budget isn't starved.
  • sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash — Clemm's talk puts a numeric target on the budget: Dash caps its context at ~100,000 tokens; "those tool definitions do fill up quickly." Adds two new levers not previously named on this page: (a) Knowledge-graph bundles as token-efficiency device — "modeling data within knowledge graphs can significantly cut our token usage, because you're really just getting the most relevant information for the query" — i.e., pre-ranked graph-derived digests enter context instead of raw retrieval fan-out. (b) Store tool results locally, not in the context window — "tool results come back with a huge amount of context, so we actually choose to store that locally. We do not put that in the LLM context window." The agent references tool results by handle rather than by inlining. Also quantifies the cost of ignoring the budget: ~45s for a simple query via MCP vs "within seconds" against the raw index.