CONCEPT
Context window as token budget¶
Definition¶
The context window supplied to an LLM call is a fixed token budget. Every input the program keeps in that window — user messages, assistant replies, tool descriptions (their JSON schemas), tool outputs, system prompts, summaries — competes for the same limited token space. Past a threshold, "the whole system begins getting nondeterministically stupider." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
The Fly.io post's canonical phrasing: "You're allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you're having a conversation with a stateless black box)."
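That "array of strings" framing can be made concrete. The sketch below is illustrative, not from the source: the budget figure, message contents, and the crude 4-characters-per-token estimator are all assumptions (a real system would use the model's actual tokenizer).

```python
# Everything kept in the message array draws down one shared token budget.
# estimate_tokens uses a rough ~4-chars-per-token heuristic -- a stand-in,
# not a real tokenizer. All names and numbers here are illustrative.

def estimate_tokens(text: str) -> int:
    """Crude estimate; swap in the model's real tokenizer in practice."""
    return max(1, len(text) // 4)

CONTEXT_BUDGET = 8_000  # hypothetical window size in tokens

messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "tool_schema", "content": '{"name": "search", "parameters": "..."}'},
    {"role": "user", "content": "Find the failing test in the logs."},
    {"role": "tool_output", "content": "...imagine 50 KB of dumped log lines..."},
]

used = sum(estimate_tokens(m["content"]) for m in messages)
remaining = CONTEXT_BUDGET - used
print(f"used ~{used} tokens, ~{remaining} left in the budget")
```

The point of the exercise: system prompt, tool schemas, user turns, and tool outputs all land in the same list, so they all compete for the same `remaining`.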
Why the framing matters¶
Three practical consequences follow from treating the window as a budget rather than as a memory:
- Tool descriptions are a line item. Every tool you expose sits in context on every turn, costing tokens before any tool is actually called. A bristling inventory of 50 tool schemas can leave no room to get work done — Fly.io names this directly as the driver of the tool-selection-accuracy + tool-surface-minimization discipline. Datadog's MCP-server retrospective independently confirmed the same failure mode (Source: sources/2026-03-04-datadog-mcp-server-agent-tools).
- Tool outputs eat budget too. A large tool output (a dumped log file, a stack trace, a search-result JSON blob) pushed into context stays there for the rest of the session. This motivates summarisation-as-compression, output-to-file-not-to-context (patterns/untrusted-input-via-file-not-prompt at Datadog was partly motivated by the same budget concern), and per-agent context isolation via sub-agents (patterns/context-segregated-sub-agents).
- The degradation is nondeterministic. There is no hard cliff at exactly N tokens. Quality degrades as more tokens pile up, and the failure mode looks like the model getting "stupider" — missing instructions, ignoring earlier context, misstating facts it was told earlier. See concepts/context-rot for the related observation that accuracy decays well before the stated window limit.
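The second consequence above — output-to-file-not-to-context — can be sketched as a gate in front of the message array. This is a minimal illustration of the pattern, not Datadog's implementation; the threshold, role names, and helper are all assumptions.

```python
# Sketch of output-to-file-not-to-context: a large tool output is spilled to
# disk, and only a short pointer plus a preview enters the context array.
# The threshold and stub format are hypothetical choices for illustration.
import os
import tempfile

LARGE_OUTPUT_THRESHOLD = 2_000  # characters; illustrative cutoff

def admit_tool_output(messages: list, output: str) -> None:
    if len(output) <= LARGE_OUTPUT_THRESHOLD:
        messages.append({"role": "tool", "content": output})
        return
    # Spill the bulk to a file the agent can read selectively later;
    # only a stub (path + first few lines) stays in the budget.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(output)
    preview = "\n".join(output.splitlines()[:5])
    stub = f"[output was {len(output)} chars; saved to {path}; first lines:]\n{preview}"
    messages.append({"role": "tool", "content": stub})
```

The stub costs a few dozen tokens for the rest of the session instead of thousands, which is the whole trade.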
Programming implications¶
Treating context as a budget turns "context engineering" into a legible programming problem (concepts/context-engineering):
- Allocate explicitly. Decide up-front how many tokens go to system prompts, tool schemas, conversation history, and tool-output headroom. When the budget runs tight, compress the oldest slice (summarise a sub-conversation into a paragraph).
- Keep the array small. The "conversation" is a Python list of strings (concepts/agent-loop-stateless-llm); you can filter, compress, re-order, truncate, and splice in summaries — it's just data.
- Split contexts. Spawn a sub-agent with its own fresh context array for work that doesn't need to sit in the main agent's budget. Return a summary up, not the raw transcript.
- Trim tool descriptions to what this turn needs. Some agents swap tool allowlists between planning and execution phases so only the relevant subset is in-window at a time.
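The "allocate explicitly" and "compress the oldest slice" moves above can be sketched together. Everything here is an assumption for illustration — the per-slice budget numbers, the 4-chars-per-token estimator, and the `summarise` stub (a real implementation would call the LLM to compress).

```python
# Sketch of explicit budget allocation: fixed reservations per slice, and when
# the history slice overflows its allotment, the oldest turns get folded into
# a single summary entry. Numbers and helpers are hypothetical.

BUDGET = {"system": 500, "tool_schemas": 1_000, "history": 6_000}

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def summarise(turns: list) -> str:
    # Stand-in: a real implementation would ask the LLM for a compressed recap.
    return f"[summary of {len(turns)} earlier turns]"

def compact_history(history: list, budget: int) -> list:
    """Fold the oldest turns into summaries until history fits its budget."""
    while sum(estimate_tokens(t) for t in history) > budget and len(history) > 2:
        # The two oldest entries (possibly including a prior summary) collapse
        # into one summary line; recent turns are kept verbatim.
        history = [summarise(history[:2])] + history[2:]
    return history
```

Because the "conversation" is just a list, this kind of compaction is ordinary data manipulation — filter, splice, replace — which is the point of treating the window as a budgeted array rather than an opaque memory.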
Seen in¶
- Fly.io, You Should Write An Agent (2025-11-06) — canonical statement of the context-window-as-token-budget framing + the "nondeterministically stupider" degradation observation + sub-agents as the natural composition primitive falling out of treating context as an array. (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
- Dropbox Dash (2025-11-17) — tool schemas explicitly costed as a concepts/agent-context-window line item; accuracy degraded on longer jobs (concepts/context-rot).
- Datadog MCP server (2026-03-04) — tool-selection accuracy degrades with inventory growth; cited arXiv 2411.15399 + patterns/tool-surface-minimization.