
CONCEPT

Context window as token budget

Definition

The context window supplied to an LLM call is a fixed token budget. Every input the program keeps in that window — user messages, assistant replies, tool descriptions (their JSON schemas), tool outputs, system prompts, summaries — competes for the same limited token space. Past a threshold, "the whole system begins getting nondeterministically stupider." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)

The Fly.io post's canonical phrasing: "You're allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you're having a conversation with a stateless black box)."
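The "array of strings you keep to pretend you're having a conversation" can be made concrete with a minimal sketch. Everything here is illustrative: `estimate_tokens` is a crude ~4-chars-per-token stand-in (real code would use the provider's tokenizer), and `CONTEXT_BUDGET` and the message/schema contents are made up.

```python
# Sketch: the "conversation" is just a list of dicts, and every entry --
# messages AND tool schemas -- draws down the same fixed token budget.

def estimate_tokens(text: str) -> int:
    # Rough stand-in: ~4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)

CONTEXT_BUDGET = 8_000  # hypothetical window size in tokens

messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Summarise the build failure."},
]
# Tool schemas are re-sent on every turn, so they cost tokens before
# any tool is ever called.
tool_schemas = ['{"name": "read_file", "parameters": {...}}']

used = sum(estimate_tokens(m["content"]) for m in messages)
used += sum(estimate_tokens(s) for s in tool_schemas)
remaining = CONTEXT_BUDGET - used
```

The point of the sketch is only that `remaining` shrinks with every message, tool schema, and tool output you keep, regardless of which one it was.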

Why the framing matters

Three practical consequences follow from treating the window as a budget rather than as a memory:

  • Tool descriptions are a line item. Every tool you expose sits in context on every turn, costing tokens before any tool is actually called. A bristling inventory of 50 tool schemas can leave no room to get work done — Fly.io names this directly as the driver of both the tool-selection-accuracy and tool-surface-minimization disciplines. Datadog's MCP-server retrospective independently confirmed the same failure mode (Source: sources/2026-03-04-datadog-mcp-server-agent-tools).
  • Tool outputs eat budget too. A large tool output (a dumped log file, a stack trace, a search-result JSON blob) pushed into context stays there for the rest of the session. This motivates summarisation-as-compression, output-to-file-not-to-context (patterns/untrusted-input-via-file-not-prompt at Datadog was partly motivated by the same budget concern), and per-agent context isolation via sub-agents (patterns/context-segregated-sub-agents).
  • The degradation is nondeterministic. There is no hard cliff at exactly N tokens. Quality degrades as more tokens pile up, and the failure mode looks like the model getting "stupider" — missing instructions, ignoring earlier context, garbling facts it was told. See concepts/context-rot for the related observation that accuracy decays well before the stated window limit.
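The second bullet — keep large tool outputs out of the window — can be sketched as a small helper. The helper name, the `max_inline` threshold, and the pointer format are all hypothetical; the idea is just that a big dump goes to disk and only a short pointer/preview enters the message list.

```python
# Sketch of "output to file, not to context" (hypothetical helper):
# a large tool result is written to a temp file and only a short pointer
# enters the message list, so it doesn't occupy budget for the whole session.
import os
import tempfile

def record_tool_output(messages: list, output: str, max_inline: int = 500) -> None:
    if len(output) <= max_inline:
        # Small outputs are cheap enough to keep inline.
        messages.append({"role": "tool", "content": output})
        return
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as f:
        f.write(output)
    # Only a pointer plus a short preview stays in the window.
    messages.append({
        "role": "tool",
        "content": f"[{len(output)} chars written to {path}; "
                   f"preview: {output[:200]!r}]",
    })

messages = []
record_tool_output(messages, "x" * 50_000)  # big dump -> pointer only
```

A sub-agent with its own context array (patterns/context-segregated-sub-agents) is the same move one level up: the raw transcript stays in the sub-agent's window and only a summary crosses back.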

Programming implications

Treating context as a budget turns "context engineering" into a legible programming problem (concepts/context-engineering):

  • Allocate explicitly. Decide up-front how many tokens go to system prompts, tool schemas, conversation history, and tool-output headroom. When the budget runs tight, compress the oldest slice (summarise a sub-conversation into a paragraph).
  • Keep the array small. The "conversation" is a Python list of strings (concepts/agent-loop-stateless-llm); you can filter, compress, re-order, truncate, and splice in summaries — it's just data.
  • Split contexts. Spawn a sub-agent with its own fresh context array for work that doesn't need to sit in the main agent's budget. Return a summary up, not the raw transcript.
  • Trim tool descriptions to what this turn needs. Some agents swap tool allowlists between planning and execution phases so only the relevant subset is in-window at a time.
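The first two bullets — allocate explicitly, and compress the oldest slice when the budget runs tight — can be sketched together. Both helpers are hypothetical: `summarise` stands in for an LLM summarisation call, and `estimate_tokens` is the same crude ~4-chars-per-token approximation as above.

```python
# Sketch: enforce an explicit token budget on the message list by
# compressing the oldest slice into a single summary message.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough stand-in for a real tokenizer

def summarise(messages: list) -> str:
    # Stand-in: a real implementation would call the model here.
    return f"[summary of {len(messages)} earlier messages]"

def enforce_budget(messages: list, budget: int, keep_recent: int = 3) -> list:
    """While over budget, fold everything but the most recent messages
    into one summary message. The list is just data: filter, splice, truncate."""
    while (sum(estimate_tokens(m["content"]) for m in messages) > budget
           and len(messages) > keep_recent + 1):
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        messages = [{"role": "system", "content": summarise(old)}] + recent
    return messages

history = [{"role": "user", "content": "word " * 500} for _ in range(10)]
history = enforce_budget(history, budget=2_000)
```

Tool-allowlist swapping is the same operation applied to the schema slice of the budget: between planning and execution phases, replace the list of tool descriptions rather than the list of messages.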
