CONCEPT Cited by 7 sources
Context engineering¶
Definition¶
Context engineering is the discipline of allocating a fixed token budget across the components that compete for the LLM's context window — system prompts, tool descriptions, conversation history, tool outputs, summaries, retrieved context — so the model receives the right shape of information to do useful work without running out of budget.
Unlike "prompt engineering" (the earlier framing Fly.io dismisses as "magic spells"), context engineering is a legible programming problem with named levers:
- How many tokens are budgeted to tool descriptions vs. history vs. headroom for tool outputs?
- How are old conversation slices compressed or elided when they no longer fit?
- Which tools are visible to the model this turn, and which are deferred to a sub-agent with its own context?
- When a tool returns a large blob (a log, a file, a search result), does it go into the main context or into a side buffer the agent can reference by handle?
Fly.io's framing:
"Just like you, I rolled my eyes when 'Prompt Engineering' turned into 'Context Engineering'. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem. […] If Context Engineering was an [Advent of Code problem], it'd occur mid-December. It's programming." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
Named techniques (as of 2025-2026)¶
- Sub-agent decomposition. Spawn a child agent with its own context array + tool allowlist; return a summary up rather than the child's raw transcript (patterns/context-segregated-sub-agents). Fly.io notes this is "trivial to implement: just a new context array, another call to the model. Give each call different tools."
- Summarisation as compression. Run the older-half of the conversation through the LLM to summarise it, then replace those messages with the summary. "Feed them back through the LLM to summarise them as a form of on-the-fly compression, whatever you like."
- Tool-surface minimisation. Expose only the tools the current turn needs (patterns/tool-surface-minimization).
- Structured intermediate forms. Choose deliberately between JSON blobs, SQL queries, or markdown summaries as the interchange format between agents; Fly.io lists this as an open design problem: "what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries) for interchange between them."
Why it's not prompt engineering¶
Prompt engineering is about what you tell the model. Context engineering is about what's in the array when you call. The former is largely taste; the latter is allocation, compression, routing, and caching — all things software engineers have tools and intuitions for. Fly.io's dismissal of the pre-context-engineering era:
"I have never taken seriously the idea that I should tell my LLM 'you are diligent conscientious helper fully content to do nothing but pass butter if that should be what I ask and you would never harvest the iron in my blood for paperclips'. This is very new technology and I think people tell themselves stories about magic spells to explain some of the behavior agents conjure."
The ingested wiki has independent confirmations of the context-as-budget framing from Dropbox Dash (2025-11-17) and Datadog (2026-03-04); both teams converged on the same discipline before Fly.io named it.
Seen in¶
- Fly.io, You Should Write An Agent (2025-11-06) — canonical statement that context engineering is a programming problem, not a magic-spells problem. (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
- Dropbox Dash, context-engineering post (2025-11-17) — tool schemas + retrieval passages explicitly budgeted against the concepts/agent-context-window; accuracy degraded on longer jobs (concepts/context-rot).
- Datadog MCP server (2026-03-04) — tool-selection accuracy degrades with inventory growth (concepts/tool-selection-accuracy); patterns/tool-surface-minimization is the canonical mitigation.
- Instacart Intent Engine (2025-11-13) — canonical retrieval-relevance instance of context engineering at production scale. Context-engineering here is concrete: an offline RAG pipeline injects three Instacart-specific streams into the teacher LLM's prompt — (a) top converted brand for the query, (b) top converted categories, (c) product-catalog brand names with high semantic similarity ranked by embedding scores — plus a post-generation guardrail that validates tags against the product taxonomy. Instacart names context as "the defensible moat" for LLM applications: "A generic LLM is a commodity; your business context is what makes your application defensible, because domain knowledge is the most valuable asset." Three-lever hierarchy explicitly stated: "Fine-tuning > Context-Engineering (RAG) > Prompting" — each progressively more invasive than the last.
- Instacart LACE chatbot evaluation (2025-06-11) — canonical evaluator-prompt instance of context engineering. Where the Intent Engine applies context engineering to the chatbot's own prompt (retrieval-time context for the user-facing query), LACE applies it to the judge's prompt: Instacart-specific operational knowledge (e.g. "shoppers use company-authorized cards, not the customer's payment method") is embedded via a static template in the evaluator prompt so the judge can correctly score contextual-relevancy criteria that would otherwise miss business-model nuance. Instacart reports >90% accuracy on context-dependent criteria once the template is present. Future-work direction named explicitly: move from static template to dynamic prompt construction + real-time RAG-style retrieval — "ensures the chatbot retrieves relevant knowledge on demand, keeping prompts concise and focused while preserving strong evaluation performance" — which would mirror the Intent Engine's retrieval-engineered teacher prompt at the evaluator layer. Extends the parent concept along a new axis: context engineering for LLM-as-judge, not just for user-facing inference.
- Meta AI Pre-Compute Engine (2026-04-06) — canonical offline-preloading instance of context engineering for proprietary code. Meta operates at the opposite end of the axis from Instacart: not injecting retrieval passages at runtime but pre-computing a 59-file context layer once via a 50+-agent swarm, then loading relevant files opt-in per task. The load-bearing observation: the pretraining-overlap asymmetry (academic research found context files hurt agent success on Django / matplotlib because pretraining already covered them) inverts on proprietary codebases — context files covering tribal knowledge that "exists nowhere in any model's training data" delivers ~40% fewer tool calls per task. Three design decisions Meta names as the differentiator: compass-not-encyclopedia (~1,000 tokens, not encyclopedic), opt-in (loaded only when relevant, not always-on), quality-gated (multi-round critic review + automated self-upgrade). Canonical wiki instance of the offline-context-preloading discipline within context engineering — sibling to Instacart's retrieval-augmented discipline on the same parent concept.
- Meta Capacity Efficiency Platform (2026-04-16) — canonical runtime-composed skills + tools instance. Meta's third 2026 context-engineering bet (after Pre-Compute Engine's offline compass-shape files and before any future retrieval-at-runtime work): a two-layer platform with MCP tools (profiling / experiments / config history / code search / documentation) + skills (markdown-encoded reasoning patterns telling an LLM which tools to use and how to interpret results). Same tools across offense + defense; skills differ per use case. Extends the parent concept with the prescriptive-rather-than-descriptive form of encoded knowledge: compass-shape files describe what a module is, skills describe how to reason about a class of problem. Same model-agnostic markdown bet — "the knowledge layer is model-agnostic" — making the investment survive model upgrades. Canonical wiki instance of context engineering as a platform-leverage mechanism: "each new capability requires few to no new data integrations since they can just compose existing tools with new skills."
- Slack Spear context architecture (2026-04-13) — canonical multi-agent long-run instance. Where Fly.io canonicalised context engineering as a programming problem on a single agent, and Instacart/Meta canonicalised it on retrieval + offline preloading, Slack extends it to multi-agent loops spanning hundreds of inference requests. Slack's architectural move: abolish raw message-history carry-forward entirely (see concepts/no-message-history-carry-forward) and replace it with three curated artifacts (Director's Journal + Critic's Review + Critic's Timeline) produced round-by-round (see patterns/three-channel-context-architecture and concepts/online-context-summarisation). Canonical claim that this is not just a token-budget optimisation: "Even if context windows were infinitely large, passing message history between rounds would not necessarily be desirable: the accumulated context could impede the agents' capacity to respond appropriately to new information." New framing added to the wiki: over-sharing has a cognitive-load cost, not just a token cost — more context is not strictly better in multi-agent systems. Canonical instance of context engineering applied at long-run multi-agent altitude, complementing the single-agent and RAG instances.
- Expedia STAR (2026-04-28) — canonical deliberately-minimal instance. STAR's entire context-engineering story is a reduction: no RAG, no short-/long-term memory, no tool use, no MCP, no conversational history — "there is limited context engineering beyond domain-specific prompts". The whole context envelope is fixed prompts + fixed chain steps + a 4k-per-response token cap that turns the aggregate into a statically computable sum (concepts/token-heavy-system). Expedia names this as a design-discipline choice, not a lack of understanding: "the additional and currently less understood failure modes of an agent" is the load-bearing phrase. The wiki treats STAR as the counterpoint to the Fly.io / Instacart / Meta / Slack axis: when evaluation maturity doesn't yet support absorbing agent failure modes, the right move is to do less context engineering, not more — keep the chain fixed and the context sum finite. Roadmap explicitly lists each exclusion (MCP tool use, dependency-graph context, conversational UI) as future work, graduated onto STAR only when the eval stack catches up. First wiki instance of context engineering viewed from the "when to stay below the agent line" angle.
Related¶
- concepts/context-window-as-token-budget
- concepts/agent-loop-stateless-llm
- concepts/context-rot
- concepts/agent-context-window
- concepts/tool-selection-accuracy
- concepts/query-understanding — retrieval-side application of context engineering
- concepts/semantic-role-labeling — a concrete sub-task where context engineering injects conversion-history + catalog + brand-embedding signals into the LLM prompt
- concepts/tribal-knowledge — the content the offline-preloading discipline extracts
- concepts/compass-not-encyclopedia — the format discipline Meta layers on top of context-engineering
- concepts/context-file-freshness — the staleness discipline that makes offline-preloaded context sustainable
- concepts/config-as-code-pipeline — the workload class with the highest yield for offline preloading
- patterns/context-segregated-sub-agents
- patterns/tool-surface-minimization
- patterns/tool-call-loop-minimal-agent
- patterns/head-cache-plus-tail-finetuned-model / patterns/offline-teacher-online-student-distillation — context-engineering inside an offline teacher pipeline feeding both a production cache and a student's training set
- patterns/precomputed-agent-context-files — the offline-preloading architectural pattern
- patterns/multi-round-critic-quality-gate — the quality-gate for offline context artifacts
- patterns/five-questions-knowledge-extraction — the extraction methodology
- patterns/self-maintaining-context-layer — the freshness loop
- systems/instacart-intent-engine
- systems/lace-instacart — evaluator-prompt application of context engineering (static-template Instacart knowledge in judge prompt, RAG-at-evaluator named as future work)
- systems/expedia-star — counterpoint: deliberately-minimal context engineering, fixed chain, 4k-per-response cap, no RAG / tool use / memory / MCP
- concepts/llm-as-judge — the component class the evaluator-prompt axis of context engineering applies to
- concepts/token-heavy-system — the sizing discipline Expedia STAR canonicalises on the wiki
- concepts/prompt-chaining — the fixed-chain primitive STAR builds on
- patterns/static-prompt-chain-over-agent-loop — the generalised "stay below agent altitude" pattern
- systems/meta-ai-precompute-engine
- companies/instacart
- companies/expedia