PATTERN Cited by 1 source
Context compaction service
Intent
Before each LLM call, run a dedicated service that trims or summarises older tool outputs once the context approaches its token limit. This keeps long-running reasoning loops viable without losing earlier reasoning entirely.
Mechanism
- Threshold check. Before each model call, measure current context size against the token limit.
- Selective trimming. When the context approaches the limit, trim or summarise older tool outputs. Keep recent results at full resolution (recency bias: the model needs recent context most).
- Offload, don't discard. Pruned outputs are stored externally so the model can read them back on demand if it later needs the detail.
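The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual implementation of any service: the `Compactor` class, the 0.8 threshold, `keep_recent`, `PREVIEW_WORDS`, the word-count tokenizer, and the in-memory `store` dict are all hypothetical stand-ins for details the source does not specify. The sketch trims rather than summarises; a summarising variant would replace the preview with a model-generated summary.

```python
from dataclasses import dataclass, field
from typing import Optional

PREVIEW_WORDS = 8  # hypothetical: words of each trimmed output kept inline


@dataclass
class Message:
    role: str                  # "user", "assistant", or "tool"
    content: str
    ref: Optional[str] = None  # key in the external store once offloaded


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word. A real service
    # would use the model's own tokenizer.
    return len(text.split())


@dataclass
class Compactor:
    token_limit: int
    threshold: float = 0.8   # hypothetical: compact above 80% of the limit
    keep_recent: int = 2     # hypothetical: newest tool outputs left untouched
    store: dict = field(default_factory=dict)  # stand-in for external storage

    def size(self, messages):
        return sum(count_tokens(m.content) for m in messages)

    def compact(self, messages):
        budget = self.token_limit * self.threshold
        # 1. Threshold check: do nothing while the context is under budget.
        if self.size(messages) <= budget:
            return messages
        # 2. Selective trimming: only tool outputs older than the most
        #    recent `keep_recent` are candidates, oldest first.
        tool_idx = [i for i, m in enumerate(messages) if m.role == "tool"]
        older = tool_idx[:max(len(tool_idx) - self.keep_recent, 0)]
        out = list(messages)  # leave the caller's list untouched
        for i in older:
            if self.size(out) <= budget:
                break
            original = out[i]
            if original.ref is not None:
                continue  # already offloaded on an earlier pass
            ref = f"tool-output-{len(self.store)}"
            preview = " ".join(original.content.split()[:PREVIEW_WORDS])
            stub = f"[trimmed; full output stored as {ref}] {preview}"
            if count_tokens(stub) >= count_tokens(original.content):
                continue  # trimming would not save anything
            # 3. Offload, don't discard: keep the full text retrievable.
            self.store[ref] = original.content
            out[i] = Message("tool", stub, ref=ref)
        return out

    def read_back(self, ref):
        # On-demand retrieval of a pruned output by its reference.
        return self.store[ref]
```

In this sketch, `compact(messages)` would be called immediately before each model call, and `read_back` would be exposed to the model as a tool so it can recover a pruned output when it needs the detail. Trimming stops as soon as the context is back under budget, so no more history is compacted than necessary.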
Complementary to decomposition
Context compaction handles depth (one long chain of reasoning exceeding the window). For width (many parallel research strands), use child-instance decomposition (patterns/context-segregated-sub-agents) — each child gets a clean context for its strand.
Trade-offs
| Benefit | Cost |
|---|---|
| Keeps 150-iteration loops within token limits | Summarisation is lossy |
| Preserves reasoning continuity | Adds latency per LLM call (compaction step) |
| On-demand retrieval mitigates loss | Retrieval adds an extra tool call when needed |
Seen in
- sources/2026-06-18-atlassian-long-horizon-reasoning-engine — Atlassian Long Horizon runs a Context Compaction Service before each model call; pruned outputs offloaded for on-demand retrieval.