PATTERN Cited by 1 source

Context compaction service¶

Intent¶

Before each LLM call, run a dedicated service that trims or summarises older tool outputs once the context approaches its token limit. This keeps long-running reasoning loops viable without losing earlier reasoning entirely.

Mechanism¶

Threshold check. Before each model call, the service measures the current context size against the token limit.
Selective trimming. When the context approaches the limit, the service trims or summarises older tool outputs and keeps recent results at full resolution (recency bias: the model needs recent context most).
Offload, don't discard. Pruned outputs are stored externally so the model can read them back on demand if it later needs the detail.
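
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any cited source: the names `Message`, `compact`, and `read_back` are hypothetical, the external store is a plain dictionary, `count_tokens` is a whitespace word count standing in for a real tokenizer, the 0.8 threshold and `keep_recent` default are arbitrary, and summarisation is replaced by a simple truncation stub.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    role: str                  # "user", "assistant", or "tool"
    content: str
    ref: Optional[str] = None  # key in the external store once offloaded


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude stand-in for a tokenizer.
    return len(text.split())


def context_size(messages: List[Message]) -> int:
    return sum(count_tokens(m.content) for m in messages)


def compact(
    messages: List[Message],
    store: Dict[str, str],
    token_limit: int,
    threshold: float = 0.8,  # hypothetical: compact at 80% of the limit
    keep_recent: int = 2,    # hypothetical: newest tool outputs left intact
    stub_words: int = 8,     # words kept in the truncation stub
) -> List[Message]:
    """Return a compacted copy of `messages`; the input list is not mutated."""
    budget = threshold * token_limit
    out = list(messages)

    # Threshold check: do nothing while the context is comfortably small.
    if context_size(out) < budget:
        return out

    tool_idx = [i for i, m in enumerate(out) if m.role == "tool"]
    protected = set(tool_idx[-keep_recent:]) if keep_recent > 0 else set()

    # Selective trimming: walk tool outputs oldest first, stop once under budget.
    for i in tool_idx:
        if context_size(out) < budget:
            break
        if i in protected or out[i].ref is not None:
            continue
        # Offload, don't discard: keep the full text in the external store.
        ref = f"out-{len(store)}"
        store[ref] = out[i].content
        stub = " ".join(out[i].content.split()[:stub_words])
        out[i] = Message("tool", f"[trimmed, ref={ref}] {stub} ...", ref)
    return out


def read_back(store: Dict[str, str], ref: str) -> str:
    # On-demand retrieval: the model would reach this through a tool call.
    return store[ref]
```

In this sketch the caller would run `compact(messages, store, token_limit)` immediately before each model call and expose `read_back` as a tool, so the `ref` embedded in each stub tells the model which key to request. Trimming oldest first and stopping as soon as the context fits is one possible policy; a real service might summarise with a model instead of truncating.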

Complementary to decomposition¶

Context compaction handles depth: a single long chain of reasoning that exceeds the window. For width (many parallel research strands), use child-instance decomposition (patterns/context-segregated-sub-agents), in which each child gets a clean context for its strand.

Trade-offs¶

Benefit	Cost
Keeps 150-iteration loops within token limits	Summarisation is lossy
Preserves reasoning continuity	Adds latency per LLM call (compaction step)
On-demand retrieval mitigates loss	Retrieval adds an extra tool call when needed

Seen in¶

sources/2026-06-18-atlassian-long-horizon-reasoning-engine — Atlassian Long Horizon runs a Context Compaction Service before each model call; pruned outputs offloaded for on-demand retrieval.