Skip to content

PATTERN Cited by 1 source

Context compaction service

Intent

Run a dedicated service before each LLM call that trims or summarises older tool outputs when the context approaches its token limit, keeping long-running reasoning loops viable without losing earlier reasoning entirely.

Mechanism

  1. Threshold check. Before each model call, measure current context size against the token limit.
  2. Selective trimming. When approaching the limit, older tool outputs are trimmed or summarised. Recent results are kept at full resolution (recency bias: the model needs recent context most).
  3. Offload, don't discard. Pruned outputs are stored externally so the model can read them back on demand if it later needs the detail.

Complementary to decomposition

Context compaction handles depth (one long chain of reasoning exceeding the window). For width (many parallel research strands), use child-instance decomposition (patterns/context-segregated-sub-agents) — each child gets a clean context for its strand.

Trade-offs

Benefit Cost
Keeps 150-iteration loops within token limits Summarisation is lossy
Preserves reasoning continuity Adds latency per LLM call (compaction step)
On-demand retrieval mitigates loss Retrieval adds an extra tool call when needed

Seen in

Last updated · 542 distilled / 1,571 read