CONCEPT Cited by 2 sources

Memory compaction

Definition

Memory compaction (also called context compaction) is the moment in an agent loop's lifecycle at which the context window is shortened — either because the conversation is about to exceed the model's limit, or because the agent is showing context rot as accumulated context grows.

Every long-running agent harness has a compaction strategy, implicit or explicit. The interesting design question is what happens to the discarded material.

Two strategies

Strategy — What happens at compaction

  • Discard (status quo) — the harness truncates the conversation and permanently loses everything pruned: tool outputs, side-channel facts, user preferences stated earlier.
  • Preserve-to-memory — the harness ships the about-to-be-pruned conversation to a memory service, which extracts, classifies, and stores facts / events / instructions / tasks for later retrieval.
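As a minimal sketch, the two strategies differ only in what happens to the pruned slice. Everything here is a toy: the `Message` type, the `memoryStore` array standing in for a durable memory service, and both function names are illustrative, not the Cloudflare API.

```typescript
// Toy message shape; a real harness carries richer metadata.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Stand-in for a durable memory service (illustrative, not a real API).
const memoryStore: string[] = [];

// Strategy 1: discard. Pruned messages are simply gone.
function compactDiscard(history: Message[], keep: number): Message[] {
  return history.slice(-keep);
}

// Strategy 2: preserve-to-memory. Ship pruned messages to memory first,
// then truncate. A real service would extract + classify, not store verbatim.
function compactPreserve(history: Message[], keep: number): Message[] {
  const pruned = history.slice(0, -keep);
  for (const m of pruned) memoryStore.push(m.content);
  return history.slice(-keep);
}

const history: Message[] = [
  { role: "user", content: "deploy target is staging-eu" },
  { role: "tool", content: "400 KB of log output" },
  { role: "user", content: "now run the migration" },
];

// Keep only the last message in the window; preserve the rest.
const kept = compactPreserve(history, 1);
```

Both functions return the same shortened window; the only difference is whether the pruned prefix is recoverable afterwards.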

Cloudflare Agent Memory is the canonical wiki instance of the second strategy:

"The critical moment in an agent's context lifecycle is compaction, when the harness decides to shorten context to stay within a model's limits or to avoid context rot. Today, most agents discard information permanently. Agent Memory preserves knowledge on compaction instead of losing it."

— (Cloudflare, 2026-04-17)

Why compaction is unavoidable

  • Hard model limits — context windows past 1M tokens exist, but are not free: every embedded-in-window token costs inference time + per-token money + attention share.
  • Soft accuracy limits — context rot means accuracy degrades well short of the token ceiling, so compaction pays off before the hard limit is reached.
  • Runaway tool-output growth — a single log-fetch or SQL query can materialise hundreds of KB of noise that dwarfs the original user intent.

Any agent running long enough will hit one of the three.
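The trigger logic behind these three pressures can be sketched as a simple budget check. The 4-characters-per-token estimate and both thresholds are illustrative assumptions, not real model limits:

```typescript
// Illustrative limits: a hard ceiling and a softer trigger to dodge context rot.
const HARD_LIMIT = 200_000;          // tokens (assumed, not a real model's limit)
const SOFT_LIMIT = HARD_LIMIT * 0.6; // compact well before the ceiling

// Crude heuristic: ~4 characters per token. Good enough for a trigger.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldCompact(messages: string[]): boolean {
  const total = messages.reduce((n, m) => n + estimateTokens(m), 0);
  return total > SOFT_LIMIT;
}

// A single runaway tool output can trip the threshold on its own:
const bigToolOutput = "x".repeat(600_000); // ~150k tokens of log noise
const needsCompaction = shouldCompact(["summarise the logs", bigToolOutput]);
```

Note that the soft limit, not the hard one, fires first: that is the point of the second bullet above.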

Bulk-ingest hook

The Cloudflare API shape exposes compaction as an explicit handoff point:

// harness at compaction time
await profile.ingest(
  messagesAboutToBePruned,
  { sessionId }
);

ingest is the harness-invoked path for bulk handoff at compaction time; tool-calls (remember / recall / forget) are the direct-model path for moment-to-moment decisions.
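A sketch of how the two paths might sit behind one interface, with an in-memory stand-in. The `ingest` signature is modelled on the snippet above; the interface, class, and the verbatim-store behaviour are assumptions, not the real Cloudflare service (which extracts and classifies, and whose calls are awaited):

```typescript
// Hypothetical interface: method shapes are assumptions, not the real API.
interface MemoryService {
  remember(fact: string): void;   // direct-model path, called as a tool
  recall(query: string): string[]; // direct-model path, called as a tool
  ingest(messages: string[], opts: { sessionId: string }): void; // harness bulk path
}

class InMemoryService implements MemoryService {
  private facts: string[] = [];
  remember(fact: string) {
    this.facts.push(fact);
  }
  recall(query: string) {
    return this.facts.filter((f) => f.includes(query));
  }
  ingest(messages: string[], _opts: { sessionId: string }) {
    // A real service would extract + classify; this toy stores verbatim.
    for (const m of messages) this.facts.push(m);
  }
}

const memory = new InMemoryService();
// Model decides mid-turn that a fact is worth keeping:
memory.remember("user prefers tabs over spaces");
// Harness hands off in bulk at compaction time:
memory.ingest(["deploy target is staging-eu"], { sessionId: "s1" });
// A future turn retrieves via explicit recall:
const hits = memory.recall("staging");
```

The design point is that both paths land in the same store, so facts preserved at compaction are retrievable by the same recall the model already uses.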

Sibling patterns for in-window compression

Compaction-to-memory is distinct from (and complementary to) in-window compression strategies:

  • Tree-structured conversation memory (Project Think Persistent Sessions) — non-destructive compaction: older branches stay in SQLite + FTS, compacted summary in window, agent can search_context to retrieve specifics on demand.
  • Summarisation-in-place — naive: replace N old messages with a summary in window. Lossy, permanent.
  • Write tool results to disk (Dropbox / Cursor / Claude Code shift; see sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash) — reference by handle from window, fetch-on-demand if the model needs the specifics.
  • Memory compaction — extract + classify + durably store in a retrieval substrate separate from the conversation log; retrieved on future turns via explicit recall.
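The third sibling pattern, write tool results to disk and reference by handle, can be sketched as follows. The `Map` standing in for the filesystem, the handle scheme, and both function names are illustrative assumptions:

```typescript
// Stand-in for on-disk storage of bulky tool outputs.
const blobStore = new Map<string, string>();

// Stash the full payload; only a short handle + preview enters the window.
function stashToolResult(output: string): string {
  const handle = `result-${blobStore.size + 1}`; // illustrative handle scheme
  blobStore.set(handle, output);
  return `[${handle}: ${output.length} chars, fetch on demand]`;
}

// Fetch-on-demand: the model asks for the payload only if it needs specifics.
function fetchToolResult(handle: string): string | undefined {
  return blobStore.get(handle);
}

// A 200 KB query result shrinks to a one-line reference in the window:
const inWindow = stashToolResult("SELECT * FROM logs\n" + "row\n".repeat(50_000));
const full = fetchToolResult("result-1");
```

This directly counters the runaway tool-output growth described earlier: the window holds the handle, not the hundreds of KB of noise.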
