CONCEPT Cited by 4 sources

Agent memory

Definition

Agent memory is an AI agent's accumulated, searchable context across turns and sessions — the things this agent (or this agent for this user) has seen, decided, or concluded — stored in a form the agent can retrieve on demand, rather than held verbatim in the context window.

The canonical shape in production today: memory is an index. Past decisions, resolutions, notes, user preferences, and tool outputs are written as searchable items into a dedicated retrieval primitive, then pulled back into context on future turns when relevant.

Two complementary sub-shapes

Modern agent stacks split memory into two substrates:

  1. Session-scoped / episodic — the in-progress conversation, per-turn messages, tool-call chains. Structurally a tree or log with fast full-text search; not expected to persist cross-session indefinitely. Canonical wiki instance: Project Think's Persistent Sessions (patterns/tree-structured-conversation-memory) — SQLite parent_id tree + FTS5 full-text index, forking, non-destructive compaction, search_context tool exposed to the model.

  2. Tenant-scoped / semantic — durable accumulated knowledge about this user / customer / project / codebase. Structurally an indexed document store with hybrid retrieval; expected to accumulate forever. Canonical wiki instance: AI Search's per-customer instance — each customer gets their own per-tenant search instance, the agent calls save_resolution(filename, content) (backed by items.uploadAndPoll(...)) after resolving an issue, and future sessions query it via search_knowledge_base.

Both realise the same thesis: the context window is a scarce resource; memory lives on disk, not in the prompt.
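The session-scoped substrate is easy to make concrete. Below is a minimal sketch in Python's stdlib sqlite3, assuming FTS5 is compiled in (it usually is). The schema, trigger, and tool name mirror the parent_id-tree-plus-FTS5 pattern described above, but are illustrative, not Project Think's actual code:

```python
import sqlite3

# Sketch: tree-structured session memory with full-text search.
# Table and column names are illustrative assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES messages(id),  -- NULL = session root
    role      TEXT NOT NULL,
    content   TEXT NOT NULL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

def append(parent_id, role, content):
    """Append a message under parent_id; the FTS index updates via trigger."""
    cur = db.execute(
        "INSERT INTO messages (parent_id, role, content) VALUES (?, ?, ?)",
        (parent_id, role, content))
    return cur.lastrowid

def search_context(query):
    """What a search_context tool exposed to the model would return."""
    return [row[0] for row in db.execute(
        "SELECT m.content FROM messages_fts f "
        "JOIN messages m ON m.id = f.rowid "
        "WHERE messages_fts MATCH ? ORDER BY rank", (query,))]

root = append(None, "user", "Deploy fails with TLS handshake error")
append(root, "assistant", "Try pinning the CA bundle version")
# Forking = two children of the same parent node:
append(root, "assistant", "Alternative branch: disable mTLS")
```

Because the tree lives in ordinary rows, forking is just a second child of the same parent, and compaction can be non-destructive: summaries become new nodes while the originals stay searchable.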

Named in Cloudflare AI Search (2026-04-16)

The support-agent worked example in the 2026-04-16 AI Search post is the canonical published realisation of memory-as-search-instance:

"When a customer comes back with a new issue, knowing what's already been tried saves everyone time. You can track this by creating an AI Search instance per customer. After each resolved issue, the agent saves a summary of what went wrong and how it was fixed. Over time, this builds up a searchable log of past resolutions. You can create instances dynamically using the namespace binding."

"save_resolution: after resolving an issue, the agent saves a summary so future agents have full context"

— (Cloudflare, 2026-04-16)

The shape:

tool: save_resolution(filename, content)
  → instance = env.SUPPORT_KB.get(`customer-${customerId}`)
  → instance.items.uploadAndPoll(filename, content)   # stores + indexes atomically
  → "saved: true"

# future session:
tool: search_knowledge_base(query)
  → env.SUPPORT_KB.search({
       query,
       ai_search_options: {
         boost_by: [{ field: "timestamp", direction: "desc" }],
         instance_ids: ["product-knowledge", `customer-${customerId}`]
       }
     })

The same hybrid-retrieval primitive serves shared docs (product-knowledge) and per-customer memory (customer-<id>), merged in one call via patterns/cross-index-unified-retrieval.
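What cross-instance merging amounts to can be sketched in a few lines: fan the query out to each instance, then merge into one ranked list. The instance names echo the worked example above; the term-overlap scoring is a toy stand-in for real hybrid retrieval:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    instance_id: str
    doc: str
    score: float

# Toy corpora standing in for two search instances (contents invented).
INSTANCES = {
    "product-knowledge": [
        "Widget Pro manual: reset procedure and error code E42 reference",
    ],
    "customer-1001": [
        "2026-03-02 resolution: E42 fixed by firmware 2.1 rollback",
        "2026-01-15 resolution: billing address mismatch",
    ],
}

def search_one(instance_id, query):
    """Score each doc by how many query terms it contains (illustrative)."""
    terms = query.lower().split()
    hits = []
    for doc in INSTANCES[instance_id]:
        score = sum(t in doc.lower() for t in terms)
        if score:
            hits.append(Hit(instance_id, doc, score))
    return hits

def unified_search(query, instance_ids, limit=3):
    """One call fans out to every instance and merges by score, so the
    agent sees a single ranked result list from one tool."""
    hits = [h for iid in instance_ids for h in search_one(iid, query)]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:limit]
```

The agent-facing contract is the point: one tool, many memories, one ranked list.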

Why search-as-memory, not KV-as-memory

  • LLM-friendly retrieval: natural-language recall ("have we tried this fix before?") maps cleanly onto hybrid-retrieval semantics; KV requires pre-determined keys.
  • Relevance at retrieval time, not write time: the agent writes "what happened"; the retrieval layer decides what's relevant later, per query.
  • Recency boost via metadata boost: timestamp desc surfaces recent resolutions first — for free.
  • Unified primitive: shared knowledge (product docs) + episodic memory (past resolutions) both live in the same retrieval surface, merged by cross-instance search. The LLM has one search_knowledge_base tool, not two. See patterns/unified-retrieval-tool.
  • Unified storage and index: uploadAndPoll is one call — no sync pipeline to operationalise per-customer.
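The recency-boost bullet can be made concrete with a toy formula: multiply base relevance by an exponential decay on document age. The 30-day half-life is an assumption for illustration, not AI Search's actual boost_by semantics:

```python
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # assumed decay rate, purely illustrative

def boosted(score, timestamp, now):
    """Halve a document's relevance score every HALF_LIFE_DAYS of age."""
    age_days = (now - timestamp).total_seconds() / 86400
    return score * 0.5 ** (age_days / HALF_LIFE_DAYS)

now = datetime(2026, 4, 16, tzinfo=timezone.utc)
old = datetime(2026, 1, 16, tzinfo=timezone.utc)  # 90 days old
new = datetime(2026, 4, 9, tzinfo=timezone.utc)   # 7 days old
```

With equal base relevance, the week-old resolution now outranks the stale one, which is the behaviour the timestamp desc boost buys for free.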

Structural requirements on the substrate

The 2026-04-16 launch post makes the requirements explicit by realising them:

  1. Runtime provisioning — per-tenant instance created on first appearance.
  2. Atomic write + index — patterns/upload-then-poll-indexing.
  3. Low-cost instances — the platform's cost model must support thousands of small per-customer indexes.
  4. Composable queries — cross-instance search so the agent can query many memories at once.
  5. Metadata-driven ranking — recency boost as table stakes.
  6. Hybrid retrieval — BM25 + vector, because memory is a mix of structured signals (error codes, product names, customer IDs) and semantic content.
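Requirement 6 is typically met by fusing two independent rankings, one lexical and one semantic. A common recipe is reciprocal rank fusion (RRF); the document IDs and both input rankings below are invented for illustration:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1/(k + rank)
    per document; documents ranked well by either list rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["res-42", "res-07", "res-13"]  # exact match on an error code
vector_hits  = ["res-42", "res-99", "res-07"]  # semantic similarity
merged = rrf([keyword_hits, vector_hits])
```

A resolution that matches both the literal error code and the semantic description of the problem ranks above one that matches only a single signal, which is exactly why memory wants hybrid rather than pure-vector retrieval.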

