CONCEPT Cited by 4 sources

Agent memory

Definition

Agent memory is an AI agent's accumulated, searchable context across turns and sessions — the things this agent (or this agent for this user) has seen, decided, or concluded — stored in a form the agent can retrieve on demand, rather than held verbatim in the context window.

The canonical shape in production today: memory is an index. Past decisions, resolutions, notes, user preferences, and tool outputs are written as searchable items into a dedicated retrieval primitive, then pulled back into context on future turns when relevant.

Two complementary sub-shapes

Modern agent stacks split memory into two substrates:

  1. Session-scoped / episodic — the in-progress conversation, per-turn messages, tool-call chains. Structurally a tree or log with fast full-text search; not expected to persist cross-session indefinitely. Canonical wiki instance: Project Think's Persistent Sessions (patterns/tree-structured-conversation-memory) — SQLite parent_id tree + FTS5 full-text index, forking, non-destructive compaction, search_context tool exposed to the model.

  2. Tenant-scoped / semantic — durable accumulated knowledge about this user / customer / project / codebase. Structurally an indexed document store with hybrid retrieval; expected to accumulate forever. Canonical wiki instance: AI Search's per-customer instance — each customer gets their own per-tenant search instance, the agent calls save_resolution(filename, content) (backed by items.uploadAndPoll(...)) after resolving an issue, and future sessions query it via search_knowledge_base.

Both realise the same thesis: the context window is a scarce resource; memory lives on disk, not in the prompt.
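The session-scoped substrate is easy to make concrete. Below is a minimal sketch in Python's stdlib sqlite3, assuming FTS5 is compiled in (it usually is). The schema, trigger, and tool name mirror the parent_id-tree-plus-FTS5 pattern described above, but are illustrative, not Project Think's actual code:

```python
import sqlite3

# Sketch: tree-structured session memory with full-text search.
# Table and column names are illustrative assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES messages(id),  -- NULL = session root
    role      TEXT NOT NULL,
    content   TEXT NOT NULL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

def append(parent_id, role, content):
    """Append a message under parent_id; the FTS index updates via trigger."""
    cur = db.execute(
        "INSERT INTO messages (parent_id, role, content) VALUES (?, ?, ?)",
        (parent_id, role, content))
    return cur.lastrowid

def search_context(query):
    """What a search_context tool exposed to the model would return."""
    return [row[0] for row in db.execute(
        "SELECT m.content FROM messages_fts f "
        "JOIN messages m ON m.id = f.rowid "
        "WHERE messages_fts MATCH ? ORDER BY rank", (query,))]

root = append(None, "user", "Deploy fails with TLS handshake error")
append(root, "assistant", "Try pinning the CA bundle version")
# Forking = two children of the same parent node:
append(root, "assistant", "Alternative branch: disable mTLS")
```

Because the tree lives in ordinary rows, forking is just a second child of the same parent, and compaction can be non-destructive: summaries become new nodes while the originals stay searchable.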

Named in Cloudflare AI Search (2026-04-16)

The support-agent worked example in the 2026-04-16 AI Search post is the canonical published realisation of memory-as-search-instance:

"When a customer comes back with a new issue, knowing what's already been tried saves everyone time. You can track this by creating an AI Search instance per customer. After each resolved issue, the agent saves a summary of what went wrong and how it was fixed. Over time, this builds up a searchable log of past resolutions. You can create instances dynamically using the namespace binding."

"save_resolution: after resolving an issue, the agent saves a summary so future agents have full context"

— (Cloudflare, 2026-04-16)

The shape:

tool: save_resolution(filename, content)
  → instance = env.SUPPORT_KB.get(`customer-${customerId}`)
  → instance.items.uploadAndPoll(filename, content)   # stores + indexes atomically
  → "saved: true"

# future session:
tool: search_knowledge_base(query)
  → env.SUPPORT_KB.search({
       query,
       ai_search_options: {
         boost_by: [{ field: "timestamp", direction: "desc" }],
         instance_ids: ["product-knowledge", `customer-${customerId}`]
       }
     })

The same hybrid-retrieval primitive serves shared docs (product-knowledge) and per-customer memory (customer-<id>), merged in one call via patterns/cross-index-unified-retrieval.
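What cross-instance merging amounts to can be sketched in a few lines: fan the query out to each instance, then merge into one ranked list. The instance names echo the worked example above; the term-overlap scoring is a toy stand-in for real hybrid retrieval:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    instance_id: str
    doc: str
    score: float

# Toy corpora standing in for two search instances (contents invented).
INSTANCES = {
    "product-knowledge": [
        "Widget Pro manual: reset procedure and error code E42 reference",
    ],
    "customer-1001": [
        "2026-03-02 resolution: E42 fixed by firmware 2.1 rollback",
        "2026-01-15 resolution: billing address mismatch",
    ],
}

def search_one(instance_id, query):
    """Score each doc by how many query terms it contains (illustrative)."""
    terms = query.lower().split()
    hits = []
    for doc in INSTANCES[instance_id]:
        score = sum(t in doc.lower() for t in terms)
        if score:
            hits.append(Hit(instance_id, doc, score))
    return hits

def unified_search(query, instance_ids, limit=3):
    """One call fans out to every instance and merges by score, so the
    agent sees a single ranked result list from one tool."""
    hits = [h for iid in instance_ids for h in search_one(iid, query)]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:limit]
```

The agent-facing contract is the point: one tool, many memories, one ranked list.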

Why search-as-memory, not KV-as-memory

  • LLM-friendly retrieval: natural-language recall ("have we tried this fix before?") maps cleanly onto hybrid-retrieval semantics; KV requires pre-determined keys.
  • Relevance at retrieval time, not write time: the agent writes "what happened"; the retrieval layer decides what's relevant later, per query.
  • Recency boost via metadata boost: timestamp desc surfaces recent resolutions first — for free.
  • Unified primitive: shared knowledge (product docs) + episodic memory (past resolutions) both live in the same retrieval surface, merged by cross-instance search. The LLM has one search_knowledge_base tool, not two. See patterns/unified-retrieval-tool.
  • Unified storage and index: uploadAndPoll is one call — no sync pipeline to operationalise per-customer.
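The recency-boost bullet can be made concrete with a toy formula: multiply base relevance by an exponential decay on document age. The 30-day half-life is an assumption for illustration, not AI Search's actual boost_by semantics:

```python
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # assumed decay rate, purely illustrative

def boosted(score, timestamp, now):
    """Halve a document's relevance score every HALF_LIFE_DAYS of age."""
    age_days = (now - timestamp).total_seconds() / 86400
    return score * 0.5 ** (age_days / HALF_LIFE_DAYS)

now = datetime(2026, 4, 16, tzinfo=timezone.utc)
old = datetime(2026, 1, 16, tzinfo=timezone.utc)  # 90 days old
new = datetime(2026, 4, 9, tzinfo=timezone.utc)   # 7 days old
```

With equal base relevance, the week-old resolution now outranks the stale one, which is the behaviour the timestamp desc boost buys for free.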

Structural requirements on the substrate

The 2026-04-16 launch post makes the requirements explicit by realising them:

  1. Runtime provisioning — per-tenant instance created on first appearance.
  2. Atomic write + index — patterns/upload-then-poll-indexing.
  3. Low-cost instances — the platform's cost model must support thousands of small per-customer indexes.
  4. Composable queries — cross-instance search so the agent can query many memories at once.
  5. Metadata-driven ranking — recency boost as table stakes.
  6. Hybrid retrieval — BM25 + vector, because memory is a mix of structured signals (error codes, product names, customer IDs) and semantic content.
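Requirement 6 is typically met by fusing two independent rankings, one lexical and one semantic. A common recipe is reciprocal rank fusion (RRF); the document IDs and both input rankings below are invented for illustration:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1/(k + rank)
    per document; documents ranked well by either list rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["res-42", "res-07", "res-13"]  # exact match on an error code
vector_hits  = ["res-42", "res-99", "res-07"]  # semantic similarity
merged = rrf([keyword_hits, vector_hits])
```

A resolution that matches both the literal error code and the semantic description of the problem ranks above one that matches only a single signal, which is exactly why memory wants hybrid rather than pure-vector retrieval.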

