PATTERN
Constrained memory API¶
Problem¶
An agent given raw database or filesystem access as its memory substrate will burn tokens designing queries and choosing storage strategies rather than doing the actual task. The primary agent's context window fills up with schema reasoning, query retries, and storage-layout choices — exactly the failure the memory tier was supposed to prevent.
At the same time, a memory substrate that exposes no model-driven operations can't capture the "this is important, remember it" signal that arrives mid-turn, outside the harness's bulk-compaction hook.
The pattern¶
Expose a deliberately narrow, six-operation API — no broader — with two entry shapes:

- Bulk harness path — `ingest(messages, {sessionId})`, called at compaction time, not by the model.
- Four narrow model tools — `remember(content, {sessionId})`, `recall(query)`, `forget(memoryId)`, `list()` — each does one thing, takes one natural-language argument, returns one natural-language answer.
- One profile-scoping primitive — `getProfile(name)` returns the isolated memory store for a caller-defined scope.
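Stated as types, the whole surface fits in a dozen lines. A minimal TypeScript sketch: the operation names come from the pattern, but the exact parameter and return types (and the toy in-memory class under them) are illustrative assumptions, not the service's real shapes.

```ts
// Illustrative shapes only; the real service's types may differ.
interface Message { role: "user" | "assistant"; content: string }

interface MemoryProfile {
  // Bulk harness path: invoked at compaction time, never by the model.
  ingest(messages: Message[], opts: { sessionId: string }): Promise<void>;
  // Narrow model tools: natural language in, natural language out.
  remember(content: string, opts: { sessionId: string }): Promise<void>;
  recall(query: string): Promise<string>;
  forget(memoryId: string): Promise<void>;
  list(): Promise<string[]>;
}

// Toy in-memory stand-in, only to make the shape concrete; the real
// service runs a full retrieval pipeline, not substring matching.
class ToyProfile implements MemoryProfile {
  private memories = new Map<string, string>();
  private nextId = 1;

  async ingest(messages: Message[], opts: { sessionId: string }): Promise<void> {
    for (const m of messages) await this.remember(m.content, opts);
  }
  async remember(content: string, _opts: { sessionId: string }): Promise<void> {
    this.memories.set(String(this.nextId++), content);
  }
  async recall(query: string): Promise<string> {
    const hits = [...this.memories.values()].filter(c => c.includes(query));
    return hits.length ? hits.join("; ") : "No matching memories.";
  }
  async forget(memoryId: string): Promise<void> {
    this.memories.delete(memoryId);
  }
  async list(): Promise<string[]> {
    return [...this.memories.values()];
  }
}
```

Note that the interface alone is the contract; everything below it is service-owned and free to change.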
The properties that matter:

- No raw query language. `recall(query)` accepts natural language, runs the full retrieval pipeline internally (analyser → parallel channels → fusion → synthesis), and returns a synthesised natural-language answer. The model never writes SQL, never tunes full-text-search operators, never designs embeddings.
- No schema exposure. The model can't see how memories are stored, what indexes exist, or what the fact-key normalisation rule is. Opacity is a feature — the storage schema can evolve without breaking agent code.
- No storage-strategy knobs. Retrieval weights, RRF tuning, vector-index choice, and embedder selection are all service-owned. The model doesn't choose because it can't — the API doesn't give it the surface to do so.
- Harness-vs-model split. `ingest` is bulk, invoked by the harness at the explicit compaction hook. `remember`/`recall`/`forget`/`list` are moment-to-moment tools the model uses inline. The two paths do not overlap; each is tuned to its invoker.
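The harness-vs-model split is ultimately a wiring decision: the model's tool registry contains only the four narrow operations, while `ingest` lives solely in the harness's compaction hook. A sketch under assumed names (`wireMemory`, `onCompaction`, and the structural `Profile` type are all illustrative, not part of the source API):

```ts
// Illustrative wiring; hook and registry names are assumptions.
interface Msg { role: string; content: string }

// Structural stand-in for the service's profile object.
interface Profile {
  ingest(messages: Msg[], opts: { sessionId: string }): Promise<void>;
  remember(content: string, opts: { sessionId: string }): Promise<void>;
  recall(query: string): Promise<string>;
  forget(memoryId: string): Promise<void>;
  list(): Promise<string[]>;
}

function wireMemory(profile: Profile, sessionId: string) {
  // Model-facing registry: the four narrow tools and nothing else.
  const modelTools: Record<string, (arg: string) => Promise<string>> = {
    remember: async (content) => { await profile.remember(content, { sessionId }); return "stored"; },
    recall:   (query) => profile.recall(query),
    forget:   async (id) => { await profile.forget(id); return "forgotten"; },
    list:     async () => (await profile.list()).join("\n"),
  };

  // Harness-only path: bulk ingest fires at the compaction hook,
  // invisible to the model's tool surface.
  const onCompaction = (messages: Msg[]) => profile.ingest(messages, { sessionId });

  return { modelTools, onCompaction };
}
```

The point of the shape is that the model physically cannot call `ingest`: it is never registered as a tool, so the bulk path stays out of the model's decision space entirely.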
Canonical wiki instance: Cloudflare Agent Memory¶
Agent Memory exposes exactly this shape:
```ts
const profile = await env.MEMORY.getProfile("my-project");

// Harness bulk path at compaction
await profile.ingest(messages, { sessionId });

// Model tools (narrow)
await profile.remember({ content, sessionId });
const answer = await profile.recall(query);
await profile.forget(memoryId);
await profile.list();
```
Cloudflare states the posture explicitly:

> "Tighter ingestion and retrieval pipelines are superior to giving agents raw filesystem access. In addition to improved cost and performance, they provide a better foundation for complex reasoning tasks required in production, like temporal logic, supersession, and instruction following."

> "The primary agent should never burn context on storage strategy. The tool surface it sees is deliberately constrained so that memory stays out of the way of the actual task."
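All of the pipeline sophistication the quotes allude to lives behind `recall`. The retrieval shape named earlier (analyser → parallel channels → fusion → synthesis) can be sketched end to end; everything below is an illustrative stand-in, including the channel logic, the RRF constant k = 60 (a common default), and the synthesis step:

```ts
// Toy service-internal recall pipeline. All logic here is a stand-in:
// the real service owns real analysers, indexes, and synthesis.
type Doc = { id: string; text: string };

// Analyser: derive search terms from the natural-language query.
function analyse(query: string): string[] {
  return query.toLowerCase().split(/\W+/).filter(w => w.length > 2);
}

// Two toy retrieval channels, each returning a ranked list of ids.
function keywordChannel(terms: string[], docs: Doc[]): string[] {
  return docs
    .map(d => ({ id: d.id, hits: terms.filter(t => d.text.toLowerCase().includes(t)).length }))
    .filter(r => r.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .map(r => r.id);
}
function recencyChannel(docs: Doc[]): string[] {
  // Newest first, standing in for a second, independent signal.
  return [...docs].reverse().map(d => d.id);
}

// Reciprocal-rank fusion across channels.
function rrf(rankings: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const ranking of rankings)
    ranking.forEach((id, i) => score.set(id, (score.get(id) ?? 0) + 1 / (k + i + 1)));
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Synthesis: fold the top hits into one natural-language answer.
function synthesize(ids: string[], docs: Doc[]): string {
  const byId = new Map(docs.map(d => [d.id, d.text]));
  const top = ids.slice(0, 2).map(id => byId.get(id)).filter(Boolean);
  return top.length ? top.join(" ") : "No relevant memories found.";
}

function recall(query: string, docs: Doc[]): string {
  const terms = analyse(query);
  const rankings = [keywordChannel(terms, docs), recencyChannel(docs)]; // "parallel" channels
  return synthesize(rrf(rankings), docs);
}
```

Because every stage is service-side, any of them can be swapped or retuned without the calling agent noticing anything beyond better answers.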
Why it beats the alternatives¶
| Approach | Agent burden | Quality ceiling |
|---|---|---|
| Raw filesystem / DB / vector store | High — designs queries, picks indexes, tunes retrieval | Bounded by model's DB-query skill |
| Natural-language memory tool + narrow API | Near-zero — one tool call with one English argument | Bounded by the service's retrieval pipeline, which can improve over time |
| No model-side memory tool, only harness compaction | Zero model burden but no mid-turn capture | Misses "remember this" moments |
The six-operation shape is the sweet spot: harness handles bulk ingest without model involvement; model gets narrow tools for the mid-turn hooks the harness can't anticipate.
Related tradeoffs¶
- Flexibility loss. An agent with edge-case needs (cross-profile join, raw SQL, custom embedder) is blocked — the API is the ceiling. Canonical mitigation: "We'll likely expose data for programmatic querying down the road, but we expect that to be useful for edge cases, not common cases."
- Service-side evolution is the whole game. Because all retrieval logic is inside the service, improvements in extraction / classification / retrieval / synthesis benefit every caller without client changes — but also, stagnation on the service side is a ceiling on every caller.
Seen in¶
- sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — canonical wiki instance; six-operation API surface; explicit "deliberately constrained" design posture.
Related¶
- patterns/tool-surface-minimization — generalisation of this pattern to any model-facing tool surface.
- patterns/multi-stage-extraction-pipeline — the extraction half that `ingest` delegates to.
- patterns/parallel-retrieval-fusion — the retrieval half that `recall` delegates to.
- patterns/agent-first-storage-primitive — the broader storage-tier design posture.
- concepts/agent-memory — the substrate the API exposes.
- concepts/memory-compaction — the lifecycle moment at which `ingest` fires.
- concepts/agent-context-window — the scarce resource the API protects.
- concepts/context-rot — the forcing function.
- systems/cloudflare-agent-memory — canonical realisation.