
PATTERN

Constrained memory API

Problem

An agent given raw database or filesystem access as its memory substrate will burn tokens designing queries and choosing storage strategies rather than doing the actual task. The primary agent's context window fills up with schema reasoning, query retries, and storage-layout choices — exactly the failure the memory tier was supposed to prevent.

At the same time, a memory substrate that exposes no model-driven operations can't capture the "this is important, remember it" signal that arrives mid-turn, outside the harness's bulk-compaction hook.

The pattern

Expose a deliberately narrow, six-operation API, no broader, in three parts:

  1. Bulk harness path: ingest(messages, { sessionId }), called at compaction time, not by the model.
  2. Four narrow model tools: remember(content, { sessionId }), recall(query), forget(memoryId), list(). Each does one thing, takes one natural-language argument, returns one natural-language answer.
  3. One profile-scoping primitive: getProfile(name), which returns the isolated memory store for a caller-defined scope.
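As a rough sketch, the whole six-operation surface can be written down as a TypeScript interface. The concrete types and the in-memory stand-in below are illustrative assumptions, not Cloudflare's actual signatures:

```typescript
// Hypothetical sketch of the six-operation surface. Names mirror the
// pattern above; the types are assumptions for illustration only.
type SessionOpts = { sessionId: string };

interface MemoryProfile {
  // Bulk harness path: called at compaction time, never by the model.
  ingest(messages: string[], opts: SessionOpts): Promise<void>;
  // Narrow model tools: one natural-language argument in, one answer out.
  remember(content: string, opts: SessionOpts): Promise<string>; // returns a memoryId
  recall(query: string): Promise<string>;
  forget(memoryId: string): Promise<void>;
  list(): Promise<string[]>;
}

interface MemoryService {
  // Profile-scoping primitive: one isolated store per caller-defined scope.
  getProfile(name: string): Promise<MemoryProfile>;
}

// Minimal in-memory stand-in, just to show the shape behaves end to end.
class InMemoryProfile implements MemoryProfile {
  private memories = new Map<string, string>();
  private nextId = 0;
  async ingest(messages: string[], opts: SessionOpts): Promise<void> {
    for (const m of messages) await this.remember(m, opts);
  }
  async remember(content: string, _opts: SessionOpts): Promise<string> {
    const id = `mem-${this.nextId++}`;
    this.memories.set(id, content);
    return id;
  }
  async recall(query: string): Promise<string> {
    // The real service runs a full retrieval pipeline here; this is a stub.
    const hits = [...this.memories.values()].filter((m) =>
      m.toLowerCase().includes(query.toLowerCase()),
    );
    return hits.join("; ") || "no relevant memories";
  }
  async forget(memoryId: string): Promise<void> {
    this.memories.delete(memoryId);
  }
  async list(): Promise<string[]> {
    return [...this.memories.keys()];
  }
}
```

Note that nothing in the interface leaks schema, indexes, or retrieval knobs; every method speaks plain strings.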

The properties that matter:

  • No raw query language. recall(query) accepts natural language, runs the full retrieval pipeline internally (analyser → parallel channels → fusion → synthesis), and returns a synthesised natural-language answer. The model never writes SQL, never does full-text-search operator tuning, never designs embeddings.
  • No schema exposure. The model can't see how memories are stored, what indexes exist, what the fact-key normalisation rule is. Opacity is a feature — the storage schema can evolve without breaking agent code.
  • No storage-strategy knobs. Retrieval weights, RRF tuning, vector-index choice, embedder selection are all service-owned. The model doesn't choose because it can't — the API doesn't give it the surface to do so.
  • Harness-vs-model split. ingest is bulk, invoked by the harness at the explicit compaction hook. remember / recall / forget / list are moment-to-moment tools the model uses inline. The two paths do not overlap; each is tuned to its invoker.
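To make the first property concrete, here is a toy service-side sketch of what recall hides from the caller. The channel implementations, scorers, and RRF constant are invented stand-ins (the real pipeline is opaque by design); only the stage order — analyser, parallel channels, fusion, synthesis — comes from the text above:

```typescript
// Service-side sketch of the recall pipeline the model never sees.
type Hit = { id: string; text: string };

const store: Hit[] = [
  { id: "m1", text: "deploy target is staging" },
  { id: "m2", text: "the user prefers typescript" },
];

// 1. Analyser: rewrite the natural-language query per channel (stubbed).
function analyse(query: string): { keyword: string; semantic: string } {
  return { keyword: query.toLowerCase(), semantic: query.toLowerCase() };
}

// 2. Parallel channels: each returns a ranked list of hits (toy scorers).
function keywordChannel(q: string): Hit[] {
  return store.filter((h) => h.text.includes(q));
}
function semanticChannel(q: string): Hit[] {
  // Stand-in for vector search: shared-word overlap as a similarity proxy.
  const words = new Set(q.split(/\s+/));
  return store
    .map((h) => ({ h, s: h.text.split(/\s+/).filter((w) => words.has(w)).length }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .map((x) => x.h);
}

// 3. Fusion: reciprocal-rank fusion across channels (k = 60 is a common default).
function rrf(rankings: Hit[][], k = 60): Hit[] {
  const score = new Map<string, { hit: Hit; s: number }>();
  for (const ranking of rankings) {
    ranking.forEach((hit, rank) => {
      const e = score.get(hit.id) ?? { hit, s: 0 };
      e.s += 1 / (k + rank + 1);
      score.set(hit.id, e);
    });
  }
  return [...score.values()].sort((a, b) => b.s - a.s).map((e) => e.hit);
}

// 4. Synthesis: turn fused hits into one natural-language answer (stubbed).
function recall(query: string): string {
  const q = analyse(query);
  const fused = rrf([keywordChannel(q.keyword), semanticChannel(q.semantic)]);
  return fused.length ? fused.map((h) => h.text).join("; ") : "nothing relevant";
}
```

Every stage here is service-owned; swapping the embedder or retuning fusion changes nothing about the one-string-in, one-string-out contract.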

Canonical wiki instance: Cloudflare Agent Memory

Agent Memory exposes exactly this shape:

const profile = await env.MEMORY.getProfile("my-project");

// Harness bulk path at compaction
await profile.ingest(messages, { sessionId });

// Model tools (narrow)
await profile.remember(content, { sessionId });
const answer = await profile.recall(query);
await profile.forget(memoryId);
await profile.list();

Cloudflare states the posture explicitly:

"Tighter ingestion and retrieval pipelines are superior to giving agents raw filesystem access. In addition to improved cost and performance, they provide a better foundation for complex reasoning tasks required in production, like temporal logic, supersession, and instruction following."

"The primary agent should never burn context on storage strategy. The tool surface it sees is deliberately constrained so that memory stays out of the way of the actual task."

— (Cloudflare, 2026-04-17)

Why it beats the alternatives

| Approach | Agent burden | Quality ceiling |
| --- | --- | --- |
| Raw filesystem / DB / vector store | High: designs queries, picks indexes, tunes retrieval | Bounded by the model's DB-query skill |
| Natural-language memory tool + narrow API | Near-zero: one tool call with one English argument | Bounded by the service's retrieval pipeline, improvable over time |
| No model-side memory tool, only harness compaction | Zero model burden, but no mid-turn capture | Misses "remember this" moments |

The six-operation shape is the sweet spot: harness handles bulk ingest without model involvement; model gets narrow tools for the mid-turn hooks the harness can't anticipate.
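The split can be sketched as two disjoint code paths. The profile shape, compaction hook, and dispatcher below are hypothetical wiring, assumed for illustration:

```typescript
// Two non-overlapping invocation paths over the same profile.
type Msg = { role: string; content: string };

interface Profile {
  ingest(messages: Msg[], opts: { sessionId: string }): Promise<void>;
  remember(content: string, opts: { sessionId: string }): Promise<void>;
  recall(query: string): Promise<string>;
  forget(memoryId: string): Promise<void>;
  list(): Promise<string[]>;
}

// Harness path: bulk, fired at the compaction hook; the model never sees it.
async function onCompaction(profile: Profile, dropped: Msg[], sessionId: string) {
  await profile.ingest(dropped, { sessionId });
}

// Model path: one dispatch per inline tool use, one string argument each.
async function dispatchTool(
  profile: Profile,
  sessionId: string,
  call: { name: string; arg?: string },
): Promise<string> {
  switch (call.name) {
    case "remember":
      await profile.remember(call.arg ?? "", { sessionId });
      return "saved";
    case "recall":
      return profile.recall(call.arg ?? "");
    case "forget":
      await profile.forget(call.arg ?? "");
      return "forgotten";
    case "list":
      return (await profile.list()).join("\n");
    default:
      // `ingest` is deliberately unreachable from the model path.
      throw new Error(`unknown tool: ${call.name}`);
  }
}
```

Because the dispatcher simply has no case for ingest, the bulk path cannot be reached from inside a turn, no matter what the model emits.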

Trade-offs

  • Flexibility loss. An agent with edge-case needs (cross-profile joins, raw SQL, a custom embedder) is blocked: the API is the ceiling. Canonical mitigation: "We'll likely expose data for programmatic querying down the road, but we expect that to be useful for edge cases, not common cases."
  • Service-side evolution is the whole game. Because all retrieval logic is inside the service, improvements in extraction / classification / retrieval / synthesis benefit every caller without client changes — but also, stagnation on the service side is a ceiling on every caller.
