
One-to-one agent instance

Definition

The one-to-one agent instance is the structural observation that an AI agent, unlike a traditional request handler or web service, cannot sensibly be multi-tenanted: each session is its own task, with its own memory, tool history, credentials, and intermediate artifacts. "Agents are one-to-one." The deployment unit is one agent instance per (user, task, conversation), not one shared process serving many users.

"A restaurant has a menu and a kitchen optimized to churn out dishes at volume. An agent is more like a personal chef: different ingredients, different techniques, different tools every time." (Source: Cloudflare Project Think.)

Why it breaks traditional scaling math

Conventional web applications serve N users per instance where N is hundreds to millions. Horizontal scaling adds instances as peak concurrency grows but total instance count stays far below user count.

Agents invert the ratio: one instance per user, often per open task. At 100M knowledge workers × modest concurrency that's tens of millions of simultaneous sessions — "at current per-container costs, that's unsustainable. We need a different foundation."

The post's worked example:

"10,000 agents, each active 1% of the time: VMs / Containers — 10,000 always-on instances. Durable Objects — ~100 active at any moment."

The active count stays small; the addressable count scales with users.
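
The worked example is plain duty-cycle arithmetic. A minimal sketch — only the 10,000-agent and 1%-active figures come from the quoted example; the variable names are ours:

```typescript
// Duty-cycle math behind the worked example: per-instance provisioning
// pays for every agent; wake-on-event actors pay only for the expected
// number concurrently awake.
const totalAgents = 10_000;
const activePercent = 1; // each agent active 1% of the time

// VMs / containers: one always-on instance per agent.
const alwaysOnInstances = totalAgents;

// Hibernating actors: expected number active at any moment.
const expectedActive = (totalAgents * activePercent) / 100;

console.log(alwaysOnInstances, expectedActive); // 10000 100
```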

Required substrate properties

For one-to-one agents to be economic, the runtime must give:

  1. Per-agent isolation — memory, state, credentials don't leak across sessions. Agent-to-agent mischief is structurally impossible.
  2. Zero idle cost — hibernated agents don't bill. "$0 until the agent wakes up." Requires runtime-managed hibernation + wake-on-event semantics rather than keep-the-process-alive.
  3. Automatic scaling per-agent — no provisioning, no capacity pre-allocation, no sticky-session logic. Spawning agent N+1 is a function call, not a capacity plan.
  4. Built-in state — each agent has its own addressable identity + persistent store reachable without an external DB. Otherwise the 10,000-agents-per-user shape implodes into external-connection fan-out.
  5. Platform-managed recovery — when the host crashes, the platform restarts the agent with state intact. The application doesn't run its own process manager.
  6. Identity / routing — name → agent is the API; no load balancers, no session affinity, no routing protocol.
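
The addressing shape these properties require can be modelled in a few lines. A toy in-memory registry — not Durable Objects or any real platform API; zero idle cost (2) and crash recovery (5) need platform support and are not modelled here:

```typescript
// Toy actor registry: illustrates per-agent isolation, built-in state,
// name → agent routing, and spawn-on-first-use scaling. All names and
// methods here are illustrative, not a real runtime's API.
class Agent {
  // Per-agent isolation: each instance owns its state; nothing is shared.
  private state = new Map<string, unknown>();
  constructor(public readonly name: string) {}

  // Built-in state: reads and writes hit the agent's own store,
  // with no external database connection.
  remember(key: string, value: unknown) { this.state.set(key, value); }
  recall(key: string) { return this.state.get(key); }
}

class Registry {
  private agents = new Map<string, Agent>();

  // Identity / routing: name → agent is the whole API. Spawning agent
  // N+1 is a function call, not a capacity plan.
  get(name: string): Agent {
    let agent = this.agents.get(name);
    if (!agent) {
      agent = new Agent(name);
      this.agents.set(name, agent);
    }
    return agent;
  }
}

const registry = new Registry();
registry.get("agent:alice-session-1234").remember("step", 3);

// Every message addressed to the same name reaches the same instance.
const step = registry.get("agent:alice-session-1234").recall("step"); // 3
```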

The actor model (concepts/actor-model) is the canonical fit. Project Think's substrate — Durable Objects — delivers all six properties:

Property             Durable Objects
Idle cost            Zero (hibernated)
Scaling              Automatic, per-agent
State                Built-in SQLite
Recovery             Platform restart; state survives
Identity / routing   name → agent, built-in
Concurrency          Single-writer per key, serialised

Distinguishing from serverless functions

Serverless functions (AWS Lambda, Workers) also scale to zero and start per-request. But a Lambda function has no durable identity — request N+1 may hit a different process with no memory of request N. An actor-based agent runtime routes every message addressed to "agent:alice-session-1234" to the same instance, wherever the platform put it, with full state intact.

A one-to-one agent is Lambda + addressable identity + embedded state — that composition is an actor, not a serverless function.
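
The distinction can be shown in miniature. A sketch of the two shapes — neither real Lambda nor real Workers code, just the state-visibility difference:

```typescript
// Serverless shape: each invocation may land on a fresh process, so
// state local to the handler cannot survive across requests.
function statelessHandler(): number {
  let counter = 0; // reborn on every invocation
  counter += 1;
  return counter;  // always 1 — no memory of request N
}

// Actor shape: the runtime routes every message for a name to the one
// instance owning that name's state (the Map stands in for routing).
const counters = new Map<string, number>();

function actorHandler(name: string): number {
  const next = (counters.get(name) ?? 0) + 1;
  counters.set(name, next);
  return next;     // grows across requests to the same name
}

const first = actorHandler("agent:alice-session-1234");  // 1
const second = actorHandler("agent:alice-session-1234"); // 2
```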

Contrast with multi-user AI APIs

An LLM-inference endpoint (POST /v1/chat/completions) is a classical high-concurrency web service — stateless, many users per instance. The agent wrapping that endpoint is the one-to-one unit; the inference call inside is a shared backend. The agent holds memory, workspace, conversation tree; the inference endpoint is stateless throughput.
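
The split between the stateful wrapper and the stateless backend can be sketched as follows — `fakeInference` is a stand-in for the shared `POST /v1/chat/completions` endpoint, not a real client:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Shared backend: a pure function of its input, serving every agent.
function fakeInference(history: Message[]): string {
  return `reply#${history.length}`;
}

// One-to-one unit: owns the conversation tree for one (user, task).
class ChatAgent {
  private history: Message[] = [];

  ask(content: string): string {
    this.history.push({ role: "user", content });
    const reply = fakeInference(this.history); // stateless call inside
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }
}

// Two agents share the backend but never share memory.
const aliceAgent = new ChatAgent();
const bobAgent = new ChatAgent();
const a1 = aliceAgent.ask("hi");        // history length 1 → "reply#1"
const a2 = aliceAgent.ask("and then?"); // history length 3 → "reply#3"
const b1 = bobAgent.ask("hello");       // fresh history → "reply#1"
```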

Cost implications

  • Serving cost scales with active concurrency, not user count — assuming true hibernation with per-wakeup billing.
  • Storage cost scales with persistent state per agent — memory blocks, session trees, workspace files. SQLite per-agent means per-agent cost, not fleet cost.
  • Developer cost: writing for the one-to-one shape requires getting comfortable with "talking to a specific agent instance" as a first-class primitive. Teams coming from stateless-RPC thinking often under-use the addressability and over-use a shared DB.
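
The first two bullets describe two different cost curves. A sketch of their shape only — every rate below is made up, not any provider's pricing:

```typescript
// Serving tracks active concurrency; storage tracks total agents.
const totalAgents = 100_000;
const activePercent = 1;                                  // 1% concurrently active
const activeAgents = (totalAgents * activePercent) / 100; // 1,000

const servingRate = 0.05;  // $/active-agent-hour (hypothetical)
const storageRate = 0.001; // $/agent-month of per-agent state (hypothetical)

// Serving cost scales with the 1,000 active agents...
const servingPerHour = activeAgents * servingRate;
// ...storage cost with all 100,000 provisioned agents.
const storagePerMonth = totalAgents * storageRate;
```

Growing the user base 10× moves both lines, but only raising concurrency moves the serving line.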

Implications for architecture

  • Credential scope — one agent holds the credentials for its one user. Cross-user confusion is impossible by construction.
  • Personalization — persistent memory is the default, not the bolted-on feature; the agent's SQLite is the "user profile".
  • Observability — per-agent traces replace per-request sampling. An agent has a history, not an event stream.
  • Debugging — you open the agent, not the log. Deterministic single-writer execution reduces heisen-bugs.
  • Failure isolation — one agent's crash doesn't take down others, unlike shared-process architectures.
