PATTERN Cited by 1 source

Credentialed proxy sandbox

Pattern

When an agentic system needs to execute model-generated code that can act on a user's account, do not put the credentials inside the sandbox. Instead:

1. Run the generated code in a sandbox (isolate / container / micro-VM) that has no API keys.
2. Route every outbound call through a credentialed proxy under your control.
3. The proxy inspects the call (method + body) and classifies it as a read or a write.
4. Reads are proxied directly.
5. Writes are blocked until an elicitation gate is satisfied (explicit user approval, second-party review, policy check).
6. Only when authorised does the proxy inject the credential server-side and forward the call.

The sandbox never sees the credential. The model never sees it. The generated code cannot contain it — it was never present.
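The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular system: `Request`, `Response`, `CredentialedProxy`, the bearer-token header, and the paths in the usage note are all hypothetical, and classification is reduced to the HTTP method alone for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    method: str
    path: str
    body: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def classify(req: Request) -> str:
    """Return "read" or "write"; anything not clearly a read counts as a write."""
    return "read" if req.method.upper() in READ_METHODS else "write"


class CredentialedProxy:
    """Hypothetical proxy: the only component that ever holds the API key."""

    def __init__(self, api_key: str,
                 gate: Callable[[Request], bool],
                 forward: Callable[[Request], Response]):
        self._api_key = api_key   # never handed to the sandbox
        self._gate = gate         # returns True once the write is approved
        self._forward = forward   # transport to the upstream API

    def handle(self, req: Request) -> Response:
        if classify(req) == "write" and not self._gate(req):
            return Response(403, "write blocked: approval required")
        # Drop any Authorization header the sandbox supplied, then inject the real one.
        headers = {k: v for k, v in req.headers.items()
                   if k.lower() != "authorization"}
        headers["Authorization"] = f"Bearer {self._api_key}"
        return self._forward(Request(req.method, req.path, req.body, headers))
```

With a gate that always refuses, `proxy.handle(Request("GET", "/zones"))` is forwarded with the injected header, while `proxy.handle(Request("DELETE", "/zones/1"))` returns the 403 without ever reaching the upstream transport.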

Load-bearing framing

Cloudflare's Agent Lee post states the distinction explicitly (Source: sources/2026-04-15-cloudflare-introducing-agent-lee):

"The security boundary isn't just a sandbox that gets thrown away; it's a permission architecture that structurally prevents writes from happening without your approval."

A sandbox is a containment primitive — if the code escapes, the system is compromised. A credentialed proxy is a capability primitive — the code has no way to act on the outside world without the proxy's consent, escape or not. The two compose (defence in depth) but the capability boundary is the structural invariant.

Agent Lee instance

Runtime: upstream Cloudflare MCP server with sandboxed execution.
Proxy: a Durable Object per agent session.
Inspection: the Durable Object (DO) classifies generated code as read or write by method + body.
Enforcement: writes blocked until elicitation-gate approval in the dashboard UI.
Credential: upstream API key held inside the DO and injected server-side at forward time.
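The per-session shape listed above can be approximated as one proxy object per agent session, each with its own key and its own queue of writes awaiting approval. This is a sketch under assumptions, not Cloudflare's code: `SessionProxy`, `ProxyRegistry`, the ticket scheme, and the `forward` callable are hypothetical, and a real Durable Object would be addressed by ID in a separate runtime rather than held in a dictionary.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Call = Tuple[str, str, str]  # (method, path, body)


@dataclass
class SessionProxy:
    """One instance per agent session; holds that session's key and pending writes."""
    api_key: str
    forward: Callable[[Call, str], str]   # (call, api_key) -> upstream result
    pending: Dict[int, Call] = field(default_factory=dict)
    _tickets: itertools.count = field(default_factory=itertools.count)

    def submit(self, call: Call):
        """Reads go straight through; writes are parked until approved."""
        if call[0].upper() in ("GET", "HEAD"):
            return ("done", self.forward(call, self.api_key))
        ticket = next(self._tickets)
        self.pending[ticket] = call
        return ("pending", ticket)

    def approve(self, ticket: int) -> str:
        """Called from the approval UI, never from the sandbox."""
        call = self.pending.pop(ticket)   # KeyError if unknown or already used
        return self.forward(call, self.api_key)

    def reject(self, ticket: int) -> None:
        self.pending.pop(ticket, None)


class ProxyRegistry:
    """Hypothetical stand-in for per-session addressing of proxy instances."""

    def __init__(self, key_for_session: Callable[[str], str],
                 forward: Callable[[Call, str], str]):
        self._key_for_session = key_for_session
        self._forward = forward
        self._proxies: Dict[str, SessionProxy] = {}

    def for_session(self, session_id: str) -> SessionProxy:
        if session_id not in self._proxies:
            self._proxies[session_id] = SessionProxy(
                self._key_for_session(session_id), self._forward)
        return self._proxies[session_id]
```

Because pending writes live inside the session's own proxy, an approval ticket issued in one session cannot release a write in another, and each ticket can be redeemed at most once.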

Why this is stronger than "the model is reliable"

Evals, hallucination scorers, and tool-call-success tracking reduce the probability of an unintended write but cannot drive it to zero. The credentialed-proxy boundary makes unapproved writes structurally impossible: there is no credential path through the sandbox, so the model cannot perform an unapproved write even under adversarial prompt injection or jailbreak. The guarantee holds only as far as the proxy's read/write classification is correct, since a write mislabelled as a read would be proxied directly.
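To make the structural argument concrete, the sketch below simulates what injected or jailbroken code in the sandbox can attempt. Everything in it is an assumption for illustration: `upstream` is a stand-in for a real API that rejects calls without a valid key, `SECRET` would live in the proxy's own secret store rather than in a shared module, and the `X-Approved` header is an invented spoofing attempt.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical key. In a real deployment it sits in the proxy's secret store,
# in a separate component the sandbox cannot address.
SECRET = "proxy-only-key"


@dataclass
class Call:
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


def upstream(call: Call) -> int:
    """Stand-in for the real API: rejects any call lacking the valid credential."""
    presented = call.headers.get("Authorization", "").encode()
    expected = f"Bearer {SECRET}".encode()
    return 200 if hmac.compare_digest(presented, expected) else 401


def proxy(call: Call, approved: bool) -> int:
    """`approved` comes from the proxy's own gate state, never from the request."""
    is_write = call.method.upper() not in {"GET", "HEAD"}
    if is_write and not approved:
        return 403
    # Discard every sandbox-supplied header; only the proxy's credential goes upstream.
    return upstream(Call(call.method, call.path,
                         {"Authorization": f"Bearer {SECRET}"}))


# Attempts available to code running inside the sandbox:
direct = upstream(Call("DELETE", "/zones/1"))                                      # 401
guessed = upstream(Call("DELETE", "/zones/1", {"Authorization": "Bearer guess"}))  # 401
spoofed = proxy(Call("DELETE", "/zones/1", {"X-Approved": "true"}), approved=False)  # 403

# The only path that succeeds is the one the gate has approved:
granted = proxy(Call("DELETE", "/zones/1"), approved=True)                         # 200
```

None of the three attempts depends on the model behaving well: the first two fail because the key is absent, and the third fails because approval is not something the request can assert about itself.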

Prerequisites

Read/write classification must be deterministic from the call shape alone (HTTP method + path + body). The pattern breaks down for protocols whose side effects are hidden in semantics the proxy cannot see.
A typed API the model can generate code against without needing credentials in scope (pairs naturally with patterns/code-generation-over-tool-calls).
A runtime for the proxy that the sandbox cannot address directly; it must be a separate component with its own identity and secret store. Durable Objects, proxy Workers, and separate services all qualify.
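The first prerequisite can be illustrated with a classifier that looks only at method, path, and body, and treats anything it cannot positively identify as a read as a write. This is a sketch under assumptions: the `/graphql` path and the keyword check on the query text are hypothetical examples of a body-dependent rule, not a complete GraphQL parser, and treating `GET`/`HEAD` as reads assumes the upstream API honours HTTP's safe-method semantics.

```python
import json

SAFE_METHODS = {"GET", "HEAD"}

# Hypothetical: POST endpoints where the body decides between read and write.
QUERY_POST_PATHS = {"/graphql"}


def classify(method: str, path: str, body: str = "") -> str:
    """Return "read" or "write" from the call shape alone; fail closed to "write"."""
    method = method.upper()
    if method in SAFE_METHODS:
        return "read"
    if method == "POST" and path in QUERY_POST_PATHS:
        try:
            payload = json.loads(body)
        except ValueError:
            return "write"          # unparseable body: cannot prove it is a read
        query = payload.get("query") if isinstance(payload, dict) else None
        if isinstance(query, str):
            text = query.strip().lower()
            # Conservative: any mention of "mutation" anywhere counts as a write.
            if text.startswith(("query", "{")) and "mutation" not in text:
                return "read"
    return "write"
```

The function is deterministic and has no access to model output or conversation state, so the same call shape always gets the same label. Failing closed means an ambiguous read costs the user an extra approval rather than letting a write through.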

When it applies beyond Agent Lee

Any agent shape where the model generates or selects actions that can change external state:

Customer-support agents with refund, credit, and similar account mutations.
Infra agents with cloud-provider APIs (deploy / destroy / modify).
Finance agents with payment primitives.
CI/CD agents with merge / release capabilities.

The gate at step 5 of the pattern (the point where writes are blocked) can be a human approval, a second-agent review (patterns/specialized-agent-decomposition), or a policy check (patterns/policy-gate-on-provisioning).
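Since the three gate types answer the same question (may this write proceed?), they can share one interface and be swapped or combined. The sketch below is an assumption-laden illustration: `Write`, `human_gate`, `reviewer_gate`, `policy_gate`, `all_of`, and the example paths are hypothetical names, and the reviewer is any callable returning `"approve"` or `"reject"`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class Write:
    method: str
    path: str
    body: str = ""


Gate = Callable[[Write], bool]


def human_gate(approved: Set[Write]) -> Gate:
    """Pass only writes a person has explicitly approved, e.g. in a UI."""
    return lambda w: w in approved


def reviewer_gate(review: Callable[[Write], str]) -> Gate:
    """Delegate to a second reviewer (for example another agent)."""
    return lambda w: review(w) == "approve"


def policy_gate(max_body_bytes: int, denied_prefixes: Iterable[str]) -> Gate:
    """Static policy: cap body size and refuse paths under denied prefixes."""
    denied = tuple(denied_prefixes)

    def check(w: Write) -> bool:
        return (len(w.body.encode()) <= max_body_bytes
                and not w.path.startswith(denied))

    return check


def all_of(*gates: Gate) -> Gate:
    """Compose gates: every one must pass before the proxy forwards the write."""
    return lambda w: all(g(w) for g in gates)
```

For example, `all_of(policy_gate(1024, ["/admin"]), human_gate(approved))` requires both the policy check and an explicit approval. Whichever gate is chosen, it runs inside the proxy's write path, so the generated code has no way to skip it.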

Contrast

Sandbox-only: generated code runs in an ephemeral container with the credentials inside. One escape = full account compromise. Common in early MCP demos; unsafe for production on writable APIs.
Read-only token: credential-scope reduction, but can't cover any case that needs a write — only defers the problem.
Elicitation-without-proxy: prompt-the-user-before-a-write implemented in the agent loop itself. Not a structural boundary — a jailbroken model can skip the prompt.

Canonical wiki instance

systems/agent-lee — production deployment, DO as proxy, dashboard elicitation gate, ~250,000 tool calls / day.

concepts/elicitation-gate — the approval component inside the proxy's write path.
patterns/code-generation-over-tool-calls — the typical companion pattern on the sandbox side.
systems/cloudflare-durable-objects — the runtime substrate hosting the proxy in Cloudflare's deployment.
patterns/envelope-and-verify — structurally similar pattern at the crypto layer (sign-outside, verify-at-boundary).
concepts/defense-in-depth — sandbox and proxy compose; the proxy is the load-bearing boundary.