
CLOUDFLARE 2026-04-15 Tier 1


Project Think: building the next generation of AI agents on Cloudflare

Summary

Cloudflare announced Project Think (2026-04-15, published alongside the same-day Agent Lee launch) — "the next generation of the Agents SDK" — a set of new primitives for building long-running AI agents plus an opinionated base class (Think) that wires them together. The explicit framing problem: coding agents today "only run on your laptop or an expensive VPS… are expensive when idle… require management and manual setup," and — structurally — "agents are one-to-one": each agent is a unique instance serving one user on one task, not many users on one instance. At 100 M knowledge workers with modest concurrency that's tens of millions of simultaneous sessions — "at current per-container costs, that's unsustainable." Six primitives:

  1. Durable execution via fibers: runFiber() registers a function in SQLite, stash() checkpoints, onFiberRecovered resumes after eviction / restart / deploy.
  2. Sub-agents via Facets: child Durable Objects colocated with the parent, each with its own isolated SQLite and typed RPC ("sub-agent RPC latency is a function call").
  3. Persistent sessions: tree-structured messages with parent_id, forking, non-destructive compaction (summarise old messages, don't delete), FTS5 full-text search across history.
  4. Sandboxed code execution via Dynamic Workers + @cloudflare/codemode, with runtime npm resolution via @cloudflare/worker-bundler.
  5. An execution ladder: Tier 0 durable Workspace filesystem (SQLite + R2); Tier 1 Dynamic Worker; Tier 2 + npm; Tier 3 + headless browser; Tier 4 + full Cloudflare Sandbox with git / npm test / cargo build.
  6. Self-authored extensions: the agent writes its own TypeScript tool in a Dynamic Worker, declares permissions ({network: ["api.github.com"], workspace: "read-write"}), and the ExtensionManager loads it; extensions persist in DO storage and survive hibernation.
The Think base class ties these together with overridable hooks (getModel, getSystemPrompt, getTools, configureSession, maxSteps, beforeTurn, beforeToolCall, afterToolCall, onStepFinish, onChatResponse) plus context blocks — structured system-prompt sections (soul / memory) the model can read + update, with token accounting ("MEMORY… 42%, 462/1100 tokens"). Three waves framing: chatbots → coding agents → agents as infrastructure (durable, distributed, structurally safe, serverless). Cloudflare is using Think internally to build its own background-agent infrastructure; ships in preview as @cloudflare/think. No production-scale numbers in the post (unlike the same-day Agent Lee launch) — this is a platform-primitives article, not a retrospective.

Key takeaways

  1. "Agents are one-to-one" is the load-bearing scaling premise. Traditional apps serve many users from one instance; agents don't. "A restaurant has a menu and a kitchen optimized to churn out dishes at volume. An agent is more like a personal chef: different ingredients, different techniques, different tools every time." At 100 M knowledge workers × modest concurrency that's tens of millions of simultaneous sessions — unsustainable at container-economics. The Durable Object substrate exists because this scaling math requires it: per-agent isolation, zero idle cost (hibernation), platform-managed identity / routing / state / recovery. (See concepts/one-to-one-agent-instance.) Comparison table in the post: 10,000 agents active 1% of the time = 10,000 always-on VMs vs ~100 active DO instances at any moment.

  2. Durable execution with fibers solves the "30-second LLM call → 30-minute loop → platform restart" problem. "A fiber is a durable function invocation: registered in SQLite before execution begins, checkpointable at any point via stash(), and recoverable on restart via onFiberRecovered." The SDK keeps the agent alive automatically during fiber execution; keepAlive() / keepAliveWhile() prevents eviction during minute-scale active work; for hour-to-day operations (CI pipelines, design reviews, video generation) the agent starts the work, persists the job ID, hibernates, and wakes on callback. This is concepts/durable-execution as a library primitive, not a workflow-engine abstraction — see patterns/checkpoint-resumable-fiber. (Source: sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents.)
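The fiber lifecycle can be sketched in memory. The names echo the post's runFiber / stash / onFiberRecovered, but the persistence and recovery plumbing below are illustrative stand-ins (a Map instead of the agent's SQLite), not the SDK:

```typescript
// In-memory sketch of patterns/checkpoint-resumable-fiber. The real SDK
// persists checkpoints to the Durable Object's SQLite; a Map stands in here.
type Checkpoint = { step: number; results: string[] };

const fiberStore = new Map<string, Checkpoint>();

// stash(): persist progress so a restart can resume mid-loop.
function stash(fiberId: string, cp: Checkpoint): void {
  fiberStore.set(fiberId, cp);
}

// A long-running loop that checkpoints after every step. On restart, the
// stored checkpoint plays the role of onFiberRecovered: completed steps
// are skipped instead of re-executed.
function runFiber(fiberId: string, steps: string[]): string[] {
  const recovered = fiberStore.get(fiberId);
  const results = recovered ? [...recovered.results] : [];
  for (let i = recovered?.step ?? 0; i < steps.length; i++) {
    results.push(`done:${steps[i]}`); // stand-in for an LLM or tool call
    stash(fiberId, { step: i + 1, results });
  }
  return results;
}
```

If the process is evicted between steps, re-invoking the fiber with the same id picks up from the last stash rather than replaying completed work.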

  3. Sub-agents are co-located child DOs with typed RPC at function-call latency. "Sub-agents are child Durable Objects colocated with the parent via Facets, each with their own isolated SQLite and execution context… sub-agent RPC latency is a function call." Storage isolation is enforced by the runtime: "there's no implicit sharing of data between them." TypeScript catches misuse at compile time. Worked example: Orchestrator spawns ResearchAgent + ReviewAgent in parallel via subAgent(Class, name), awaits both, synthesizes — pattern: patterns/colocated-child-actor-rpc. This is the DO-as-actor substrate re-exposed as a concurrency primitive inside an agent session.
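The fan-out shape can be sketched with plain classes. In the SDK the children are colocated Durable Objects behind typed RPC; the subAgent helper below is a hypothetical stand-in that only mirrors the subAgent(Class, name) signature from the post:

```typescript
// Stand-in sketch of the parallel sub-agent fan-out. Plain classes replace
// the colocated child Durable Objects so only the pattern's shape is shown.
class ResearchAgent {
  async research(topic: string): Promise<string> {
    return `findings on ${topic}`;
  }
}

class ReviewAgent {
  async review(draft: string): Promise<string> {
    return `review of ${draft}`;
  }
}

// Hypothetical helper mirroring subAgent(Class, name): resolve a named
// child and hand back a typed handle. TypeScript checks every call on it,
// which is how misuse is caught at compile time.
function subAgent<T>(Cls: new () => T, _name: string): T {
  return new Cls();
}

async function orchestrate(topic: string): Promise<string> {
  const researcher = subAgent(ResearchAgent, "researcher");
  const reviewer = subAgent(ReviewAgent, "reviewer");
  // Spawn both in parallel, await both, then synthesize.
  const [findings, critique] = await Promise.all([
    researcher.research(topic),
    reviewer.review(`draft on ${topic}`),
  ]);
  return `${findings}; ${critique}`;
}
```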

  4. Sessions are trees of messages, not flat lists. "Conversations are stored as trees, where each message has a parent_id. This enables forking (explore an alternative without losing the original path), non-destructive compaction (summarize older messages rather than deleting them), and full-text search across conversation history via FTS5." SessionManager.create(this), session.getHistory(), this.sessions.fork(id, messageId, "alternative-approach"). Compaction summarises + keeps full history in SQLite; search is server-side via FTS5 (the SQLite module). Agent can search_context over its own past. Pattern: patterns/tree-structured-conversation-memory.
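The parent_id tree can be sketched minimally. The class below is illustrative only (the SDK's SessionManager API maps onto this shape, but this is not its implementation):

```typescript
// Minimal sketch of tree-structured conversation storage: every message
// carries a parent id, so a fork is just a new branch from any message.
interface Msg { id: number; parentId: number | null; text: string }

class TreeSession {
  private msgs = new Map<number, Msg>();
  private nextId = 1;

  append(parentId: number | null, text: string): number {
    const id = this.nextId++;
    this.msgs.set(id, { id, parentId, text });
    return id;
  }

  // Reconstruct one linear branch by walking parent pointers to the root.
  branch(leafId: number): string[] {
    const out: string[] = [];
    for (let m = this.msgs.get(leafId); m; m = m.parentId != null ? this.msgs.get(m.parentId) : undefined) {
      out.unshift(m.text);
    }
    return out;
  }
}
```

Forking from a message starts a sibling path while the original branch stays intact, which is also why compaction can summarise old messages without deleting them.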

  5. Code Mode framing reinforced with a 99.9% token number. "The Cloudflare API MCP server demonstrates this at scale. We expose only two tools (search() and execute()), which consume ~1,000 tokens, vs. ~1.17 million tokens for the naive tool-per-endpoint equivalent. This is a 99.9% reduction." Same number family as the "<1,000 tokens for ~3,000 operations" in the 2026-04-13 CLI post and the Agent Lee post, now quantified against the naive baseline. See patterns/code-generation-over-tool-calls, systems/code-mode.

  6. The execution ladder is additive — the agent is useful at Tier 0 alone, each tier adds. "The key design principle: the agent should be useful at Tier 0 alone, where each tier is additive."

     • Tier 0 — Workspace: durable virtual filesystem backed by DO SQLite + R2. Read, write, edit, search, grep, diff. Powered by @cloudflare/shell.
     • Tier 1 — Dynamic Worker: LLM-generated JavaScript in a sandboxed V8 isolate with no network access. Powered by @cloudflare/codemode.
     • Tier 2 — + npm: @cloudflare/worker-bundler fetches from the registry, bundles with esbuild, and loads the result into the Dynamic Worker. The agent writes import { z } from "zod" and it works.
     • Tier 3 — + browser: headless browser via Browser Rendering. Useful when the target service has no MCP server or API.
     • Tier 4 — + full sandbox: Cloudflare Sandbox with toolchains, repos, and dependencies; git clone, npm test, cargo build; bidirectionally synced with the Workspace.

See concepts/execution-ladder, patterns/additive-capability-ladder.
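The additive design means each tier's capability set is a strict superset of the tier below, so a task can run at the lowest tier that covers its needs. A sketch of that selection logic (the capability labels are illustrative, not SDK API):

```typescript
// Sketch of the additive-capability ladder: pick the lowest tier whose
// capability set covers everything the task needs.
const ladder = [
  { tier: 0, caps: ["workspace"] },
  { tier: 1, caps: ["workspace", "code"] },
  { tier: 2, caps: ["workspace", "code", "npm"] },
  { tier: 3, caps: ["workspace", "code", "npm", "browser"] },
  { tier: 4, caps: ["workspace", "code", "npm", "browser", "toolchain"] },
];

function lowestSufficientTier(needs: string[]): number {
  for (const rung of ladder) {
    if (needs.every((n) => rung.caps.includes(n))) return rung.tier;
  }
  return 4; // the full sandbox is the fallback for everything else
}
```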

  7. Dynamic Workers + capability model: "no ambient authority". "Instead of starting with a general-purpose machine and trying to constrain it, Dynamic Workers begin with almost no ambient authority (globalOutbound: null, no network access) and the developer grants capabilities explicitly, resource by resource, through bindings." Dynamic Workers spin up in milliseconds as fresh V8 isolates with a few MB of memory — "roughly 100× faster and up to 100× more memory-efficient than a container." The question "how do we stop this thing from doing too much?" becomes "what exactly do we want this thing to be able to do?" — see concepts/capability-based-sandbox. This is the canonical structural answer, distinct from the credentialed-proxy pattern Agent Lee uses; the two compose.
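The deny-by-default posture can be sketched as a guarded fetch. In Dynamic Workers the runtime enforces this via bindings (globalOutbound: null), not user code; the guard below is an illustrative stand-in:

```typescript
// Sketch of capability-based network access: outbound is denied by default
// and granted host by host. Illustrative only; not the Dynamic Workers API.
type Capabilities = { network: string[] }; // explicit host allowlist, empty = none

function makeGuardedFetch(caps: Capabilities) {
  return (url: string): string => {
    const host = new URL(url).hostname;
    if (!caps.network.includes(host)) {
      throw new Error(`no network capability for ${host}`);
    }
    return `ok: ${url}`; // stand-in for the real fetch
  };
}
```

The question shifts exactly as the post says: the allowlist is an answer to "what should this be able to do?", not a filter bolted onto ambient authority.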

  8. Self-authored extensions: the agent writes its own tools at runtime. Extension manifest shape:

    {
      "name": "github",
      "description": "GitHub integration: PRs, issues, repos",
      "tools": ["create_pr", "list_issues", "review_pr"],
      "permissions": {
        "network": ["api.github.com"],
        "workspace": "read-write"
      }
    }
    
    The agent writes TypeScript into a Dynamic Worker + declares permissions; ExtensionManager bundles (optionally with npm via @cloudflare/worker-bundler), loads into a Dynamic Worker, registers the tools. "The extension persists in DO storage and survives hibernation. The next time the user asks about pull requests, the agent has a github_create_pr tool that didn't exist 30 seconds ago." Claim: "this is the kind of self-improvement loop that makes agents genuinely more useful over time. Not through fine-tuning or RLHF, but through code." See concepts/self-authored-extension.
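How a loaded extension surfaces namespaced tools can be sketched as a registry: manifest tool names are prefixed with the extension name, which is how a github extension yields github_create_pr. The class below is an illustrative stand-in for the SDK's ExtensionManager, not its real API:

```typescript
// Sketch of extension loading: register each manifest tool under a
// namespaced name so tools from different extensions cannot collide.
interface ExtensionManifest {
  name: string;
  tools: string[];
  permissions: { network: string[]; workspace: string };
}

class ExtensionRegistry {
  private tools = new Map<string, (input: string) => string>();

  load(manifest: ExtensionManifest, impls: Record<string, (input: string) => string>): void {
    for (const tool of manifest.tools) {
      this.tools.set(`${manifest.name}_${tool}`, impls[tool]);
    }
  }

  call(name: string, input: string): string {
    const fn = this.tools.get(name);
    if (!fn) throw new Error(`unknown tool: ${name}`);
    return fn(input);
  }
}
```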

  9. Think is an opinionated harness — wire everything together or override piecemeal. Minimal subclass:

    export class MyAgent extends Think<Env> {
      getModel() {
        return createWorkersAI({ binding: this.env.AI })(
          "@cf/moonshotai/kimi-k2.5"
        );
      }
    }
    
    That alone gives working chat + streaming + persistence + abort/cancel + error handling + resumable streams + built-in workspace filesystem. Deploy with npx wrangler deploy. Opt-in overrides: getSystemPrompt, getTools, maxSteps, configureSession. Per-turn agentic loop: "beforeTurn() → streamText() → beforeToolCall() → afterToolCall() → onStepFinish() → onChatResponse()." Hooks let you switch to a cheaper model on follow-ups, limit tools per turn, pass client-side context per turn, log to analytics, auto-trigger a follow-up turn — "all without replacing onChatMessage."
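The quoted per-turn hook order can be traced with a toy dispatcher. The hook names come from the post; the loop below is a stand-in, not the Think class, with streamText() reduced to a simple tool-call loop:

```typescript
// Illustrative trace of Think's per-turn hook order:
// beforeTurn → (per tool call) beforeToolCall → afterToolCall → onStepFinish
// → onChatResponse.
function runTurn(toolCalls: string[]): string[] {
  const trace: string[] = [];
  trace.push("beforeTurn");
  for (const tool of toolCalls) {
    trace.push(`beforeToolCall:${tool}`);
    trace.push(`afterToolCall:${tool}`);
    trace.push("onStepFinish");
  }
  trace.push("onChatResponse");
  return trace;
}
```

Each hook is an interception point: beforeTurn can swap in a cheaper model, beforeToolCall can limit the tool set, onStepFinish can log to analytics, all without replacing the whole loop.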

  10. Persistent memory via context blocks — the model can update its own system prompt, with token accounting. "These are structured sections of the system prompt that the model can read and update over time, and they persist across hibernation. The model sees 'MEMORY (Important facts, use set_context to update) [42%, 462/1100 tokens]' and can proactively remember things."

    configureSession(session: Session) {
      return session
        .withContext("soul", {
          provider: { get: async () => "You are a helpful coding assistant." }
        })
        .withContext("memory", {
          description: "Important facts learned during conversation.",
          maxTokens: 2000
        })
        .withCachedPrompt();
    }
    
    Non-destructive compaction summarises older messages (full history stays in SQLite). FTS5 search inside a session or across all sessions. Agent uses search_context tool to query its own past.
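The token accounting the model sees can be reproduced with a one-line formatter, matching the post's "MEMORY [42%, 462/1100 tokens]" example. The function is illustrative, not the SDK's:

```typescript
// Sketch of the context-block header rendered into the system prompt:
// percentage used plus the raw used/max token counts.
function contextHeader(name: string, usedTokens: number, maxTokens: number): string {
  const pct = Math.round((usedTokens / maxTokens) * 100);
  return `${name} [${pct}%, ${usedTokens}/${maxTokens} tokens]`;
}
```

Surfacing the budget in the prompt itself is what lets the model decide when to compact or prune its own memory block.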

  11. Three waves framing — "agents as infrastructure" is the bet. "The first wave was chatbots." Stateless, reactive, fragile. "The second wave was coding agents." Stateful, tool-using, but run on your laptop for one user with no durability. "Now we are entering the third wave: agents as infrastructure. Durable, distributed, structurally safe, and serverless… enforce security through architecture rather than behavior." The Durable-Object-plus-capability-model substrate is explicitly the thesis of this bet.

  12. Preview release; experimental API surface. "Project Think is experimental. The API surface is stable but will continue to evolve in the coming days and weeks." Cloudflare is already using Think internally to build its own background-agent infrastructure. Shipping as @cloudflare/think + agents + ai + @cloudflare/shell + zod + workers-ai-provider. Example repo: github.com/cloudflare/agents/tree/main/examples/assistant. Think speaks the same WebSocket protocol as @cloudflare/ai-chat — existing AIChatAgent clients don't change.

Systems / concepts / patterns extracted

Operational numbers

  • ~1,000 tokens for Cloudflare's two-tool Code Mode MCP surface covering ~3,000 API operations vs ~1.17 million tokens for the naive tool-per-endpoint equivalent → 99.9% reduction.
  • 10,000 agents active 1% of the time: 10,000 always-on VMs / containers vs ~100 concurrent active DO instances at any moment (post's stated example).
  • Dynamic Workers: fresh V8 isolate at runtime, milliseconds to start, a few megabytes memory, ~100× faster and up to 100× more memory-efficient than a container.
  • LLM call latency frame: "An LLM call takes 30 seconds. A multi-turn agent loop can run for much longer." — motivates fibers + keepAliveWhile + hibernate-on-long-callback.
  • Context-block accounting example: "MEMORY [42%, 462/1100 tokens]" — the token budget for a context block, shown to the model live in its own prompt.
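Quick arithmetic behind the two headline numbers above, as a sanity check:

```typescript
// 1,000 tokens (two-tool Code Mode surface) vs 1.17M tokens (naive baseline).
const tokenReduction = 1 - 1_000 / 1_170_000;
// ≈ 0.99915, which rounds to the post's "99.9% reduction".

// 10,000 agents, each active ~1% of the time, gives the concurrency figure.
const activeInstances = 10_000 * 0.01;
// = 100 Durable Object instances actually running at any moment.
```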

Caveats

  • Platform-primitives article, not a retrospective. Unlike the same-day Agent Lee launch, which discloses 18K DAU / 250K tool calls per day, Project Think publishes no adoption, throughput, or production-reliability numbers. "We're sharing it early so you can build alongside us."
  • Preview. "The API surface is stable but will continue to evolve in the coming days and weeks." Names likely to change (runFiber, stash, Session, SessionManager.fork, withContext, withCachedPrompt).
  • The post frames itself as additive to the existing Agents SDK. Every primitive is available as a standalone package usable with the bare Agent base class. Think just wires them together. Nothing deprecates.
  • 99.9% token-reduction number is single-vendor framing. It measures the naive alternative (every endpoint as its own tool schema) which is already a known anti-pattern (see patterns/tool-surface-minimization); the realistic hand-crafted alternative (Datadog-style toolsets + layered discovery) would not consume 1.17M tokens. Directional but not the honest comparison.
  • Fiber-vs-Workflow relationship is implicit. Cloudflare already ships Workflows for durable execution at the step level. The post doesn't explicitly contrast — runFiber() appears to be the agent-loop-scoped durable-execution primitive that lives inside the agent DO, whereas Workflows is a separate top-level orchestration tier. Wiki models them as siblings.
  • Sub-agents use Facets. Facets is a separate Cloudflare primitive (DO + Dynamic Workers) referenced by link; the Project Think post uses it as a building block without re-deriving it. Wiki page for Facets-per-se not created in this ingest; content captured as "colocated child DOs" in the sub-agents pattern.
  • Runtime-npm resolution security posture not decomposed. @cloudflare/worker-bundler fetches from the registry at runtime; typosquatting / supply-chain implications of LLM-written import statements are not discussed in the post.
  • "Self-improvement through code, not fine-tuning" is a strong claim. The wiki's read: that framing is aspirational — a single-session agent that writes one tool is a useful capability, but "genuinely more useful over time" implies cross-session extension accumulation which is a governance question (approval, revocation, auditing) the post doesn't address.

Source

  • sources/2026-04-15-cloudflare-introducing-agent-lee — same-day companion post. Agent Lee is Cloudflare's first-party customer-facing agent built on today's Agents SDK; Think is the next generation the team would use to build something like Agent Lee. Together the two posts bracket Cloudflare's agent posture: here is an agent we ran in production (18K DAU), and here is the platform we're building so you can run yours.
  • sources/2026-04-20-cloudflare-internal-ai-engineering-stack — Cloudflare's internal developer-agent stack (iMARS + MCP Server Portal + AI Code Reviewer), published five days after Think. Think is positioned in this post as "we're already using it internally to build our own background agent infrastructure" — this is presumably that infrastructure.
  • sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare — the cf CLI post establishing the ~3,000-ops-in-<1,000-tokens Code Mode number that the Project Think post quantifies further (1,000 vs 1.17M tokens = 99.9% reduction).
  • sources/2026-01-29-cloudflare-moltworker-self-hosted-ai-agent — Moltworker ports a self-hosted agent onto Cloudflare primitives (Workers + Sandbox SDK + Browser Rendering + AI Gateway + R2 + Zero Trust Access). Project Think is the next-generation platform framing for that class of agent — the Moltworker port is roughly Tier 3 / Tier 4 on Think's execution ladder.
  • sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — streaming-API critique by James Snell (Workers runtime + Node.js TSC). Think's per-turn streaming + streamText primitive runs inside this substrate; Project Think doesn't touch streams design but inherits the Workers runtime choices.