CONCEPT Cited by 1 source
Execution ladder¶
Definition¶
The execution ladder is a tiered-environment framing for agent code execution in which each tier adds capability on top of the previous one, and the agent must be useful at the bottom tier alone. The developer (or the agent itself) escalates the execution environment only when the current tier genuinely doesn't cover the task; each escalation adds a specific capability in exchange for operational cost / complexity.
The canonical wiki framing is from Cloudflare's 2026-04-15 Project Think launch:
"This capability model leads naturally to a spectrum of compute environments, an execution ladder that the agent escalates through as needed… The key design principle: the agent should be useful at Tier 0 alone, where each tier is additive." (Source: Project Think post.)
The five tiers (Cloudflare's realisation)¶
| Tier | Capability added | Substrate |
|---|---|---|
| 0 | Durable virtual filesystem: read, write, edit, search, grep, diff | DO SQLite + R2, @cloudflare/shell |
| 1 | LLM-generated JavaScript in sandboxed isolate, no network | Dynamic Workers + @cloudflare/codemode |
| 2 | + npm at runtime (`import { z } from "zod"` just works) | @cloudflare/worker-bundler + esbuild |
| 3 | + headless browser: navigate, click, extract, screenshot | Browser Rendering |
| 4 | + full OS sandbox: git clone, npm test, cargo build; synced with Tier-0 workspace | Cloudflare Sandbox |
The crucial property: each row adds something, never subtracts. An agent written against Tier 0 keeps working when Tier 4 is also available — it just uses more of the ladder when the task warrants.
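The additive property can be modelled as a cumulative union: what is available at tier N is everything available at tier N-1 plus that tier's addition. The sketch below is illustrative only. `ADDED_AT_TIER`, the capability labels, `capabilitiesAt`, and `isAdditive` are hypothetical names that paraphrase the table above; they are not part of any Cloudflare SDK.

```typescript
// Hypothetical model: index = tier, value = capabilities that tier adds.
// Labels paraphrase the table above; they are not real binding names.
const ADDED_AT_TIER: string[][] = [
  ["fs:read", "fs:write", "fs:edit", "fs:search", "fs:grep", "fs:diff"], // Tier 0
  ["js:sandboxed-isolate"], // Tier 1
  ["npm:runtime-import"],   // Tier 2
  ["browser:headless"],     // Tier 3
  ["os:full-sandbox"],      // Tier 4
];

// Capabilities available at a tier: everything added at or below it.
function capabilitiesAt(tier: number): string[] {
  if (!Number.isInteger(tier) || tier < 0 || tier >= ADDED_AT_TIER.length) {
    throw new RangeError(`unknown tier: ${tier}`);
  }
  return ADDED_AT_TIER
    .slice(0, tier + 1)
    .reduce<string[]>((all, added) => all.concat(added), []);
}

// Additivity: nothing available at `lower` is missing at `higher`.
function isAdditive(lower: number, higher: number): boolean {
  const upper = capabilitiesAt(higher);
  return capabilitiesAt(lower).every((cap) => upper.indexOf(cap) !== -1);
}
```

Under this model `isAdditive(0, 4)` holds, which is the sense in which an agent written against Tier 0 keeps working when higher tiers are present.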
Why additive, not alternative¶
Many platforms ship "choose your environment" menus — Lambda or EC2 or Fargate or ECS, with the choice made up front and difficult to change later. The execution-ladder framing says: an agent's single session may touch multiple tiers, escalating only when the current one doesn't cover the step.
- A chat turn that reads + edits a few files → Tier 0 alone.
- A turn that needs to parse JSON + do pandas-style aggregation → Tier 1 (JS code, no network).
- A turn that calls an API → Tier 2 + an explicitly-bound network capability (see concepts/capability-based-sandbox).
- A turn that scrapes a site that has no API → Tier 3.
- A turn that runs the user's test suite → Tier 4.
All within the same agent session, with the Tier-0 workspace filesystem shared across tiers.
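The per-step escalation above can be sketched as a function that picks the lowest tier covering a step. This is a minimal sketch under assumptions: `StepNeeds`, `minimalTier`, and `highestTierTouched` are hypothetical names, and network grants (which the list treats as a separate, explicitly bound capability) are left out of the model.

```typescript
// Hypothetical description of what a single step in a turn needs.
interface StepNeeds {
  runsGeneratedCode?: boolean; // covered from Tier 1
  importsNpmPackage?: boolean; // covered from Tier 2
  drivesBrowser?: boolean;     // covered from Tier 3
  runsOsCommands?: boolean;    // covered from Tier 4
}

// Lowest tier that covers the step; file-only work stays at Tier 0.
function minimalTier(needs: StepNeeds): number {
  if (needs.runsOsCommands) return 4;
  if (needs.drivesBrowser) return 3;
  if (needs.importsNpmPackage) return 2;
  if (needs.runsGeneratedCode) return 1;
  return 0;
}

// A session escalates per step; the highest tier it touches is the
// maximum over its steps, not a choice made up front.
function highestTierTouched(steps: StepNeeds[]): number {
  return steps.reduce((max, step) => Math.max(max, minimalTier(step)), 0);
}
```

A session made of a file edit, a generated-code step, and a test run would touch Tiers 0, 1, and 4 respectively, while the first two steps never pay for the sandbox.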
Design principles¶
- Bottom tier must be genuinely useful. "The agent should be useful at Tier 0 alone." A ladder whose first rung is unreachable fails the gradualism premise.
- Escalation is agent-driven (or developer-driven), not platform-mandated. The agent decides when it needs a higher tier, ideally after the model plans the turn rather than eagerly.
- Each rung is an explicit capability, granted. Fits the capability-based sandbox posture: adding `npm` is a binding, adding Browser Rendering is a binding, adding a Sandbox is a binding. The developer or the agent's extension manifest declares what is granted.
- Shared state between tiers. Project Think's Tier-0 workspace (SQLite + R2) is reachable from every other tier; Tier 4 (full Sandbox) is "bidirectionally synced with the Workspace." Forward progress on a multi-tier task isn't lost to tier-switching.
- Capability adds are auditable. Granting Tier 4 is a discrete decision: `createSandboxTools` vs not. The ladder's shape makes it obvious when an agent is running with more authority than needed.
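Because each rung is a discrete grant, an audit can compare what was granted with what a session actually used. The sketch below is one hedged way to express that check; `SessionAudit`, `unusedGrants`, and `ungrantedUse` are hypothetical names and the record shape is an assumption, not something the source describes.

```typescript
// Hypothetical audit record for one agent session.
interface SessionAudit {
  grantedTiers: number[]; // tiers the developer or manifest granted
  usedTiers: number[];    // tiers the session actually exercised
}

// Tiers granted but never used: the agent ran with more authority
// than this session needed.
function unusedGrants(audit: SessionAudit): number[] {
  return audit.grantedTiers
    .filter((tier) => audit.usedTiers.indexOf(tier) === -1)
    .sort((a, b) => a - b);
}

// Tiers used without a grant: should be empty if grants are enforced.
function ungrantedUse(audit: SessionAudit): number[] {
  return audit.usedTiers
    .filter((tier) => audit.grantedTiers.indexOf(tier) === -1)
    .sort((a, b) => a - b);
}
```

A session granted Tiers 0 through 4 that only ever used Tiers 0 and 1 would report `[2, 3, 4]` from `unusedGrants`, which is the signal that the grant could be narrowed.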
Relationship to capability granularity¶
The ladder is the vertical axis (what kinds of capabilities).
Within each tier, capability granularity (the horizontal axis)
determines how small the radius of a granted capability is. Tier 2
adds "npm import capability" in general, but a specific extension
manifest may pin the allowed package list; Tier 3 adds "headless
browser" in general, but a binding scopes which origins can be
navigated. The two axes compose.
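As a rough illustration of the horizontal axis, the sketch below narrows two tiers with allow-lists. `ScopedGrants`, `mayImport`, and `mayNavigate` are hypothetical names, and the manifest shape is an assumption; the source does not specify how package pinning or origin scoping is declared.

```typescript
// Hypothetical scoped grants; not a real manifest format.
interface ScopedGrants {
  allowedPackages: string[]; // narrows the Tier 2 npm capability
  allowedOrigins: string[];  // narrows the Tier 3 browser capability
}

// Tier 2 is granted in general, but only pinned packages may be imported.
function mayImport(grants: ScopedGrants, pkg: string): boolean {
  return grants.allowedPackages.indexOf(pkg) !== -1;
}

// Tier 3 is granted in general, but only listed origins may be navigated.
// Comparing parsed origins avoids prefix tricks such as
// "https://example.com.evil.test".
function mayNavigate(grants: ScopedGrants, url: string): boolean {
  let origin: string;
  try {
    origin = new URL(url).origin;
  } catch {
    return false; // unparseable URLs are never allowed
  }
  return grants.allowedOrigins.indexOf(origin) !== -1;
}
```

The tier decides whether a kind of capability exists at all; the allow-lists decide how far an already granted capability reaches.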
When the ladder pattern fits¶
- Agent platforms where the cost / complexity / attack surface varies dramatically across execution environments.
- Products that want to claim "useful at the minimum tier" for user-trust reasons (i.e., the user can audit the minimum agent without also auditing the container sandbox).
- Multi-tenant compute where capability escalation crosses a billing / policy boundary — cheap tiers for every user, expensive tiers opt-in.
When it doesn't¶
- If the bottom tier doesn't cover >80% of the expected agent turns, the "useful at Tier 0" principle fails: the agent effectively always runs at the top tier, and the ladder collapses to a traditional "one environment" shape.
- If the tiers aren't genuinely additive (for example, tier-switching loses state), the escalation cost is prohibitive and agents will default to the highest tier up front to avoid it.
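The coverage condition can be checked against observed traffic. The sketch below assumes each turn has already been labelled with the lowest tier that covers it; `tierZeroCoverage` and `ladderFits` are hypothetical helpers, and the 0.8 default simply mirrors the >80% figure above.

```typescript
// Fraction of turns whose lowest sufficient tier is Tier 0.
// Input: one tier number per observed turn (a hypothetical log format).
function tierZeroCoverage(turnTiers: number[]): number {
  if (turnTiers.length === 0) return 0;
  const atTierZero = turnTiers.filter((tier) => tier === 0).length;
  return atTierZero / turnTiers.length;
}

// The ladder framing fits only if Tier 0 covers more than the threshold.
function ladderFits(turnTiers: number[], threshold: number = 0.8): boolean {
  return tierZeroCoverage(turnTiers) > threshold;
}
```

For a log of ten turns where nine stay at Tier 0 and one needs Tier 4, coverage is 0.9 and the ladder fits; at four out of five it sits exactly on the threshold and does not.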
Seen in¶
- sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents — canonical wiki instance; names the ladder as a concept, gives the five-tier realisation, states the "useful at Tier 0" principle as load-bearing.
Related¶
- systems/project-think — the SDK that wires the ladder together with a single `getTools()` return.
- systems/dynamic-workers — Tiers 1-3 substrate.
- systems/cloudflare-browser-rendering — Tier 3 capability.
- systems/cloudflare-sandbox-sdk — Tier 4 capability.
- systems/cloudflare-r2 / systems/cloudflare-durable-objects — Tier 0 substrate.
- concepts/capability-based-sandbox — the posture each rung enforces; ladder is the structured escalation of granted capabilities.
- patterns/additive-capability-ladder — the pattern-page treatment of this same shape as a reusable design discipline.
- companies/cloudflare — operator.