
PATTERN Cited by 1 source

Plan-then-apply agent provisioning

Pattern

When an LLM-driven agent is wired to a cloud control plane through MCP tools, gate mutations behind a plan-then-apply flow — the agent first produces a concrete plan of the resources it intends to create / modify / destroy, surfaces it to the human for review, accepts natural-language adjustments, and only then executes on approval. On failure mid-apply, the agent examines logs and proposes next steps rather than silently retrying.

This is the Terraform plan / apply discipline reimplemented as a conversation. The human's approval surface is a dialogue turn ("Make it so" or an equivalent affirmative), not a CLI command.

Canonical wiki statement

Sam Ruby, Fly.io, 2025-05-07:

"Imagine a future where you say to your favorite LLM 'launch my application on Fly.io', and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say 'Make it so'."

"For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn't work the first time. Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so."

(Source: sources/2025-05-07-flyio-provisioning-machines-using-mcps.)

Pattern elements

  1. Code scan / intent acquisition. The agent reads the operator's application (code, Dockerfile, fly.toml, package manifest) to infer resource requirements.
  2. Plan production. The agent emits a structured plan: regions, compute size, databases, object-storage buckets, secrets to be set, networks, DNS records.
  3. Plan presentation. The plan is shown in the MCP client's UI (Claude Desktop, Cursor, Claude Code, etc.). The structured-data version of the plan is an MCP tool return; the human-readable version is either a rendered table / list in the chat UI or a tool-call approval dialogue.
  4. Plan adjustment turn. The human modifies the plan via natural language ("use fra instead of cdg", "swap the Postgres for a managed MySQL", "don't provision object-storage yet"). The agent re-emits a revised plan.
  5. Apply gate. On "Make it so" (or equivalent), the agent executes the plan via MCP tool calls, surfacing each mutation step in the chat. CLI-level refusals (patterns/cli-safety-as-agent-guardrail) are the safety net for unsafe operations.
  6. Failure inspection loop. On non-zero exit from any step, the agent examines logs, classifies the failure (build error, missing secret, database not seeded, quota exceeded), and proposes either a corrective dialogue turn or a compensating action with its own mini-plan.
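
The six elements above can be sketched as a small control loop. This is a minimal illustration, not Fly.io's implementation: the `Step` and `Plan` types, the `ask` / `revise` / `execute` callbacks, the accepted affirmatives, and the keyword hints in `FAILURE_HINTS` are all hypothetical names chosen for the example. In a real MCP client, `ask` would be a dialogue turn and `execute` a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str       # "create", "modify" or "destroy"
    resource: str     # e.g. "volume:data" (hypothetical naming scheme)
    detail: str = ""


@dataclass
class Plan:
    steps: List[Step] = field(default_factory=list)

    def render(self) -> str:
        # Human-readable version shown to the operator for review.
        return "\n".join(
            f"{i}. {s.action} {s.resource} {s.detail}".rstrip()
            for i, s in enumerate(self.steps, 1)
        )


# Assumed affirmatives and failure keywords; a real agent would be fuzzier.
AFFIRMATIVES = {"make it so", "yes", "apply"}
FAILURE_HINTS = [
    ("build error", ("build failed", "compile error")),
    ("missing secret", ("secret",)),
    ("database not seeded", ("no such table", "relation does not exist")),
    ("quota", ("quota", "limit exceeded")),
]


def classify(log: str) -> str:
    text = log.lower()
    for label, needles in FAILURE_HINTS:
        if any(n in text for n in needles):
            return label
    return "unknown"


def run(
    plan: Plan,
    ask: Callable[[str], str],
    revise: Callable[[Plan, str], Plan],
    execute: Callable[[Step], Tuple[bool, str]],
) -> dict:
    # Present / adjust loop: nothing is mutated until an explicit approval.
    while True:
        reply = ask(plan.render()).strip().lower()
        if reply in AFFIRMATIVES:
            break
        if reply == "cancel":
            return {"status": "cancelled", "applied": []}
        plan = revise(plan, reply)

    # Apply gate passed: execute step by step, stopping at the first failure.
    applied: List[Step] = []
    for step in plan.steps:
        ok, log = execute(step)
        if not ok:
            # No silent retry: report what ran, what failed, and a diagnosis.
            return {"status": "failed", "applied": applied,
                    "failed": step, "diagnosis": classify(log)}
        applied.append(step)
    return {"status": "applied", "applied": applied}
```

The design choice worth noting is that `run` returns a report on failure instead of retrying, which leaves the decision about the next step with the human.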

Lineage

The plan-and-apply split is not new: it is the discipline Terraform has enforced since 2014 (terraform plan emits a speculative-execution diff; terraform apply commits). Ruby's framing ports it from the declarative-manifest authoring experience to the natural-language dialogue experience. The structural motivation is the same: mutations against shared state are often irreversible and expensive to reproduce, so a reviewable intermediate representation must sit between "human expresses intent" and "infrastructure is mutated."

Agent Lee's elicitation gate (Cloudflare, 2026-04-15) is a sibling pattern on the permission-check axis: the LLM produces a mutation intent, the gate pauses for the operator's explicit "yes, do that," and only then does the credentialed-proxy boundary forward the call. Plan-then-apply is the intent-production / representation half; elicitation gate is the pause-before-execute half. In practice they're composed — the plan is the elicitation representation for complex multi-step mutations.
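
The composition can be illustrated with a toy gate that sits in front of a forwarding function. This is a sketch under stated assumptions, not Cloudflare's or Fly.io's code: `gated_forward`, `describe`, `confirm`, `forward`, and the tool names in the usage note are hypothetical. With `batch=True` the whole plan is the elicitation representation and one confirmation covers every call; with `batch=False` each call pauses on its own.

```python
from typing import Any, Callable, Dict, List, Tuple

Call = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def gated_forward(
    calls: List[Call],
    describe: Callable[[Call], str],
    confirm: Callable[[str], bool],
    forward: Callable[[Call], Any],
    batch: bool,
) -> List[Any]:
    """Forward calls only after the operator confirms them."""
    if batch:
        # Plan-then-apply shape: one prompt showing every intended mutation.
        summary = "\n".join(describe(c) for c in calls)
        if not confirm(summary):
            return []
        return [forward(c) for c in calls]

    # Per-step elicitation shape: one prompt per mutation, stop on refusal.
    results = []
    for call in calls:
        if not confirm(describe(call)):
            break
        results.append(forward(call))
    return results
```

For a three-call plan such as `[("volume_create", {...}), ("volume_extend", {...}), ("volume_destroy", {...})]`, the batch mode issues one prompt and the per-step mode issues three, which is the friction difference the two patterns trade on.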

Canonical wiki instance — flyctl v0.3.117 (aspirational)

The 2025-05-07 Fly.io post ships the substrate — full fly volumes CRUD exposed through MCP, with flyctl-level refusal invariants doing the safety work — but not a deployed plan-then-apply UX. Ruby's "Make it so" paragraph is labeled "not science fiction" but "some assembly required"; the mechanism is an aspirational target for the post's audience, not a v0.3.117 feature.

What Ruby ships in v0.3.117 is the primitive surface — individual mutating tool calls — that a plan-then-apply UX would compose. The pattern's first canonical wiki instance must therefore be read as a roadmap declaration rather than a deployed product; a future Fly.io post is the expected upgrade.

Variants

  • Human-applied plan. The agent writes a Terraform / Pulumi / fly.toml file and asks the human to run plan / apply themselves: the plan IR is the existing declarative format, and the "apply" is a shell command. Many agents work this way in practice today because the substrate (Terraform) already exists.
  • Agent-executed plan with per-step approval (concepts/elicitation-gate). Each tool call pauses for operator affirmation.
  • Agent-executed plan with batch approval. One approval turn covers all mutations in the plan. Ruby's "Make it so" framing points here — the frictional cost of per-mutation approval would kill the UX for any plan with more than a handful of resources.
  • Agent-executed plan with dry-run validation step. The agent first runs the plan in a dry-run mode that simulates mutations and catches referential / quota errors without committing; only after dry-run success does it prompt for real-apply approval.
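
The dry-run variant can be sketched as a simulation over an in-memory copy of the known state. This is an illustration only: the `(action, name, depends_on)` step shape, the single resource-count quota, and the `dry_run` function are assumptions made for the example, and a real control plane would supply its own validation.

```python
from typing import List, Set, Tuple

# (action, resource name, names of resources this step depends on)
PlanStep = Tuple[str, str, List[str]]


def dry_run(steps: List[PlanStep], existing: Set[str], quota: int) -> List[str]:
    """Simulate the plan on a copy of the state and collect errors.

    Nothing is mutated outside this function; `existing` is left untouched.
    """
    state = set(existing)
    errors: List[str] = []
    for action, name, deps in steps:
        missing = [d for d in deps if d not in state]
        if missing:
            errors.append(f"{action} {name}: missing dependency {', '.join(missing)}")
            continue
        if action == "create":
            if name in state:
                errors.append(f"create {name}: already exists")
            elif len(state) >= quota:
                errors.append(f"create {name}: quota of {quota} resources exceeded")
            else:
                state.add(name)
        elif action == "destroy":
            if name not in state:
                errors.append(f"destroy {name}: does not exist")
            else:
                state.discard(name)
    return errors
```

An agent following this variant would ask for real-apply approval only when `dry_run` returns an empty list, and otherwise present the errors as a plan adjustment turn.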

Risks

  • Plan drift. Between plan-present and plan-apply, the world may change (another operator mutates a resource, a quota is exhausted, a region fills). The plan's preconditions should be re-checked at apply time.
  • Plan-injection via scanned content. If the agent ingests project READMEs / dependency metadata during plan generation, adversarial content anywhere on that surface (concepts/prompt-injection) can inject rogue resources into the plan. Mitigation: show every resource explicitly in the plan for review.
  • Approval-fatigue collapse. If the human approves every plan without reading (because plans are mostly correct), the gate becomes ceremonial. This is the same failure mode as rubber-stamped code reviews.
  • Non-idempotent primitives. Apply failure mid-plan leaves the world in a partial state. CLI-level primitives should be either idempotent or expose a rollback / delete inverse for each create operation so the failure-inspection loop has something to call.
  • Ambiguous destructive intent. "Delete the oldest unattached volume" is intelligible to the LLM but produces mutations that may be hard to reverse. The CLI-level refusal invariants (patterns/cli-safety-as-agent-guardrail) only cover "destroying a mounted volume", not "destroying the wrong one."
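
Two of the risks above, plan drift and non-idempotent primitives, lend themselves to a short sketch. The assumptions are loud here: `fingerprint`, `ApplyStep`, `apply_plan`, and the idea of hashing an observed-state dictionary are hypothetical illustrations, not flyctl or MCP features. The sketch re-checks the plan's preconditions at apply time and, on a mid-plan failure, returns a compensating mini-plan built from each completed step's inverse instead of executing it.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List


def fingerprint(state: dict) -> str:
    # Stable hash of the observed state the plan was computed against.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


@dataclass
class ApplyStep:
    name: str                 # e.g. "create volume:data"
    do: Callable[[], None]    # performs the mutation, raises on failure
    inverse: str              # description of the undo operation


def apply_plan(steps: List[ApplyStep], observe: Callable[[], dict],
               planned_fingerprint: str) -> dict:
    # Drift check: refuse to apply if the world changed since the plan was shown.
    if fingerprint(observe()) != planned_fingerprint:
        return {"status": "drifted", "applied": [], "compensation": []}

    done: List[ApplyStep] = []
    for step in steps:
        try:
            step.do()
        except Exception as exc:
            # Partial state: propose inverses in reverse order, do not run them.
            return {
                "status": "failed",
                "failed": step.name,
                "error": str(exc),
                "applied": [s.name for s in done],
                "compensation": [s.inverse for s in reversed(done)],
            }
        done.append(step)
    return {"status": "applied", "applied": [s.name for s in done],
            "compensation": []}
```

Returning the compensation list rather than running it keeps the rollback itself behind the same review gate as the original plan.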

Trade-offs vs alternatives

vs. direct-tool-call-with-per-step-elicitation: Plan-then-apply groups mutations into a reviewable batch; per-step elicitation spreads review across N dialogue turns. Plan-then-apply is the right answer for multi-resource provisioning; per-step elicitation is the right answer for one-off admin operations (Agent Lee shape).

vs. Terraform HCL authoring: Terraform's declarative manifest IS a plan IR; the author writes it by hand. Plan-then-apply via MCP generates the equivalent IR from natural language and — in the aspirational version — displays it for review inside the chat UI. Skeptics will note that a structured IR is a better review surface than prose, which is a valid critique of the NL-first flow.

vs. imperative "just do it" agent: Cheaper in turns but uninspectable and unreviewable; mutations happen, and the human finds out after. Unacceptable for production infrastructure.
