PATTERN Cited by 1 source

CLI safety as agent guardrail

Pattern

When wrapping a CLI as an MCP server (patterns/wrap-cli-as-mcp-server) and exposing mutating operations to an LLM agent, rely on the CLI's existing human-operator refusal invariants as the authorization boundary — instead of building an agent-specific policy layer at the MCP tier. The CLI already knows how to refuse unsafe operations ("can't destroy a mounted volume," "can't delete a non-empty bucket," "can't modify a resource that's locked," "can't downscale below minimum replicas during an outage"). Those refusals were authored to protect a human from fat-fingering in a terminal; they protect the agent user identically.
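A minimal sketch of what such a refusal invariant looks like inside a CLI command, written in Python. Everything here is hypothetical (the in-memory VOLUMES table, the destroy_volume function, its error messages). It shows the shape of the check, not flyctl's actual implementation.

```python
import sys

# Hypothetical in-memory state standing in for real cloud resources.
VOLUMES = {
    "vol_data": {"attached_to": "machine_1"},  # mounted on a machine
    "vol_scratch": {"attached_to": None},      # not mounted anywhere
}


def destroy_volume(volume_id: str) -> int:
    """Toy 'volumes destroy' subcommand with a human-operator refusal invariant.

    Returns a process-style exit code (0 = success, 1 = error or refusal) and
    writes the reason to stderr, the way a CLI reports errors to a terminal user.
    The check never asks who is calling; it only inspects resource state.
    """
    volume = VOLUMES.get(volume_id)
    if volume is None:
        print(f"Error: volume {volume_id} not found", file=sys.stderr)
        return 1
    if volume["attached_to"] is not None:
        print(
            f"Error: volume {volume_id} is mounted on {volume['attached_to']}; "
            "refusing to destroy",
            file=sys.stderr,
        )
        return 1
    del VOLUMES[volume_id]
    print(f"Destroyed volume {volume_id}")
    return 0


# The same call is refused whether a human typed it or an agent generated it.
print(destroy_volume("vol_data"))     # 1 (refused: mounted)
print(destroy_volume("vol_scratch"))  # 0 (destroyed)
```

Because the refusal depends only on resource state and is reported through the exit code and stderr, any caller inherits it unchanged, which is the property this pattern relies on.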

Canonical wiki statement

Sam Ruby, Fly.io, 2025-05-07:

"Since this support is built on flyctl, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command."

(Source: sources/2025-05-07-flyio-provisioning-machines-using-mcps.)

Load-bearing: "Knowing that gave me the confidence to try the command." The CLI's pre-existing invariant is what let Ruby expose fly volumes destroy via MCP without bolting on an MCP-tier confirmation layer.

Why it's a valuable shape

Three properties of CLIs make this pattern cheap:

Refusals are already implemented. Mature production CLIs carry years of accumulated "don't let the operator shoot themselves in the foot" checks, typically invariants that Support / SRE escalations taught the CLI team to enforce. These invariants are exactly the ones an agent user also needs.
Refusals are authored by humans who understood the domain. An MCP-tier guardrail author would need to re-derive them — "can a volume be destroyed while mounted? I think not, let me check" — duplicating work.
Refusals are enforced at the right layer. The CLI sits between the agent and the cloud API. Even if the MCP server is compromised, prompt-injected, or misconfigured, the CLI's invariants still hold, because they are checks the CLI performs against the underlying API's flags and state rather than logic the wrapper opts into. The guardrail is below the wrapper, not embedded in it.

Paired with structured-output reliability

This pattern is the mutation-side twin of concepts/structured-output-reliability, the read-side observation that "our 2020 decision to give flyctl --json mode became load-bearing for MCP in 2025." The mutation-side mirror is "our CLI's long-standing refusal-to-destroy-mounted-volumes invariant becomes load-bearing for mutation-authority MCP in 2025." Both are cases where mature CLI design pays an AI-integration dividend its original authors never intended.

Pattern elements

Inherit the CLI's refusal logic unchanged. The MCP wrapper shells out to the CLI; exit code + stderr carry the refusal back to the agent; the agent reports it to the human.
Surface refusals as tool-call failures, not silently retried errors. The MCP server should not try to "work around" an invariant (e.g. "let me umount then retry destroy") — that collapses the safety property. A refusal is a signal the agent should report to the human.
Don't add MCP-tier --force flags. Resist the urge to expose bypasses. The invariant's whole value is that it holds under all callers; adding a force flag at the MCP tier reintroduces the failure mode the CLI invariant prevents.
Rely on the CLI's pre-confirmation prompts sparingly. Some CLIs implement interactive confirmation ("are you sure? [y/n]") that an MCP subprocess can't satisfy without a flag (-y, --confirm, --no-prompt). The wrapper needs a policy on whether to pass the confirmation flag: passing it collapses the confirmation gate; not passing it breaks the tool. Better: prefer CLIs where safety is invariant-based (exit-with-error), not prompt-based (ask-human-at-stdin).
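These elements can be sketched as a thin, policy-free wrapper. This is a minimal Python sketch under stated assumptions: call_cli_tool and FORBIDDEN_FLAGS are hypothetical names, the result dict only mimics the content/isError fields of an MCP tool result rather than using an MCP SDK, and a Python one-liner stands in for the real CLI.

```python
import subprocess
import sys

# Illustrative only: bypass flags this hypothetical wrapper will not forward.
# Real flag names depend on the wrapped CLI.
FORBIDDEN_FLAGS = {"--force"}


def call_cli_tool(argv: list[str], timeout: float = 60.0) -> dict:
    """Run one CLI invocation for an agent and report the outcome.

    The returned dict loosely mirrors an MCP tool result: a list of text
    content items plus an "isError" flag. A non-zero exit is passed back as a
    tool error carrying the CLI's own stderr; it is never retried here.
    """
    # Match both "--force" and "--force=true" style arguments.
    bypass = sorted({arg.split("=", 1)[0] for arg in argv} & FORBIDDEN_FLAGS)
    if bypass:
        return {
            "content": [{"type": "text", "text": f"refused: {bypass} not allowed via MCP"}],
            "isError": True,
        }
    proc = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,  # an interactive [y/n] prompt reads EOF instead of blocking
        timeout=timeout,
    )
    if proc.returncode != 0:
        # Surface the CLI's refusal verbatim so the agent can relay it to the human.
        message = proc.stderr.strip() or f"command exited with code {proc.returncode}"
        return {"content": [{"type": "text", "text": message}], "isError": True}
    return {"content": [{"type": "text", "text": proc.stdout}], "isError": False}


if __name__ == "__main__":
    # Stand-in for a CLI that refuses: sys.exit(str) prints to stderr and exits 1.
    refusing_cli = [sys.executable, "-c", "import sys; sys.exit('Error: volume is mounted')"]
    print(call_cli_tool(refusing_cli))
```

The wrapper never adds a confirmation flag such as -y on the agent's behalf, and it closes stdin, so a prompt-based CLI fails rather than being silently confirmed. That keeps the safety property with the CLI's invariants instead of with the wrapper.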

What this pattern does NOT cover

The flyctl-level "can't destroy a mounted volume" invariant answers the question "is this operation safe right now?" — not the question "is this what the user actually intended?" A prompt injection that redirects the agent to destroy the wrong unattached volume still succeeds; the CLI invariant doesn't know which volume the user meant.
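A toy Python model of this gap, assuming a purely state-based invariant. The Volume type, invariant_allows_destroy, and the volume names are hypothetical; the point is that the check only ever sees the target volume's state, never the user's intent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    volume_id: str
    attached_to: Optional[str]  # machine ID, or None when unattached


def invariant_allows_destroy(volume: Volume) -> bool:
    """Toy state-based invariant: only unattached volumes may be destroyed."""
    return volume.attached_to is None


# Hypothetical fleet. Both unattached volumes pass the invariant,
# but only one of them is the volume the user asked to remove.
volumes = {
    "vol_data": Volume("vol_data", attached_to="machine_1"),
    "vol_backup": Volume("vol_backup", attached_to=None),
    "vol_scratch": Volume("vol_scratch", attached_to=None),
}
user_intended = "vol_scratch"
agent_requested = "vol_backup"  # e.g. after a prompt injection redirected the agent

print(invariant_allows_destroy(volumes[user_intended]))    # True
print(invariant_allows_destroy(volumes[agent_requested]))  # True: wrong volume, still allowed
print(invariant_allows_destroy(volumes["vol_data"]))       # False: mounted
print(agent_requested == user_intended)                    # False: the invariant never sees this
```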

Intent-confirmation is a different layer:
- patterns/plan-then-apply-agent-provisioning: present the mutation plan first, gate on human approval.
- concepts/elicitation-gate: per-tool-call approval dialogue.
- patterns/allowlisted-read-only-agent-actions: drop mutations entirely and keep only the read-only surface.
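The plan-then-apply layer can be sketched in a few lines. This is a hypothetical Python sketch, not taken from the linked pattern page: plan_destroy, apply_if_approved, and the callback signatures are illustrative assumptions. It composes with the CLI invariant rather than replacing it, since each executed step still goes through the CLI.

```python
from typing import Callable


def plan_destroy(volume_ids: list[str]) -> list[str]:
    """Describe the intended mutations without executing anything."""
    return [f"destroy volume {vid}" for vid in volume_ids]


def apply_if_approved(
    plan: list[str],
    approve: Callable[[list[str]], bool],
    execute: Callable[[str], None],
) -> bool:
    """Execute the plan only if a human approves it as a whole.

    `approve` stands in for whatever shows the plan to the user; `execute`
    stands in for the CLI call, whose own refusal invariants still apply.
    Returns True if the plan was executed.
    """
    if not approve(plan):
        return False
    for step in plan:
        execute(step)
    return True


# Usage: the human sees "destroy volume vol_backup", rejects it, nothing runs.
executed: list[str] = []
ran = apply_if_approved(plan_destroy(["vol_backup"]), approve=lambda p: False, execute=executed.append)
print(ran, executed)  # False []
```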

CLIs designed to be both human-ergonomic and agent-ergonomic (concepts/agent-ergonomic-cli) are the natural substrate for this pattern. Cloudflare's 2026-04 cf CLI is explicitly designed with agent ergonomics as a primary concern; Fly.io's flyctl arrived there by accident (2020 --json decision + pre-existing refusal invariants).
The pattern is not sufficient as a sole safety mechanism: the mutation-MCP posture still carries the workstation-local credential-inheritance risk, and any gap in the CLI's invariants is a direct attack surface.

Seen in

sources/2025-05-07-flyio-provisioning-machines-using-mcps — canonical wiki instance; flyctl's mounted-volume refusal as the guardrail that let Fly.io ship a mutation-authority MCP surface without bolting on an MCP-tier confirmation layer.

systems/fly-flyctl — the CLI whose invariants are load-bearing.
systems/model-context-protocol — the transport.
systems/fly-volumes — the resource family this protects.
concepts/structured-output-reliability — the read-side twin of this mutation-side pattern.
concepts/local-mcp-server-risk — the security posture this pattern partially mitigates.
concepts/natural-language-infrastructure-provisioning — the parent UX posture.
concepts/blast-radius — the framing vocabulary.
patterns/wrap-cli-as-mcp-server — the parent pattern this complements.
patterns/plan-then-apply-agent-provisioning — the intent-confirmation-layer complement.
patterns/allowlisted-read-only-agent-actions — the alternative posture when no CLI-level invariants exist.
companies/flyio.