PATTERN Cited by 1 source
CLI safety as agent guardrail¶
Pattern¶
When wrapping a CLI as an MCP server (patterns/wrap-cli-as-mcp-server) and exposing mutating operations to an LLM agent, rely on the CLI's existing human-operator refusal invariants as the authorization boundary — instead of building an agent-specific policy layer at the MCP tier. The CLI already knows how to refuse unsafe operations ("can't destroy a mounted volume," "can't delete a non-empty bucket," "can't modify a resource that's locked," "can't downscale below minimum replicas during an outage"). Those refusals were authored to protect a human from fat-fingering in a terminal; they protect the agent user identically.
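A minimal sketch of that shape in Python, assuming the wrapper shells out with subprocess. The ToolResult type, the call_cli helper, and the stand-in refusing command are hypothetical names for illustration; none of them come from flyctl or from an MCP SDK.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolResult:
    is_error: bool  # True when the CLI refused or otherwise failed
    text: str       # stdout on success, stderr (the refusal message) on failure


def call_cli(argv: list[str], timeout: float = 60.0) -> ToolResult:
    """Run the wrapped CLI once and pass its verdict through unchanged."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # No retry and no workaround: the refusal goes back to the agent as-is.
        return ToolResult(True, proc.stderr.strip() or f"exit code {proc.returncode}")
    return ToolResult(False, proc.stdout.strip())


# Hypothetical stand-in for a real CLI: refuses with exit code 1 and a message.
FAKE_REFUSING_CLI = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('volume is mounted'); sys.exit(1)",
]

result = call_cli(FAKE_REFUSING_CLI)
print(result.is_error, result.text)
```

The wrapper holds no policy of its own here: whatever the CLI decides is what the agent sees.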
Canonical wiki statement¶
Sam Ruby, Fly.io, 2025-05-07:
"Since this support is built on flyctl, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command."
(Source: sources/2025-05-07-flyio-provisioning-machines-using-mcps.)
Load-bearing: "Knowing that gave me the confidence to try the command." The CLI's pre-existing invariant is what let Ruby expose fly volumes destroy via MCP without bolting on an MCP-tier confirmation layer.
Why it's a valuable shape¶
Three properties of CLIs make this pattern cheap:
- Refusals are already implemented. Every production CLI has years of accumulated "don't let the operator shoot themselves in the foot" checks — typically invariants that Support / SRE escalations taught the CLI team to enforce. These invariants are exactly the ones an agent user also needs.
- Refusals are authored by humans who understood the domain. An MCP-tier guardrail author would need to re-derive them — "can a volume be destroyed while mounted? I think not, let me check" — duplicating work.
- Refusals are enforced at the right layer. The CLI sits between the agent and the cloud API. Even if the MCP server is compromised, prompt-injected, or misconfigured, the CLI's invariants still hold, because they are checks the CLI runs against the underlying API's flags and state rather than logic in the wrapper. The guardrail is below the wrapper, not embedded in it.
Paired with structured-output reliability¶
This pattern is the mutation-side twin of concepts/structured-output-reliability, the read-side observation that "our 2020 decision to give flyctl --json mode became load-bearing for MCP in 2025." The mutation-side mirror is "our CLI's decade-old refusal-to-destroy-mounted-volumes invariant becomes load-bearing for mutation-authority MCP in 2025." Both are cases where mature CLI design pays an AI-integration dividend the original authors never intended.
Pattern elements¶
- Inherit the CLI's refusal logic unchanged. The MCP wrapper shells out to the CLI; exit code + stderr carry the refusal back to the agent; the agent reports it to the human.
- Surface refusals as tool-call failures, not silently retried errors. The MCP server should not try to "work around" an invariant (e.g. "let me umount then retry destroy") — that collapses the safety property. A refusal is a signal the agent should report to the human.
- Don't add MCP-tier --force flags. Resist the urge to expose bypasses. The invariant's whole value is that it holds under all callers; adding a force flag at the MCP tier reintroduces the failure mode the CLI invariant prevents.
- Rely on the CLI's pre-confirmation prompts sparingly. Some CLIs implement interactive confirmation ("are you sure? [y/n]") that an MCP subprocess can't satisfy without a flag (-y, --confirm, --no-prompt). The wrapper needs a policy on whether to pass the confirmation flag: passing it collapses the confirmation gate; not passing it breaks the tool. Better: prefer CLIs where safety is invariant-based (exit-with-error), not prompt-based (ask-human-at-stdin).
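The last two elements can be sketched as an argument check plus a non-interactive run. This is one possible policy (never pass a confirmation flag, and let prompt-based commands fail fast), not a recommendation for every CLI. BYPASS_FLAGS simply reuses the example flags named above, and reject_bypass_flags and run_noninteractive are hypothetical helper names.

```python
import subprocess
import sys

# Flags that skip a safety check or answer a confirmation prompt.
# Illustrative list taken from the examples in the text; a real wrapper
# would derive it from the wrapped CLI's own documentation.
BYPASS_FLAGS = {"--force", "-y", "--confirm", "--no-prompt"}


def reject_bypass_flags(args: list[str]) -> None:
    """Raise if agent-supplied arguments try to bypass a guardrail."""
    for arg in args:
        # Also catch the --flag=value spelling.
        if arg.split("=", 1)[0] in BYPASS_FLAGS:
            raise ValueError(f"bypass flag not allowed: {arg}")


def run_noninteractive(argv: list[str]) -> subprocess.CompletedProcess:
    """Run with stdin closed so a y/n prompt fails fast instead of hanging."""
    reject_bypass_flags(argv[1:])  # argv[0] is the program itself
    return subprocess.run(
        argv, stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=60
    )


# Hypothetical stand-in for a prompt-based CLI: it asks on stdin, gets EOF,
# and exits non-zero, so the tool call fails rather than silently confirming.
proc = run_noninteractive([sys.executable, "-c", "input('are you sure? [y/n] ')"])
print(proc.returncode != 0)
```

Under this policy a prompt-gated command is unusable through the wrapper, which is the "breaks the tool" side of the trade-off; the gate itself stays intact.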
What this pattern does NOT cover¶
The flyctl-level "can't destroy a mounted volume" invariant answers the question "is this operation safe right now?" — not the question "is this what the user actually intended?" A prompt injection that redirects the agent to destroy the wrong unattached volume still succeeds; the CLI invariant doesn't know which volume the user meant.
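A toy model makes the gap concrete. Everything here is hypothetical (the volume names, the in-memory state, the destroy_volume function); it only imitates the shape of a mounted-volume invariant and is not how flyctl is implemented.

```python
class Refused(Exception):
    """Raised when the toy CLI's invariant blocks an operation."""


# Toy state: volume name -> machine it is mounted on (None means unattached).
volumes = {"vol_db": "machine_1", "vol_scratch": None, "vol_backups": None}


def destroy_volume(name: str) -> None:
    """Toy stand-in for a destroy command with a mounted-volume invariant."""
    if volumes[name] is not None:
        raise Refused(f"{name} is mounted on {volumes[name]}")
    del volumes[name]


# Safety question: the invariant blocks destroying a mounted volume.
try:
    destroy_volume("vol_db")
except Refused as err:
    print("refused:", err)

# Intent question: the user meant vol_scratch, but an injected instruction
# names vol_backups. The invariant cannot tell the difference and allows it.
destroy_volume("vol_backups")
print(sorted(volumes))
```

The check inspects resource state only, so any target that satisfies the invariant is destroyed regardless of who chose it or why.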
Intent-confirmation is a different layer:
- patterns/plan-then-apply-agent-provisioning: present the mutation plan first, gate on human approval.
- concepts/elicitation-gate: per-tool-call approval dialogue.
- patterns/allowlisted-read-only-agent-actions: drop mutations entirely and leave the read-only surface.
Related wiki framing¶
- CLIs designed to be both human-ergonomic and agent-ergonomic (concepts/agent-ergonomic-cli) are the natural substrate for this pattern. Cloudflare's 2026-04 cf CLI is explicitly designed with agent ergonomics as a primary concern; Fly.io's flyctl arrived there by accident (2020 --json decision + pre-existing refusal invariants).
- The pattern is not sufficient as a sole safety mechanism; the mutation-MCP posture still carries the workstation-local credential-inheritance risk, and any invariant gap is a direct attack surface.
Seen in¶
- sources/2025-05-07-flyio-provisioning-machines-using-mcps — canonical wiki instance; flyctl's mounted-volume refusal as the guardrail that let Fly.io ship a mutation-authority MCP surface without bolting on an MCP-tier confirmation layer.
Related¶
- systems/fly-flyctl — the CLI whose invariants are load-bearing.
- systems/model-context-protocol — the transport.
- systems/fly-volumes — the resource family this protects.
- concepts/structured-output-reliability — the read-side twin of this mutation-side pattern.
- concepts/local-mcp-server-risk — the security posture this pattern partially mitigates.
- concepts/natural-language-infrastructure-provisioning — the parent UX posture.
- concepts/blast-radius — the framing vocabulary.
- patterns/wrap-cli-as-mcp-server — the parent pattern this complements.
- patterns/plan-then-apply-agent-provisioning — the intent-confirmation-layer complement.
- patterns/allowlisted-read-only-agent-actions — the alternative posture when no CLI-level invariants exist.
- companies/flyio.