PATTERN Cited by 2 sources
Destructive operation confirmation as agent guardrail¶
A CLI tags destructive commands in its machine-readable catalog and fails any invocation of a destructive command that doesn't carry an explicit confirmation flag. The agent can't run a destructive command by accident: it has to emit the confirmation flag, which is a deliberate plan-time choice that a human reviewer can also see in the agent's trace.
When to use this pattern¶
- A CLI exposes commands whose effects are hard to reverse (`delete`, `drop`, `force-push`, `reset`, `destroy`, `silence-alert`, `delete-slo`, `drop-synthetic-check`).
- Agents drive the CLI.
- The cost of an accidental destructive action is high enough that a zero-cost tripwire is worth the minor friction.
The verbatim canonical statement¶
From the gcx launch post:
"It will find commands that result in destructive operations, which require explicit confirmation to reduce agent mistakes."
Paired with the machine-readable catalog: the CLI knows which commands are destructive; the agent has to read that metadata and act on it.
Mechanism¶
- Catalog-time tagging. Every mutating command whose effects can't be trivially undone carries a `destructive: true` (or equivalent) flag in the machine-readable catalog.
- Run-time gate. On invocation, if the command is tagged destructive and the confirmation flag is absent, the CLI exits non-zero with a documented error shape (concepts/exit-code-semantics) indicating confirmation-required.
- Plan-level agent response. The agent, informed by the catalog's `destructive: true`, emits the confirmation flag up front, so the destructive action becomes a plan-time choice, not a silent runtime consequence.
- Human-visible trace. Because the confirmation flag is on the command line, the agent's transcript shows it, so reviewers can audit the destructive actions taken.
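The catalog tag and the run-time gate described above can be sketched in a few lines. This is a minimal illustration, not gcx's implementation: the catalog layout, the `--confirm` flag name, the exit codes, and the error fields are all assumptions made for the example.

```python
import json

# Hypothetical catalog: one metadata field per entry marks destructive commands.
CATALOG = {
    "list-slos": {"destructive": False},
    "delete-slo": {"destructive": True},
}

CONFIRM_FLAG = "--confirm"        # assumed flag name, not taken from any real CLI
EXIT_OK = 0
EXIT_UNKNOWN_COMMAND = 2          # assumed exit codes
EXIT_CONFIRMATION_REQUIRED = 3


def invoke(argv):
    """Return (exit_code, json_payload) for a command line given as a list."""
    command, flags = argv[0], set(argv[1:])
    entry = CATALOG.get(command)
    if entry is None:
        payload = {"error": "unknown-command", "command": command}
        return EXIT_UNKNOWN_COMMAND, json.dumps(payload)
    # Run-time gate: a single check at invocation time.
    if entry["destructive"] and CONFIRM_FLAG not in flags:
        payload = {
            "error": "confirmation-required",
            "command": command,
            "retry_with": CONFIRM_FLAG,
        }
        return EXIT_CONFIRMATION_REQUIRED, json.dumps(payload)
    # The real command would run here; the sketch only reports success.
    return EXIT_OK, json.dumps({"status": "ok", "command": command})


if __name__ == "__main__":
    print(invoke(["delete-slo"]))               # blocked: flag missing
    print(invoke(["delete-slo", "--confirm"]))  # allowed: flag visible in the trace
```

Because the gate returns a structured payload alongside a distinct exit code, an agent can tell "confirmation required" apart from other failures without parsing free-form text.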
Why it's zero-cost¶
The guardrail is purely catalog + flag:
- No runtime permission model.
- No per-command custom logic.
- No prompt-engineering in the LLM.
The catalog is already the tool's authoritative surface description; tagging destructive commands is one metadata field per entry. Enforcement is a single check at invocation time.
Composition with the larger CLI-safety picture¶
| Layer | Where it sits |
|---|---|
| Agent-level prompt: "be careful" | Unreliable — prompts aren't control (concepts/structured-output-reliability) |
| Read-only-agent-tool (patterns/allowlisted-read-only-agent-actions) | Hard — narrow the tool to non-mutating commands only |
| CLI-safety-as-agent-guardrail (patterns/cli-safety-as-agent-guardrail) | Narrow — Fly.io's mutation-MCP split |
| Destructive-op confirmation (this pattern) | Zero-cost tripwire on the full CLI; destructive commands still run, but only with the explicit flag |
| Human approval loop | Highest-friction — reserve for irreversible infra ops |
The patterns stack: an org can use destructive-op confirmation as the baseline layer on every CLI, layer mutation-MCP separation on top for agent-driven flows, and reserve human approval for the highest-stakes operations.
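One way to picture the stacking is a single decision function that consults each layer in turn. The sketch below is illustrative only: the command names, the `--confirm` flag, the `HUMAN_APPROVAL` set, and the `read_only` switch (a stand-in for narrowing the tool surface) are hypothetical and not taken from gcx or flyctl.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY_READ_ONLY = "deny: tool surface is read-only"
    NEEDS_CONFIRM_FLAG = "deny: confirmation flag required"
    NEEDS_HUMAN = "hold: human approval required"


# Hypothetical metadata; a real CLI would load this from its catalog.
MUTATING = {"rename-dashboard", "delete-slo", "destroy-stack"}
DESTRUCTIVE = {"delete-slo", "destroy-stack"}
HUMAN_APPROVAL = {"destroy-stack"}  # highest-stakes operations only


def decide(command, flags, read_only=False, human_approved=False):
    # Narrowest layer: a read-only tool surface rejects every mutating command.
    if read_only and command in MUTATING:
        return Decision.DENY_READ_ONLY
    # Baseline layer: destructive commands need the explicit flag.
    if command in DESTRUCTIVE and "--confirm" not in flags:
        return Decision.NEEDS_CONFIRM_FLAG
    # Highest-friction layer: selected operations also wait for a human.
    if command in HUMAN_APPROVAL and not human_approved:
        return Decision.NEEDS_HUMAN
    return Decision.ALLOW
```

In this sketch a reversible mutation such as `rename-dashboard` passes straight through, which matches the idea that the confirmation layer is a tripwire for destructive commands rather than a block on all writes.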
Relationship to Fly.io's flyctl canonical¶
The Fly.io canonical provides mutation-MCP separation — mutating commands live in a separate MCP server that the agent must explicitly opt into. That's a stronger-invariant shape at the MCP-surface altitude.
The Grafana gcx approach is the direct-CLI altitude complement: the commands all live in one binary, but the catalog-plus-flag requirement provides the same net effect as mutation-MCP separation for agents that drive the CLI directly rather than through an MCP server. The two patterns are complementary, not alternatives.
Tradeoffs¶
- Annotation overhead. Every new destructive command has to be tagged; the cost scales with command count but is trivial per command.
- False negatives (missed tags). A destructive command that is missing the tag can be run without confirmation. This is a catalog-completeness burden on the tool maintainers.
- False positives (over-tagging). Tagging every mutating command — including ones that are trivially reversible like renames — adds friction without proportional safety benefit. The "destructive" boundary is a design choice, not a mechanical one.
- Agent policy gap. The agent still decides whether to emit the confirmation flag. An agent configured to auto-confirm everything neutralises the guardrail; the zero-cost shape is necessary but not sufficient.
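The agent policy gap can be narrowed on the agent side by refusing to attach the confirmation flag unless the plan explicitly marks the step as an intended destructive action. A minimal sketch, assuming a hypothetical `--confirm` flag, a hypothetical catalog dictionary, and a made-up `PlanStep` type:

```python
from dataclasses import dataclass, field

# Hypothetical catalog excerpt, as the agent would read it from the CLI.
CATALOG = {
    "list-slos": {"destructive": False},
    "delete-slo": {"destructive": True},
}
CONFIRM_FLAG = "--confirm"  # assumed flag name


@dataclass
class PlanStep:
    command: str
    args: list = field(default_factory=list)
    intends_destruction: bool = False  # must be set deliberately at plan time
    reason: str = ""                   # recorded so reviewers can audit the choice


def build_argv(step):
    """Turn a plan step into a command line, adding the flag only when justified."""
    entry = CATALOG[step.command]  # unknown commands raise KeyError: fail closed
    argv = [step.command, *step.args]
    if not entry["destructive"]:
        return argv
    if not (step.intends_destruction and step.reason):
        raise PermissionError(
            f"{step.command} is destructive; the plan must mark it and give a reason"
        )
    return argv + [CONFIRM_FLAG]
```

The point of the design is that the flag is never appended by default: an agent wired this way cannot auto-confirm everything, because the confirmation only appears when the plan itself carries the intent and a stated reason.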
Seen in¶
- sources/2026-04-29-grafana-get-observability-in-the-terminal-for-you-and-your-agents-with-the-gcx-cli-tool: canonical statement of destructive-op-confirmation as an agent-ergonomic CLI mechanism. Grafana's gcx tags destructive commands in its catalog and requires the confirmation flag. First observability-vendor-shipped instance of the pattern as a named mechanism.
- sources/2025-05-07-flyio-provisioning-machines-using-mcps: Fly.io's mutation-MCP-separation pattern at the MCP-server altitude; same net effect from a different direction; sibling instance on the wiki of patterns/cli-safety-as-agent-guardrail.
Related¶
- concepts/agent-ergonomic-cli
- concepts/machine-readable-command-catalog
- concepts/exit-code-semantics
- concepts/blast-radius
- systems/gcx-cli
- systems/fly-flyctl
- patterns/cli-safety-as-agent-guardrail
- patterns/auto-detect-agent-context
- patterns/wrap-cli-as-mcp-server
- patterns/allowlisted-read-only-agent-actions