PATTERN Cited by 2 sources

Destructive operation confirmation as agent guardrail¶

A CLI tags destructive commands in its machine-readable catalog and fails any invocation of a destructive command that doesn't carry an explicit confirmation flag. The agent can't run a destructive command by accident: it has to emit the confirmation flag, which is a plan-time choice it must actively make and one a human reviewer can see in the agent's trace.

When to use this pattern¶

A CLI exposes commands whose effects are hard to reverse (delete, drop, force-push, reset, destroy, silence-alert, delete-slo, drop-synthetic-check).
Agents drive the CLI.
The cost of an accidental destructive action is high enough that a zero-cost tripwire is worth the minor friction.

The verbatim canonical statement¶

From the gcx launch post:

"It will find commands that result in destructive operations, which require explicit confirmation to reduce agent mistakes."

Paired with the machine-readable catalog: the CLI knows which commands are destructive; the agent has to read that tag from the catalog and act accordingly.

Mechanism¶

Catalog-time tagging. Every mutating command whose effects can't be trivially undone carries a destructive: true (or equivalent) field in its machine-readable catalog entry.
Run-time gate. On invocation, if the command is tagged destructive and the confirmation flag is absent, the CLI exits non-zero with a documented error shape (concepts/exit-code-semantics) indicating confirmation-required.
Plan-level agent response. The agent, informed by the catalog's destructive: true, emits the confirmation flag up-front, so the destructive action becomes a plan-time choice rather than a silent runtime consequence.
Human-visible trace. Because the confirmation flag is on the command-line, the agent's transcript shows it — reviewers can audit destructive actions taken.
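The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not gcx's implementation: the flag name `--confirm`, the exit code `3`, the `CommandSpec` type, and the catalog entries are all hypothetical, since the source names neither the actual flag nor the error shape.

```python
from dataclasses import dataclass

CONFIRM_FLAG = "--confirm"       # hypothetical flag name, not taken from gcx
EXIT_UNKNOWN_COMMAND = 2         # hypothetical exit code
EXIT_CONFIRMATION_REQUIRED = 3   # hypothetical exit code


@dataclass(frozen=True)
class CommandSpec:
    name: str
    destructive: bool = False    # the catalog-time tag


# Hypothetical catalog: one metadata field per entry.
CATALOG = {
    "slo list": CommandSpec("slo list"),
    "slo delete": CommandSpec("slo delete", destructive=True),
}


def gate(command: str, argv: list[str]) -> int:
    """CLI side: return 0 if the invocation may proceed, non-zero otherwise."""
    spec = CATALOG.get(command)
    if spec is None:
        return EXIT_UNKNOWN_COMMAND
    # The run-time gate is a single check against the catalog tag.
    if spec.destructive and CONFIRM_FLAG not in argv:
        return EXIT_CONFIRMATION_REQUIRED
    return 0


def plan_invocation(command: str, args: list[str], allow_destructive: bool) -> list[str]:
    """Agent side: build argv, adding the flag only as an explicit plan-time choice."""
    spec = CATALOG[command]
    argv = command.split() + list(args)
    if spec.destructive:
        if not allow_destructive:
            raise PermissionError(f"{command!r} is destructive and the plan does not allow it")
        # The flag lands on the command line, so it shows up in the agent's trace.
        argv.append(CONFIRM_FLAG)
    return argv
```

Under these assumptions, `gate("slo delete", ["checkout-latency"])` returns the confirmation-required code, while the argv produced by `plan_invocation(..., allow_destructive=True)` carries the flag and passes the gate.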

Why it's zero-cost¶

The guardrail is purely catalog + flag:

No runtime permission model.
No per-command custom logic.
No prompt-engineering in the LLM.

The catalog is already the tool's authoritative surface description; tagging destructive commands is one metadata field per entry. Enforcement is a single check at invocation time.

Composition with the larger CLI-safety picture¶

Layer	Where it sits
Agent-level prompt: "be careful"	Unreliable — prompts aren't control (concepts/structured-output-reliability)
Read-only-agent-tool (patterns/allowlisted-read-only-agent-actions)	Hard — narrow the tool to non-mutating commands only
CLI-safety-as-agent-guardrail (patterns/cli-safety-as-agent-guardrail)	Narrow — Fly.io's mutation-MCP split
Destructive-op confirmation (this pattern)	Zero-cost tripwire on full CLI — you can still run mutating commands but only with explicit flag
Human approval loop	Highest-friction — reserve for irreversible infra ops

The patterns stack: an org can use destructive-op confirmation as the baseline layer on every CLI, layer mutation-MCP separation for agent-driven flows, and reserve human approval for the highest-stakes operations.

Relationship to Fly.io's flyctl canonical¶

The Fly.io canonical provides mutation-MCP separation — mutating commands live in a separate MCP server that the agent must explicitly opt into. That's a stronger-invariant shape at the MCP-surface altitude.

The Grafana gcx approach is the complement at the direct-CLI altitude: the commands all live on one binary, but the catalog tag plus the flag requirement provides the same net effect as mutation-MCP separation for agents that drive the CLI directly rather than through an MCP server. The two patterns are complementary, not alternatives.

Tradeoffs¶

Annotation overhead. Every new destructive command has to be tagged; the cost scales with command count but is trivial per command.
False negatives (missed tags). A destructive command that doesn't carry the tag can be run without confirmation. This is a catalog-completeness burden on the tool maintainers.
False positives (over-tagging). Tagging every mutating command — including ones that are trivially reversible like renames — adds friction without proportional safety benefit. The "destructive" boundary is a design choice, not a mechanical one.
Agent policy gap. The agent still decides whether to emit the confirmation flag. An agent configured to auto-confirm everything neutralises the guardrail; the zero-cost shape is necessary but not sufficient.
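One way maintainers might reduce the missed-tag risk is a lint over the catalog that flags commands whose names look destructive but lack the tag. The sketch below is an assumption-laden illustration, not something the source describes: the verb list, the function name `find_untagged_suspects`, and the name-to-tag catalog shape are all hypothetical, and a name-based heuristic will miss destructive commands with innocuous names.

```python
# Hypothetical verb list, drawn from the examples earlier on this page.
DESTRUCTIVE_VERBS = {"delete", "drop", "destroy", "reset", "force-push", "silence"}


def find_untagged_suspects(catalog: dict[str, bool]) -> list[str]:
    """Return command names that look destructive but are not tagged.

    `catalog` maps a command name to its destructive tag (hypothetical shape).
    """
    suspects = []
    for name, tagged in sorted(catalog.items()):
        words = name.lower().replace("_", "-").split()
        # Match a bare verb ("slo delete") or a verb prefix ("delete-slo").
        looks_destructive = any(
            word == verb or word.startswith(verb + "-")
            for word in words
            for verb in DESTRUCTIVE_VERBS
        )
        if looks_destructive and not tagged:
            suspects.append(name)
    return suspects
```

A check like this could run in CI so that a new command with a destructive-sounding name fails the build until someone either tags it or records why it is exempt. It narrows the false-negative gap; it does not close it.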

Seen in¶

sources/2026-04-29-grafana-get-observability-in-the-terminal-for-you-and-your-agents-with-the-gcx-cli-tool — canonical statement of destructive-op-confirmation as an agent-ergonomic CLI mechanism. Grafana's gcx tags destructive commands in its catalog and requires the confirmation flag. First observability-vendor-shipped instance of the pattern as a named mechanism.
sources/2025-05-07-flyio-provisioning-machines-using-mcps — Fly.io's mutation-MCP-separation pattern at the MCP-server altitude — same net effect from a different direction; sibling instance on the wiki of patterns/cli-safety-as-agent-guardrail.