
PATTERN Cited by 3 sources

Wrap CLI as MCP server

Pattern

Expose an existing CLI as an LLM tool surface by writing a thin MCP server that:

  1. picks a small subset of CLI subcommands (typically read-only) and registers each as an MCP tool,
  2. invokes the CLI in a subprocess per tool call, passing through any LLM-supplied arguments,
  3. captures stdout (usually produced with --json or an equivalent structured-output flag; see concepts/structured-output-reliability) and returns it verbatim as the MCP tool response,
  4. uses MCP stdio transport so the MCP server itself is launched by the client (Claude Desktop, Claude Code, Cursor, Goose, …) as a subprocess, no HTTP/auth layer required,
  5. inherits the operator's existing CLI credentials — whatever's already in ~/.config/<cli> / env / the CLI's login state becomes the auth boundary.
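Steps 2, 3 and 5 reduce to a single small function. What follows is a minimal sketch in Python rather than flymcp's Go, using only the standard library. `run_cli_tool`, its parameters, and the choice to raise on a non-zero exit are hypothetical; none of them is taken from flymcp. The function spawns one process per call and appends the structured-output flag unconditionally. It also passes the operator's environment through, so the CLI sees its existing credentials.

```python
import os
import subprocess
from typing import Sequence


def run_cli_tool(
    base_argv: Sequence[str],
    llm_args: Sequence[str],
    json_flag: str = "--json",
    timeout_s: float = 60.0,
) -> str:
    """Run the CLI once for one MCP tool call and return its stdout verbatim.

    base_argv is the fixed, allowlisted command prefix (e.g. ["fly", "status"]);
    llm_args are the arguments the model supplied for this call.
    """
    # A plain argv list (no shell=True) means model-supplied text such as ";"
    # or "$(...)" reaches the CLI literally and is never interpreted by a shell.
    argv = [*base_argv, *llm_args, json_flag]
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        # Explicitly inherit the operator's environment (tokens, HOME, config
        # paths): the CLI's existing login state is the auth boundary.
        env=os.environ.copy(),
        check=False,
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    # Returned untouched; the MCP layer wraps it as the tool's text content.
    return completed.stdout
```

Against flyctl, the call would look something like `run_cli_tool(["fly", "status", "--app"], ["unpkg"])`. Treat the `--app` and `--json` spellings as assumptions to confirm against the installed CLI's `--help`.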

The canonical wiki instance is Fly.io's flymcp, "the most basic MCP server for flyctl": ~90 lines of Go (using mark3labs/mcp-go), built in ~30 minutes, exposing exactly two flyctl subcommands, fly logs and fly status. (Source: sources/2025-04-10-flyio-30-minutes-with-mcp-and-flyctl.)

Why it's viable

Four things converge to make this pattern a one-afternoon move rather than a quarter-long project:

  • Mature MCP libraries exist per language. mark3labs/mcp-go, Python MCP SDK, TypeScript MCP SDK — each handles the protocol plumbing so the CLI-wrap author just declares tools + handlers.
  • Stdio transport eliminates the distributed-systems burden. No session affinity (concepts/mcp-long-lived-sse), no auth token exchange, no rate-limiting tier, no multitenancy. The operator's desktop MCP client launches the wrapper as a subprocess; when the client exits, the wrapper exits.
  • JSON output often already exists. If the CLI has a --json or -o json flag, the structured-output problem is already solved; see concepts/structured-output-reliability. Fly.io's 2020 decision to give most flyctl commands a --json mode was load-bearing for flymcp five years later: "I don't know how much of a difference it made" (it made all the difference).
  • LLM planners compose read-only observability primitives well. Two tools (logs + status) turned out to be enough to reproduce an experienced SRE's incident-diagnosis flow against a Fly-hosted CDN app.
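To make the stdio-transport bullet concrete: MCP's stdio transport carries newline-delimited JSON-RPC 2.0 messages over the wrapper's stdin and stdout. That is why there is no listener, session store or auth handshake to build. Real wrappers should lean on one of the SDKs listed above, and the hand-rolled dispatcher below is only a sketch of the plumbing they hide. It answers `initialize`, `tools/list` and `tools/call`, ignores notifications, and pins protocol version `2024-11-05`. The tool table, the single `app` argument and the server name are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Message = Dict[str, Any]
# Hypothetical tool table: name -> (description, handler). A handler takes the
# model-supplied arguments and returns the CLI's stdout as text.
ToolTable = Dict[str, Tuple[str, Callable[[Dict[str, Any]], str]]]

# Hypothetical input schema shared by every tool in this sketch.
APP_SCHEMA = {
    "type": "object",
    "properties": {"app": {"type": "string"}},
    "required": ["app"],
}


def make_dispatcher(tools: ToolTable) -> Callable[[Message], Optional[Message]]:
    def handle(msg: Message) -> Optional[Message]:
        if "id" not in msg:
            return None  # notifications (e.g. notifications/initialized) get no reply
        method, rid = msg.get("method"), msg["id"]
        params = msg.get("params") or {}
        if method == "initialize":
            result: Message = {
                "protocolVersion": "2024-11-05",
                "capabilities": {"tools": {}},
                "serverInfo": {"name": "cli-wrapper-sketch", "version": "0.0.1"},
            }
        elif method == "tools/list":
            result = {"tools": [
                {"name": name, "description": desc, "inputSchema": APP_SCHEMA}
                for name, (desc, _) in tools.items()
            ]}
        elif method == "tools/call":
            entry = tools.get(params.get("name"))
            if entry is None:
                return {"jsonrpc": "2.0", "id": rid,
                        "error": {"code": -32602, "message": "unknown tool"}}
            try:
                text, is_error = entry[1](params.get("arguments") or {}), False
            except Exception as exc:  # a failed CLI call is a tool error, not a crash
                text, is_error = str(exc), True
            result = {"content": [{"type": "text", "text": text}], "isError": is_error}
        else:
            return {"jsonrpc": "2.0", "id": rid,
                    "error": {"code": -32601, "message": f"method not found: {method}"}}
        return {"jsonrpc": "2.0", "id": rid, "result": result}

    return handle


def serve(handle: Callable[[Message], Optional[Message]],
          lines: Iterable[str], write: Callable[[str], None]) -> None:
    """Newline-delimited JSON-RPC: one message per input line, one reply per line."""
    for line in lines:
        if line.strip():
            reply = handle(json.loads(line))
            if reply is not None:
                write(json.dumps(reply) + "\n")
```

Wired to real pipes, this becomes `serve(handle, sys.stdin, lambda s: (sys.stdout.write(s), sys.stdout.flush()))`. Because stdout carries the protocol, any diagnostic output from the wrapper must go to stderr.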

Canonical instance — flymcp + unpkg

Claude, given only fly logs and fly status as tools, produced the following; only the per-machine timeline needed a follow-up prompt:

  • the global topology of unpkg (10 Fly Machines across 11 regions: lax, atl, ewr, lhr, cdg, ams, sin, nrt, hkg, bog, syd),
  • identification of 2 critical machines in non-healthy states ("context deadline exceeded", "gone"),
  • oom_killed event correlation across multiple machines,
  • on the follow-up prompt "try getting logs for one of the critical machines", a per-second incident timeline from OOM kill → SIGKILL → reboot → health-check fail → listener up → health-check pass, ~43 seconds end-to-end,
  • the specific kernel OOM line with RSS + process numbers: "Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, …" and the memory ceiling diagnosis: Bun at ~3.7 GB of 4 GB allocated.
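Once the kernel line is parsed, the memory-ceiling diagnosis in the last bullet is simple arithmetic. The sketch below assumes the line keeps the Linux `Out of memory: Killed process <pid> (<comm>) total-vm:<n>kB, anon-rss:<n>kB` prefix quoted above. The kernel's `kB` are 1024-byte units. Reading the 4 GB allocation as a 4096 MiB limit is an assumption, and `parse_oom_line` and `rss_fraction` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the prefix of a Linux OOM-killer log line, as quoted above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


@dataclass(frozen=True)
class OomKill:
    pid: int
    comm: str
    total_vm_kb: int
    anon_rss_kb: int


def parse_oom_line(line: str) -> Optional[OomKill]:
    """Extract the killed process and its memory figures, or None if no match."""
    m = OOM_RE.search(line)  # search() tolerates a timestamp prefix
    if m is None:
        return None
    return OomKill(
        pid=int(m["pid"]),
        comm=m["comm"],
        total_vm_kb=int(m["total_vm_kb"]),
        anon_rss_kb=int(m["anon_rss_kb"]),
    )


def rss_fraction(kill: OomKill, limit_mib: int) -> float:
    """Resident anonymous memory as a fraction of the machine's memory limit."""
    return kill.anon_rss_kb / (limit_mib * 1024)  # kernel "kB" are KiB
```

For the quoted line, this yields pid 641, process `bun`, and an anon-rss of 3,744,352 KiB. That is roughly 89% of a 4096 MiB limit, the figure the memory-ceiling diagnosis rests on.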

Ptacek's read: "annoyingly useful … faster than I find problems in apps." See concepts/agentic-troubleshooting-loop for the planner-executor loop shape this instantiates.

Pattern elements

  • Tool picker. Choose 1–5 read-only subcommands initially. Fewer tools = more accurate LLM tool selection (patterns/tool-surface-minimization) and smaller context-window footprint.
  • Read-only posture. Gate mutations behind a second tier (patterns/allowlisted-read-only-agent-actions) or leave them out entirely for v1. Blast radius of LLM hallucination should be bounded to "wrong conclusion", not "destroyed machine".
  • Structured-output flag. --json / -o json / --format json. The wrapper should pass this flag unconditionally; the LLM never sees pretty-printed human tables.
  • Subprocess-per-call. No need for a long-running CLI daemon; spawn fresh per tool invocation. Keeps the wrapper stateless and easy to reason about.
  • Pass-through credentials. Don't reinvent auth. The operator already ran flyctl auth login / aws configure / gcloud auth / kubectl config; the wrapper inherits it by inheriting env and ~/.config.
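The tool-picker and read-only elements can be enforced in the wrapper itself rather than left to the prompt. In the sketch below, all names are hypothetical. The allowlist maps MCP tool names to fixed flyctl subcommands, so the model supplies only an app name. That name is validated so it cannot smuggle in an extra flag. The app-name pattern and the `--app`/`--json` spellings are assumptions to check against the CLI's own help.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical v1 allowlist: only read-only subcommands are registered at all.
READ_ONLY_TOOLS: Dict[str, Tuple[str, ...]] = {
    "logs": ("logs",),
    "status": ("status",),
}

# Hypothetical app-name rule. Rejecting a leading "-" stops the model from
# passing something like "--help" or another flag where a value is expected.
APP_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def build_argv(cli: str, tool: str, app: str) -> List[str]:
    """Translate one MCP tool call into a fixed CLI argv, or refuse it."""
    subcommand = READ_ONLY_TOOLS.get(tool)
    if subcommand is None:
        raise PermissionError(f"tool not allowlisted: {tool!r}")
    if not APP_NAME_RE.fullmatch(app):
        raise ValueError(f"invalid app name: {app!r}")
    # The structured-output flag is appended unconditionally, never model-chosen.
    return [cli, *subcommand, "--app", app, "--json"]
```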

Generalisation

The pattern extends beyond flyctl. Any CLI with a JSON output mode (--json, -o json, --format json) is a candidate: kubectl, aws, gcloud, gh, doctl, linode-cli, heroku, pulumi, terraform, fastly, netlify, vercel. Fly.io doesn't claim generality in the post, but the ~90-line Go wrapper shape should port with little change.

Limiting factor: the CLI's JSON output quality. Some CLIs have partial or inconsistent JSON support; some wrap everything in a single top-level blob that's hard for an LLM to navigate without a further unwrap tool; some interleave log lines into stdout alongside the JSON result. The smoother the --json, the smaller the wrapper.
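When a CLI's --json falls short, the wrapper grows a normalisation step and stops returning stdout strictly verbatim. The sketch below shows one such step and is a hypothetical helper. It assumes two of the shapes named above. A single clean document is returned as parsed. Otherwise, output with one JSON value per line and interleaved non-JSON log lines is reduced to a list of the JSON lines. Pretty-printed multi-line JSON mixed with log lines is not handled.

```python
import json
from typing import Any, List


def parse_cli_json(stdout: str) -> Any:
    """Parse CLI stdout as JSON, tolerating non-JSON log lines mixed into it."""
    text = stdout.strip()
    try:
        return json.loads(text)  # happy path: one clean JSON document
    except json.JSONDecodeError:
        pass
    # Fallback: treat each line as a candidate JSON value (NDJSON-style output)
    # and drop human-oriented lines such as progress or log messages.
    records: List[Any] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(("{", "[")):
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # e.g. one line of a pretty-printed document: not handled here
    return records
```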

Trade-offs vs alternatives

vs. OpenAPI-spec-based MCP (Cloudflare's cf-cli framing: expose the API directly as MCP tools): OpenAPI gives full-surface exposure automatically but explodes the tool count and the context-window cost. Wrap-CLI gives manual tool selection and a read-only default by convention, at the cost of covering only what the CLI already exposes.

vs. Code Mode (CF Code Mode — fit thousands of operations into one tool by giving the LLM a programming environment): Code Mode is the right answer at ~3000-op scale. Wrap-CLI is the right answer at <10-op scale with a <1-hour budget.

vs. HTTP/SSE MCP server: stdio wrappers don't multitenant, don't outlive the client that launched them, and inherit the operator's full CLI credentials. For operator-driven troubleshooting these are features, not bugs. For shared team or CI use, an HTTP/SSE server with patterns/session-affinity-for-mcp-sse is necessary.

Risks

  • Local MCP server security (concepts/local-mcp-server-risk). The operator is giving a cloud LLM instance the ability to run native binaries on their workstation. Even a nominally read-only tool surface is one "let me try one more thing" prompt-injection away from misbehaviour. Ptacek's explicit caveat: "Local MCP servers are scary. I don't like that I'm giving a Claude instance in the cloud the ability to run a native program on my machine."
  • Natural mitigation: patterns/disposable-vm-for-agentic-loop — run the wrapped CLI inside a throwaway Fly Machine / Cloud Hypervisor micro-VM / Firecracker sandbox, not on the operator's laptop. The Fly.io 2025-02-07 VSCode-SSH post sketches exactly this shape.
  • LLM hallucination on novel incidents. The OOM-on-Bun case is well demonstrated, but OOM kills are likely well represented in the training corpus. Accuracy on rarer failure shapes is not measured.
  • No tool-call rate limiting in stdio mode. A poorly prompted agent can spin on fly logs of different machines; nothing in the wrapper caps cost.
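Stdio MCP imposes no cap, but the wrapper can enforce one itself. The following is a minimal sketch, not something flymcp is described as doing. It is a sliding-window counter checked before each tool call, with an injectable clock so it can be tested. `CallBudget` is a hypothetical name and the limits are placeholders.

```python
import time
from collections import deque
from typing import Callable, Deque


class CallBudget:
    """Sliding-window cap on tool calls (hypothetical mitigation)."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._stamps: Deque[float] = deque()  # start times of recent calls

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the cap is reached."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            return False
        self._stamps.append(now)
        return True
```

A tools/call handler would check `try_acquire()` before spawning the CLI. On False, it would return an error result to the model instead of running another fly logs.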

Seen in

  • sources/2025-04-10-flyio-30-minutes-with-mcp-and-flyctl: canonical instance (flymcp / 2 tools / 90 LoC Go / 30 min / unpkg incident-diagnosis demo).
  • sources/2025-05-07-flyio-provisioning-machines-using-mcps: mutation transition (~27 days later). The same flyctl MCP server now exposes the full fly volumes subcommand family (create / list / extend / fork / snapshots / destroy), shipped in flyctl v0.3.117. This is the first wiki instance of the pattern crossing the read-only → production-mutation boundary. Load-bearing safety claim: CLI-level refusal invariants ("can't destroy a mounted volume") become the agent guardrail at zero cost; see patterns/cli-safety-as-agent-guardrail. Pair-post to the 2025-04-10 instance.
  • sources/2026-03-10-flyio-unfortunately-sprites-now-speak-mcp: Ptacek's aesthetic counter-position on the same vendor's pattern. Eleven months after the original flymcp post, the 2026-03-10 "Unfortunately, Sprites Now Speak MCP" post argues that for shell-capable agents, wrapping a CLI as an MCP server is the wrong default shape: "In 2026, MCP is the wrong way to extend the capabilities of an agent. The emerging Right Way to do this is command line tools and discoverable APIs." The pattern is not retracted (Fly.io ships sprites.dev/mcp in the same post) but repositioned: wrap-CLI-as-MCP is the shape you ship when the consuming agent can't run a shell, while shell-capable agents should be shown the CLI directly. The post names two new cost axes. The first is context bloat as importance signal (concepts/context-as-importance-signal): tool descriptions in context steer model attention rather than merely advertising what is available. The second is progressive capability disclosure (concepts/progressive-capability-disclosure): CLI subcommand trees reveal capabilities incrementally, avoiding both the token cost and the importance-signal cost of a flat, pre-loaded tool list. See patterns/mcp-as-fallback-for-shell-less-agents for the positional-pattern companion.
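The guardrail claim in the 2025-05-07 entry depends on one wrapper detail. When the CLI refuses an action (the source's example is declining to destroy a mounted volume), its non-zero exit and error text have to reach the model rather than being swallowed. The sketch below assumes the refusal is reported through the exit status plus stderr. The result shape follows MCP's `tools/call` result (`content` plus `isError`), and `cli_result` is a hypothetical name.

```python
import subprocess
from typing import Any, Dict, Sequence


def cli_result(argv: Sequence[str], timeout_s: float = 120.0) -> Dict[str, Any]:
    """Run the CLI and shape the outcome as an MCP tools/call result."""

    def result(text: str, is_error: bool) -> Dict[str, Any]:
        return {"content": [{"type": "text", "text": text}], "isError": is_error}

    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True,
                              timeout=timeout_s, check=False)
    except FileNotFoundError:
        return result(f"CLI not found: {argv[0]}", True)
    except subprocess.TimeoutExpired:
        return result(f"CLI timed out after {timeout_s}s", True)
    if proc.returncode != 0:
        # Pass the CLI's own refusal text through verbatim so the model sees
        # why the action was blocked instead of a bare failure.
        msg = proc.stderr.strip() or proc.stdout.strip() or f"exit status {proc.returncode}"
        return result(msg, True)
    return result(proc.stdout, False)
```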

Positional recap

The pattern remains canonical for:

  • Shell-less agents (Claude Desktop, browser-UI agents, older MCP-only clients) — see patterns/mcp-as-fallback-for-shell-less-agents.
  • Cross-vendor interop — an MCP server by vendor A works for agents from B, C, D without per-vendor integration.
  • Quick prototypes where 90 lines of Go buys you a week's worth of agent-tooling demos.
  • Governed environments where central MCP servers provide audit / allowlist / rate-limit surfaces (patterns/central-proxy-choke-point).

The pattern is not the right default for shell-capable agents iterating on a project where the CLI is already in the agent's environment. Ship the CLI + a one-sentence skill; let progressive disclosure do the rest.
