
PATTERN Cited by 2 sources

Wrap CLI as MCP server


Expose an existing CLI as an LLM tool surface by writing a thin MCP server that:

  1. picks a small subset of CLI subcommands (typically read-only) and registers each as an MCP tool,
  2. invokes the CLI in a subprocess per tool call, passing through any LLM-supplied arguments,
  3. captures stdout (usually with --json or equivalent structured-output flag — see concepts/structured-output-reliability) and returns it verbatim as the MCP tool response,
  4. uses MCP stdio transport so the MCP server itself is launched by the client (Claude Desktop, Claude Code, Cursor, Goose, …) as a subprocess, no HTTP/auth layer required,
  5. inherits the operator's existing CLI credentials — whatever's already in ~/.config/<cli> / env / the CLI's login state becomes the auth boundary.

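Steps 2–3 are the entire runtime of the wrapper; a minimal Python sketch (function name and flag handling are illustrative, not flymcp's actual code):

```python
import subprocess

def run_cli_tool(binary, subcommand, extra_args=()):
    """Spawn the wrapped CLI fresh for one tool call and return stdout verbatim."""
    cmd = [binary, *subcommand, *extra_args, "--json"]  # force structured output
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        # Surface the CLI's own error text to the LLM instead of raising,
        # so the model can react to it like any other tool response.
        return result.stderr or f"{binary} exited {result.returncode}"
    return result.stdout
```

The MCP library of choice then registers one such callable per subcommand; the handler body is just this function with the subcommand baked in.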
The canonical wiki instance is Fly.io's flymcp, "the most basic MCP server for flyctl": ~90 lines of Go (using mark3labs/mcp-go), built in ~30 minutes, exposing exactly two flyctl subcommands, fly logs and fly status. (Source: sources/2025-04-10-flyio-30-minutes-with-mcp-and-flyctl.)

Why it's viable

Four things converge to make this pattern a one-afternoon move rather than a quarter-long project:

  • Mature MCP libraries exist per language. mark3labs/mcp-go, Python MCP SDK, TypeScript MCP SDK — each handles the protocol plumbing so the CLI-wrap author just declares tools + handlers.
  • Stdio transport eliminates the distributed-systems burden. No session affinity (concepts/mcp-long-lived-sse), no auth token exchange, no rate-limiting tier, no multitenancy. The operator's desktop MCP client launches the wrapper as a subprocess; when the client exits, the wrapper exits.
  • JSON mode is already done. If the CLI has a --json or -o json flag, the structured-output problem is solved. See concepts/structured-output-reliability. Fly.io's 2020 decision to give most flyctl commands a --json mode was load-bearing for flymcp five years later: "I don't know how much of a difference it made" (it made all the difference).
  • LLM planners compose read-only observability primitives well. Two tools (logs + status) turned out to be enough to reproduce an experienced SRE's incident-diagnosis flow against a Fly-hosted CDN app.

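The stdio-transport point is concrete in the client config: the MCP client launches the wrapper binary itself, so "deployment" is one JSON entry. For Claude Desktop the shape is roughly the following (binary path illustrative):

```json
{
  "mcpServers": {
    "flymcp": {
      "command": "/usr/local/bin/flymcp",
      "args": []
    }
  }
}
```

No port, no TLS, no token exchange; the wrapper lives and dies with the client process.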
Canonical instance — flymcp + unpkg

Claude, given only fly logs and fly status as tools, produced without further prompting:

  • the global topology of unpkg (10 Fly Machines across 11 regions: lax, atl, ewr, lhr, cdg, ams, sin, nrt, hkg, bog, syd),
  • criticality classification of 2 machines in non-healthy status ("context deadline exceeded", "gone"),
  • oom_killed event correlation across multiple machines,
  • on the follow-up prompt "try getting logs for one of the critical machines", a per-second incident timeline from OOM kill → SIGKILL → reboot → health-check fail → listener up → health-check pass, ~43 seconds end-to-end,
  • the specific kernel OOM line with RSS + process numbers: "Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, …" and the memory ceiling diagnosis: Bun at ~3.7 GB of 4 GB allocated.

Ptacek's read: "annoyingly useful … faster than I find problems in apps." See concepts/agentic-troubleshooting-loop for the planner-executor loop shape this instantiates.

Pattern elements

  • Tool picker. Choose 1–5 read-only subcommands initially. Fewer tools = more accurate LLM tool selection (patterns/tool-surface-minimization) and smaller context-window footprint.
  • Read-only posture. Gate mutations behind a second tier (patterns/allowlisted-read-only-agent-actions) or leave them out entirely for v1. Blast radius of LLM hallucination should be bounded to "wrong conclusion", not "destroyed machine".
  • Structured-output flag. --json / -o json / --format json. The wrapper should pass this flag unconditionally; the LLM never sees pretty-printed human tables.
  • Subprocess-per-call. No need for a long-running CLI daemon; spawn fresh per tool invocation. Keeps the wrapper stateless and easy to reason about.
  • Pass-through credentials. Don't reinvent auth. The operator already ran flyctl auth login / aws configure / gcloud auth / kubectl config; the wrapper inherits it by inheriting env and ~/.config.

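The five elements above collapse into one small table plus one dispatch function. A Python sketch, with the tool names, subcommands, and flags as illustrative stand-ins (not flymcp's actual table):

```python
import os
import subprocess

# Allowlisted, read-only tool surface: tool name -> fixed argv prefix.
# The structured-output flag is baked in, never left to the LLM.
TOOLS = {
    "fly_status": ["fly", "status", "--json"],
    "fly_logs":   ["fly", "logs", "--json", "--no-tail"],
}

def call_tool(name, llm_args=()):
    """One fresh subprocess per call; env (and so CLI credentials) is inherited."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")  # mutations simply aren't listed
    argv = TOOLS[name] + list(llm_args)
    out = subprocess.run(argv, capture_output=True, text=True, env=os.environ)
    return out.stdout
```

Note what is absent: no daemon, no connection pooling, no credential handling. The allowlist is the entire security posture of v1.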
Generalisation

The pattern clearly extends beyond flyctl. Any CLI with --json mode is a candidate: kubectl, aws, gcloud, gh, doctl, linode-cli, heroku, pulumi, terraform, fastly, netlify, vercel. Fly.io doesn't claim generality in the post, but the 90-LoC-Go-wrapper shape obviously ports.

Limiting factor: the CLI's JSON output quality. Some CLIs have partial or inconsistent JSON support; some wrap everything in a single top-level blob that's hard for an LLM to navigate without a further unwrap tool; some interleave log lines into stdout alongside the JSON result. The smoother the --json, the smaller the wrapper.
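Where a CLI interleaves log lines with its JSON result, the wrapper can salvage the structured part before returning it. A stdlib sketch of that unwrap step (a workaround for messy CLIs, not part of the flymcp source):

```python
import json

def extract_json(stdout):
    """Pull the first JSON value out of stdout that mixes log lines
    with the structured result; the decoder finds where the value ends."""
    dec = json.JSONDecoder()
    for i, ch in enumerate(stdout):
        if ch in "{[":
            try:
                value, _end = dec.raw_decode(stdout, i)
                return value
            except json.JSONDecodeError:
                continue  # a stray brace in a log line; keep scanning
    raise ValueError("no JSON value found in CLI output")
```

With a clean --json flag this function is dead code, which is the point of the "smoother the --json, the smaller the wrapper" observation.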

Trade-offs vs alternatives

vs. OpenAPI-spec-based MCP (Cloudflare's cf-cli framing — expose API directly as MCP tools): OpenAPI gives full-surface exposure automatically but explodes the tool count and the context-window cost. Wrap-CLI gives manual selection + built-in read-only cultural default, at the cost of only covering what the CLI already exposes.

vs. Code Mode (CF Code Mode — fit thousands of operations into one tool by giving the LLM a programming environment): Code Mode is the right answer at ~3000-op scale. Wrap-CLI is the right answer at <10-op scale with a <1-hour budget.

vs. HTTP/SSE MCP server: stdio wrappers don't multitenant, don't outlive the client, and inherit the operator's full CLI credentials. For operator-driven troubleshooting this is a feature, not a bug. For shared team or CI use, HTTP/SSE with patterns/session-affinity-for-mcp-sse is necessary.

Risks

  • Local MCP server security (concepts/local-mcp-server-risk). The operator is giving a cloud LLM instance the ability to run native binaries on their workstation. Even a nominally read-only tool surface is one "let me try one more thing" prompt-injection away from misbehaviour. Ptacek's explicit caveat: "Local MCP servers are scary. I don't like that I'm giving a Claude instance in the cloud the ability to run a native program on my machine."
  • Natural mitigation: patterns/disposable-vm-for-agentic-loop — run the wrapped CLI inside a throwaway Fly Machine / Cloud Hypervisor micro-VM / Firecracker sandbox, not on the operator's laptop. The Fly.io 2025-02-07 VSCode-SSH post sketches exactly this shape.
  • LLM hallucination on novel incidents. The OOM-on-Bun case is nicely demonstrated but self-evidently well-represented in the training corpus. Accuracy on rarer failure shapes is not measured.
  • No tool-call rate limiting in stdio mode. A poorly prompted agent can loop on fly logs across machine after machine; nothing in the wrapper caps call volume or cost.
