Designing MCP tools for agents: Lessons from building Datadog's MCP server¶
Summary¶
Datadog's retrospective on shipping its official MCP (Model Context Protocol) server — the company's first observability interface built specifically for customer AI agents rather than humans or programmatic clients. The V1 was "a thin wrapper around existing APIs" — it worked well enough to validate the idea, then failed in characteristic ways once real agents drove it. Agents filled context windows with log data and lost track of their task; they blew token budgets on variable-size records; they inferred trends from raw samples instead of aggregating. The post reframes tool design around an LLM context-window budget as the dominant operational resource, and proposes a set of patterns Datadog uses to live within it.
Key takeaways¶
- An agent's context window is a hard, scarce resource, and the tool owns how much of it each call consumes. The entire tool result ends up in context; "I can't predict how much I'll eat" is a design bug. This reframes classic ergonomic questions (output format, pagination unit, tool count, error messages) into a single discipline around context efficiency (Source: sources/2026-03-04-datadog-mcp-server-agent-tools).
- Response format is a 2× lever without changing semantics. CSV uses about half as many tokens per record as JSON for tabular data (no nesting). YAML trims roughly 20% off JSON for nested data. Trimming rarely-used fields from the default response (and letting agents re-request them) compounds further: Datadog reports fitting ~5× more records per token budget on some tools after applying format and trimming together.
- Paginate by token budget, not by record count — Datadog log messages range from 100 bytes to 1 MB, so fixed-record-count pages have unbounded token cost; the server cuts at N tokens and returns a cursor (pattern: patterns/token-budget-pagination). This may matter less when clients like Cursor and Claude Code write long tool results to disk instead of inlining them into context — "this isn't in the MCP spec yet."
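A minimal sketch of token-budget pagination, under assumptions not in the post: a 4-chars-per-token heuristic stands in for a real tokenizer, and the cursor is a plain record index rather than an opaque continuation token.

```python
# Cut the page when the running token estimate would cross the budget,
# and return a cursor (the index to resume from).

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude chars-per-token heuristic

def paginate_by_tokens(records, token_budget, cursor=0):
    page, used = [], 0
    for i in range(cursor, len(records)):
        cost = estimate_tokens(records[i])
        if page and used + cost > token_budget:
            return page, i          # next cursor: resume here
        # Always emit at least one record, even an oversized one,
        # so pagination can't stall on a huge message.
        page.append(records[i])
        used += cost
    return page, None               # exhausted: no cursor

# Record sizes vary wildly (the post cites 100 B to 1 MB log messages),
# so page *record counts* vary while token cost stays bounded.
logs = ["x" * 40, "y" * 4000, "z" * 40, "w" * 40]
page1, cur = paginate_by_tokens(logs, token_budget=100)
page2, cur2 = paginate_by_tokens(logs, token_budget=100, cursor=cur)
```

With fixed-record-count pages, the 4 KB message would blow the budget of whatever page it landed on; here it simply becomes a page of its own.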
- Expose a query language, not raw retrieval. For questions like "which services are logging the most errors in the last hour?", V1 agents pulled samples of raw logs and guessed; some brute-forced until the context window filled. Datadog added SQL-style querying over observability data — agents are "quite good at writing it" — and reports ~40% cheaper eval runs in some scenarios because agents `SELECT` only the fields they need and `LIMIT`/`COUNT` instead of retrieving. Pattern: patterns/query-language-as-agent-tool. Supporting this at Datadog scale was "a significant lift" because traditional relational databases do not apply.
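The difference the query language makes can be shown with a toy example. sqlite3 stands in for Datadog's (undisclosed) engine, and the table and column names are invented; the point is only the shape of what enters context, a handful of aggregate rows instead of raw log samples.

```python
# "Which services are logging the most errors?" as SELECT/COUNT/LIMIT
# instead of sampling raw logs and guessing at the trend.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (service TEXT, status TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [("checkout", "error", "boom")] * 30
    + [("search", "error", "oops")] * 5
    + [("search", "info", "ok")] * 100,
)

rows = conn.execute(
    """
    SELECT service, COUNT(*) AS errors
    FROM logs
    WHERE status = 'error'
    GROUP BY service
    ORDER BY errors DESC
    LIMIT 5
    """
).fetchall()
```

Two rows answer the question exactly; the V1 approach would have pulled some of the 135 raw log lines into context and extrapolated.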
- Tool count must be actively managed. Cited evidence: agents' tool-calling accuracy degrades as the tool inventory grows (arXiv 2411.15399), and each tool description consumes context window. Three tactics used together (pattern: patterns/tool-surface-minimization):
- Flexible tools — one schema covering many use cases, not one tool per API endpoint.
- Toolsets — a minimal default set plus opt-in toolsets for specialized workflows; the user must anticipate the agent's needs.
- Layering — chained tools ("how do I do X?" → "do X") to hide specialized functionality behind a discovery tool (Block's "MCP tools like ogres with layers"); the tradeoff is +1 tool call of latency per task.
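The layering tactic can be sketched as two functions: an always-visible discovery tool that answers "how do I do X?" with the schema of a hidden specialized tool, and a second call that invokes it. Everything here (tool names, schema shape, matching logic) is invented for illustration; a real server would route this through MCP tool definitions.

```python
# Layer 1: one discovery tool is exposed by default.
# Layer 2: specialized tools stay hidden until discovered.

SPECIALIZED_TOOLS = {
    "mute_monitor": {
        "description": "Mute a monitor for a duration.",
        "params": {"monitor_id": "int", "duration_minutes": "int"},
        "handler": lambda monitor_id, duration_minutes: (
            f"monitor {monitor_id} muted for {duration_minutes}m"
        ),
    },
}

def discover_tool(task: str):
    """'How do I do X?' -> name and schema of the matching hidden tool."""
    for name, spec in SPECIALIZED_TOOLS.items():
        if task.replace(" ", "_") in name:
            return {"tool": name, "params": spec["params"]}
    return {"error": f"no specialized tool for {task!r}"}

def call_tool(name: str, **kwargs):
    """'Do X' -- the extra hop that layering costs per task."""
    return SPECIALIZED_TOOLS[name]["handler"](**kwargs)

found = discover_tool("mute monitor")
result = call_tool(found["tool"], monitor_id=42, duration_minutes=30)
```

Only the discovery tool's description sits in the agent's context up front; the specialized tool's schema is paid for only when a task actually needs it, at the cost of the extra round trip.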
- Error messages are agent-recovery primitives. A generic `"invalid query"` gets agents stuck in retry loops; a specific `"unknown field 'stauts' – did you mean 'status'?"` gives a clear next step. Pattern: patterns/actionable-error-messages. Two adjunct moves: (a) a `search_datadog_docs` RAG tool reachable from server instructions, so agents look up syntax on demand instead of cramming it into tool descriptions; (b) tool responses can carry advisory guidance alongside data ("you searched for `payment`, did you mean `payments`?") — a departure from REST API conventions, where no party on the other end can reason over prose.
- General-purpose MCP server vs specialized agent is a deliberate trade-off, not a winner-takes-all. Datadog also ships Bits AI SRE, a hosted agent with a purpose-built web UI for alert investigation. The hosted agent can assume the workflow (the user is investigating an alert) and pre-load related data; the MCP server can't. The stated roadmap is to expose Bits AI SRE capabilities through MCP and to broaden what the specialized agent can investigate — "over time the line between 'specialized agent' and 'MCP server with good defaults' may get blurry."
- The future may relax some constraints. Clients are adding tool search (Claude Code) and on-demand skills (Claude skills, Kiro Powers), so servers needn't front-load every tool. Clients writing long tool results to disk (Cursor, Claude Code) could reduce the importance of format-level token efficiency. "How skills and MCP fit together is still an open question."
Distilled principles (author's summary)¶
- Don't just wrap your APIs. Design tools around agents' constraints.
- Be frugal with context windows, and give agents the tools to be frugal too. Query languages help.
- Guide agents with good error messages and discoverable documentation.
Cross-references¶
- concepts/agent-context-window — the design resource this post is organized around; fixed-size LLM working set into which all tool outputs and descriptions must fit.
- concepts/observability — the domain; this post narrows the concepts/observability read path to an agent-facing one.
- patterns/token-budget-pagination — cut at N tokens, cursor for continuation; specific instance of concepts/agent-context-window discipline applied to pagination.
- patterns/query-language-as-agent-tool — SQL as the tool surface; agents are strong at SQL, and it gives them fine-grained `SELECT`/`LIMIT`/`COUNT` control over what enters context.
- patterns/tool-surface-minimization — flexible tools + toolsets + layering as three complementary ways to keep the exposed tool count within agent-accuracy and context-budget limits.
- patterns/actionable-error-messages — specific, corrective errors (column name suggestions, available-options echo) as a recovery primitive for non-deterministic agents; pairs with discoverable docs and advisory notes in tool results.
- systems/datadog-mcp-server — the system described.
- systems/bits-ai-sre — the hosted-agent peer product; specialized-agent vs MCP-server design trade-off named explicitly.
- Related prior Datadog ingestions (same fleet / same engineering org, different layers of the stack): sources/2025-01-29-datadog-husky-efficient-compaction-at-datadog-scale (storage layer under observability reads), sources/2025-11-18-datadog-ebpf-fim-filtering (concepts/edge-filtering on the security side — the same token-budget logic applied to a different scarce resource).
Caveats¶
- Numbers in the post are qualitative ("~5× more records in the same token budget", "~40% cheaper in some evals", "~50% token savings CSV-vs-JSON"). No absolute scale (requests/sec, token-budget distribution, query-language coverage) is disclosed.
- The SQL surface is described but the engine/semantics/indexing layer behind it is not. At Datadog scale it is implicitly built on top of (or alongside) systems/husky-class infrastructure; the post does not confirm which engine serves the agent-issued SQL.
- The post is first-person by one MCP-server engineer, not a formal postmortem or whitepaper; treat claims as design testimony.