Bits AI SRE¶
Bits AI SRE is Datadog's hosted SRE agent, a web-UI agent purpose-built to investigate alerts and suggest remediations. The systems/datadog-mcp-server design retrospective names it as the peer product to the general-purpose MCP server, illustrating the specialized-agent-vs-MCP-server design trade-off (Source: sources/2026-03-04-datadog-mcp-server-agent-tools).
Design trade-off (vs MCP server)¶
| | Bits AI SRE | Datadog MCP Server |
|---|---|---|
| Workflow | Fixed: alert investigation | Open-ended; agent-defined |
| UI | Purpose-built web UI | Whatever client the user has (Claude Code, Cursor, homegrown) |
| Default context | Can assume user is investigating the triggering alert → pre-load related data, offer specialized tools / UI | Must be general; can't make strong assumptions |
| Flexibility | One workflow | Pluggable anywhere MCP is spoken |
Datadog's stated roadmap direction: expose Bits AI SRE capabilities through MCP, and broaden what the specialized agent can investigate — "the line between 'specialized agent' and 'MCP server with good defaults' may get blurry".
What Bits actually does¶
Bits AI SRE reasons across metrics, logs, traces, infrastructure metadata, network telemetry, and monitor configuration to triage and remediate production incidents. Its value comes from how it chains tools together and reasons across their outputs — failures typically emerge from composition, not from single-tool calls (an agent that retrieves valid signals from multiple tools but combines them incorrectly is the canonical failure mode) (Source: sources/2026-04-07-datadog-bits-ai-sre-eval-platform).
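To make the "valid signals, invalid combination" failure mode concrete, here is a hypothetical sketch (not Datadog code; the services, values, and units are invented for illustration). Each tool call returns correct data, and the bug only appears in the step that composes them:

```python
def latency_p99_ms(service: str) -> float:
    """Stand-in metrics tool: p99 latency in milliseconds (valid reading)."""
    return 950.0

def slo_threshold_s(service: str) -> float:
    """Stand-in config tool: SLO threshold in seconds (valid reading)."""
    return 1.0

def breaches_slo(service: str) -> bool:
    # BUG: compares milliseconds to seconds. Each tool call above is
    # individually correct; the failure emerges only in composition.
    return latency_p99_ms(service) > slo_threshold_s(service)

def breaches_slo_fixed(service: str) -> bool:
    # Correct composition: normalize units before comparing.
    return latency_p99_ms(service) / 1000.0 > slo_threshold_s(service)
```

A per-tool test suite would pass both tools here; only an evaluation that scores the composed conclusion catches that `breaches_slo` reports a breach when the service is actually inside its SLO.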
The evaluation platform¶
Bits has a dedicated offline test harness, systems/bits-ai-sre-eval-platform — a replayable evaluation platform built because "we had no reliable way to detect" quality shifts when features improved one investigation class while quietly regressing another. It turns production issues into reproducible investigation environments with two components: a curated label set defining representative scenarios, and an orchestration layer that runs Bits against them at scale.
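The two components can be sketched as a minimal harness. This is an assumption-laden illustration, not Datadog's schema: `Label`, `snapshot`, and `expected_root_cause` are hypothetical names standing in for the curated label set, and `run_suite` stands in for the orchestration layer that replays the agent against labeled scenarios at scale:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Label:
    """One curated scenario: a frozen telemetry snapshot plus the
    expected investigation outcome (hypothetical schema)."""
    name: str
    snapshot: dict          # replayable telemetry the agent can query
    expected_root_cause: str

def run_suite(agent: Callable[[dict], str], labels: list[Label]) -> dict:
    """Orchestration sketch: replay each labeled scenario through the
    agent and score its conclusion against the curated expectation."""
    results = {}
    for label in labels:
        verdict = agent(label.snapshot)
        results[label.name] = (verdict == label.expected_root_cause)
    return results
```

Because each scenario is a frozen snapshot rather than live telemetry, a regression in one investigation class shows up as flipped booleans in `results` instead of going undetected in production.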
Key loops that tie product → evaluation:
- patterns/product-feedback-to-eval-labels — every user thumbs-up/-down + free-text feedback becomes a candidate evaluation label. Label creation rate grew by an order of magnitude over manual labelling.
- patterns/agent-assisted-label-validation — Bits itself now assists with validating its own labels (alignment studies with human judges gate trust). Validation time per label dropped 95% in one week once this came online.
- concepts/trajectory-evaluation + concepts/pass-at-k — the platform scores not just final answers but investigation depth and k-attempt capability vs. reliability.
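The pass@k scoring in the last loop separates capability (can the agent ever solve a scenario, high k) from reliability (does it solve it every time, k = 1). A minimal sketch of the standard unbiased estimator from the code-generation evaluation literature, which the source does not reproduce but which is the conventional way to compute this metric:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts,
    drawn without replacement from n recorded attempts of which c
    succeeded, is a success. Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must succeed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, an agent that solves a scenario in 1 of 2 recorded runs has pass@1 = 0.5: capable, but not yet reliable.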
The platform now extends beyond Bits: Datadog APM, Database Monitoring, and other agentic products bootstrap on the same label-collection + eval infrastructure from day one.
Seen in¶
- sources/2026-03-04-datadog-mcp-server-agent-tools — used as the named counter-example to a general-purpose MCP server; the design trade-off between the two surfaces is the post's explicit framing on this axis.
- sources/2026-04-07-datadog-bits-ai-sre-eval-platform — the offline evaluation platform built around Bits. Canonical source for how Datadog quantifies Bits' behaviour and iterates on it.
- sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — cross-reference: AWS DevOps Agent ships the same category of product (hosted agent for live-telemetry incident investigation) with a different vendor relationship (AWS managed service scoped to AWS cloud resources vs Datadog SaaS scoped to Datadog-ingested telemetry). See systems/aws-devops-agent for the explicit side-by-side comparison table.