
CLOUDFLARE 2026-04-20 Tier 1


Cloudflare: The AI Engineering Stack We Built Internally

Summary

Cloudflare describes the internal AI engineering stack that reached 93% R&D adoption (3,683 users, 47.95M AI requests in 30 days) in 11 months, built entirely on their own shipping products. Three architectural layers: a platform layer (Cloudflare Access + AI Gateway + Workers AI + a single proxy Worker that clients authenticate to via .well-known discovery); a knowledge layer (self-hosted Backstage service catalog with 16K+ entities exposed through MCP, plus auto-generated AGENTS.md files in ~3,900 repos); and an enforcement layer (multi-agent AI Code Reviewer that runs as a GitLab CI component and cites Engineering Codex rule IDs in MR comments). Doubles as a reference architecture for enterprise MCP at scale.

Key takeaways

  1. Single proxy Worker as choke point from day one. Clients never talk to AI Gateway directly — every LLM request flows through a Hono Worker that validates the Cloudflare Access JWT, strips client auth, injects the real provider API key server-side (cf-aig-authorization), and tags requests with an anonymous per-user UUID (cf-aig-metadata) resolved from D1 + KV. No API keys on user laptops; per-user cost attribution without leaking identities to providers.
  2. Discovery-endpoint config bootstrap. One command (opencode auth login https://opencode.internal.domain) hits /.well-known/opencode, returns an auth block (telling the client to run cloudflared access login) plus a config block with providers, MCP servers, agents, commands, and default permissions. Org-wide updates ship via wrangler deploy — no client reconfiguration.
  3. Model-catalog freshness via cron. Hourly Workers Cron fetches the OpenAI model list from models.dev, caches in KV, and injects store: false (Zero Data Retention) on every model so new models inherit ZDR automatically.
  4. Code Mode at the MCP portal layer. 34 GitLab MCP tools consumed ~15K context tokens (~7.5% of a 200K window) before the model even saw the prompt. Portal-level Code Mode collapses all upstream tools behind two meta-tools (portal_codemode_search, portal_codemode_execute) so the client sees a constant 2-tool surface regardless of how many servers are wired up. Scales cleanly as the fleet grows.
  5. AGENTS.md gives agents repo-local context. Short, high-signal markdown file per repo listing runtime, test/lint commands, directory layout, conventions (citing Codex RFC IDs), boundaries ("don't edit gen/"), and dependency edges pulled from Backstage. Generator pipeline bootstrapped ~3,900 repos by analyzing repo structure + catalog metadata and opening an MR for team review. AI Code Reviewer flags when changes suggest AGENTS.md needs updating — stale files are worse than none.
  6. Backstage as the agent-usable knowledge graph. Self-hosted OSS Backstage tracks 2,055 services, 228 APIs, 1,302 databases, 544 systems across 45 domains, 375 teams, and dependency edges. Exposed via a 13-tool MCP server so agents can answer "who owns this service?" and "what databases does it depend on?" without leaving the session. Without structured catalog data, agents read code but can't see the system around it.
  7. Multi-agent AI Code Review as a CI component. One GitLab CI component include per repo. A coordinator classifies the MR by risk tier (trivial / lite / full) and delegates to specialized agents (code quality, security, Codex compliance, docs, performance, release impact), each pulling Codex rules from a central repo and AGENTS.md from the target repo. Findings are categorized (Security / Code Quality / …) with severities (Critical / Important / Suggestion / Nits) and cite specific Codex rule IDs. Stateful across review rounds — won't re-raise fixed issues. A Workers-based config service picks the model per reviewer agent so routing changes don't touch the CI template.
  8. Workers AI for cost-sensitive at-scale inference. 91% of requests go to frontier labs (OpenAI/Anthropic/Google) for complex agentic work; 9% (~1.3M/month) to Workers AI. A single internal security agent processes 7B tokens/day on Kimi K2.5 (256K context, tool calling, structured outputs) — claimed 77% cheaper than a mid-tier proprietary model (~$2.4M/yr saved). Same-network inference co-located with Workers/DO/storage avoids cross-cloud latency. Workers AI handles ~15% of Code Reviewer traffic, mostly doc review.
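The choke-point flow in takeaway 1 can be sketched as pure functions. All names below are hypothetical: the real proxy is a Hono Worker that cryptographically verifies the Cloudflare Access JWT and resolves UUIDs from D1 with a KV cache, both stubbed here.

```typescript
// Sketch of the single-proxy choke point (takeaway 1): validate the
// Access identity, strip client credentials, inject the real provider
// key server-side, and tag the request with an anonymous UUID.

type ProxyHeaders = Record<string, string>;

// Stub: in production this verifies the Cf-Access-Jwt-Assertion
// signature against the Access team's public keys.
function verifyAccessJwt(jwt: string): { email: string } | null {
  if (!jwt.startsWith("eyJ")) return null; // crude structural check only
  return { email: "user@example.com" };    // claims come from the JWT
}

// Stub for the D1 + KV lookup: email -> stable anonymous UUID.
const uuidByEmail = new Map<string, string>([
  ["user@example.com", "7c0f4a2e-0000-4000-8000-000000000001"],
]);

function buildUpstreamHeaders(
  incoming: ProxyHeaders,
  providerKey: string, // real provider API key, held server-side only
): ProxyHeaders | null {
  const identity = verifyAccessJwt(incoming["cf-access-jwt-assertion"] ?? "");
  if (!identity) return null; // reject unauthenticated requests

  const uuid = uuidByEmail.get(identity.email);
  if (!uuid) return null;

  const upstream: ProxyHeaders = { ...incoming };
  // Strip every client-side credential before the request leaves the proxy.
  delete upstream["authorization"];
  delete upstream["cf-access-jwt-assertion"];
  // Inject the real key and the anonymous attribution tag.
  upstream["cf-aig-authorization"] = `Bearer ${providerKey}`;
  upstream["cf-aig-metadata"] = JSON.stringify({ user: uuid });
  return upstream;
}
```

The providers only ever see the shared key and an opaque UUID, which is what enables per-user cost attribution without leaking identities.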
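The discovery bootstrap in takeaway 2 amounts to "server-published config wins over local defaults." The response shape below is inferred from the post's description, not the actual OpenCode schema:

```typescript
// Sketch of the .well-known discovery bootstrap (takeaway 2). Field
// names are assumptions; the real payload also carries agents,
// commands, and default permissions.

interface Discovery {
  auth: { type: "cloudflared"; loginUrl: string };
  config: {
    providers: Record<string, { baseUrl: string }>;
    mcpServers: Record<string, { url: string }>;
  };
}

// What the Worker would serve at /.well-known/opencode; a single
// `wrangler deploy` changes this for every client org-wide.
const discovery: Discovery = {
  auth: { type: "cloudflared", loginUrl: "https://opencode.internal.domain" },
  config: {
    providers: { gateway: { baseUrl: "https://opencode.internal.domain/v1" } },
    mcpServers: { portal: { url: "https://opencode.internal.domain/mcp" } },
  },
};

// Client side: merge the server config over local defaults, so org
// updates apply without any per-laptop reconfiguration.
function applyDiscovery(local: Partial<Discovery["config"]>, d: Discovery) {
  return {
    providers: { ...local.providers, ...d.config.providers },
    mcpServers: { ...local.mcpServers, ...d.config.mcpServers },
  };
}
```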
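The cron refresh in takeaway 3 reduces to one transform over the fetched model list; field names here are illustrative, not the models.dev schema:

```typescript
// Sketch of the hourly catalog refresh (takeaway 3): force
// `store: false` (Zero Data Retention) onto every model entry before
// caching in KV, so newly released models inherit ZDR automatically.

interface ModelEntry {
  id: string;
  store?: boolean;
}

function withZdr(models: ModelEntry[]): ModelEntry[] {
  return models.map((m) => ({ ...m, store: false }));
}
```

Because the override runs on every cron tick rather than per-model config, there is no window where a new model ships without ZDR.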
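The portal-level Code Mode in takeaway 4 can be illustrated with a toy registry; the registry shape and search semantics are assumptions, but the invariant is the point: the advertised surface stays at two tools no matter how many servers register.

```typescript
// Sketch of portal-level Code Mode (takeaway 4): upstream MCP tools
// are hidden behind two meta-tools, and agent-generated code discovers
// them on demand instead of paying ~15K schema tokens up front.

interface Tool {
  server: string;
  name: string;
  description: string;
}

class CodeModePortal {
  private registry: Tool[] = [];

  register(server: string, tools: Omit<Tool, "server">[]): void {
    for (const t of tools) this.registry.push({ server, ...t });
  }

  // The constant 2-tool surface advertised to the client.
  listTools(): string[] {
    return ["portal_codemode_search", "portal_codemode_execute"];
  }

  // Backs portal_codemode_search: find upstream tools by keyword.
  search(query: string): Tool[] {
    const q = query.toLowerCase();
    return this.registry.filter(
      (t) => t.name.includes(q) || t.description.toLowerCase().includes(q),
    );
  }
}
```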
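The risk-tier routing in takeaway 7 can be sketched as a triage function. The actual coordinator is an LLM-driven classifier; the signals and thresholds below are invented purely to show the trivial/lite/full split and how it gates which specialist agents run.

```typescript
// Sketch of the review coordinator's triage (takeaway 7). Thresholds
// and signals are hypothetical illustrations, not the real classifier.

type Tier = "trivial" | "lite" | "full";

interface MrSignal {
  filesChanged: number;
  linesChanged: number;
  touchesSecurityPaths: boolean; // e.g. auth, crypto, secrets handling
}

function classifyRisk(mr: MrSignal): Tier {
  if (mr.touchesSecurityPaths) return "full"; // security always escalates
  if (mr.filesChanged <= 2 && mr.linesChanged <= 20) return "trivial";
  if (mr.linesChanged <= 200) return "lite";
  return "full";
}

// Which specialist agents the coordinator delegates to at each tier
// (the six specialists named in the post; tier membership assumed).
const agentsByTier: Record<Tier, string[]> = {
  trivial: ["code-quality"],
  lite: ["code-quality", "codex-compliance", "docs"],
  full: ["code-quality", "security", "codex-compliance", "docs",
         "performance", "release-impact"],
};
```

Keeping triage separate from the agents is what lets the Workers-based config service swap models per agent without touching the CI template.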

Systems / concepts / patterns extracted

  • Systems: AI Gateway, Workers AI, MCP Server Portal, Cloudflare Access, Workers (proxy + cron), Durable Objects, Workers KV, D1, Dynamic Workers (sandboxed agent-generated code exec), Agents SDK (McpAgent), Sandbox SDK, Workflows, Backstage, OpenCode, Windsurf, GitLab self-hosted.
  • Concepts: Code Mode (tools-as-code vs tools-as-schema), AGENTS.md (repo-local agent context), progressive disclosure of skills, Zero Data Retention, anonymous-UUID user attribution, BYOK LLM routing, Engineering Codex (rules-as-skill with RFC IDs), portal-level tool aggregation.
  • Patterns: single proxy / central choke point, .well-known discovery endpoint for client bootstrap, config-as-code compiled to JSON from markdown+YAML, multi-agent review coordinator with risk-tier routing, stateless CI runner + stateful config service split, MCP monorepo with shared OAuth (workers-oauth-provider) + Access identity.

Operational numbers (30-day window, Feb–Apr 2026)

  • 3,683 active internal users (60% company, 93% R&D) out of ~6,100.
  • 47.95M AI requests; 295 teams active.
  • AI Gateway: 20.18M req/month, 241.37B tokens; 91.16% frontier labs, 8.84% Workers AI.
  • Workers AI: 51.47B input + 361.12M output tokens.
  • OpenCode AI Gateway view: 688K req/day, 10.57B tokens/day across 4 providers.
  • MR throughput: 4-week rolling average climbed from ~5.6K/wk to >8.7K/wk; peak week 10,952 (~2× Q4 baseline).
  • Backstage catalog: 2,055 services, 167 libs, 122 packages, 228 APIs, 544 systems, 45 domains, 1,302 databases, 277 ClickHouse tables, 173 clusters, 375 teams, 6,389 users.
  • MCP portal: 13 servers, 182+ tools (Backstage, GitLab, Jira, Sentry, Elasticsearch, Prometheus, Google Workspace, Release Manager, …).
  • AGENTS.md generator: ~3,900 repos processed.

Caveats

  • Partly a product-marketing post for Agents Week — every layer is explicitly mapped to a Cloudflare product. Treat cost/perf claims (e.g., "77% cheaper than mid-tier proprietary", $2.4M/yr) as vendor-reported.
  • Dogfooding case study: the "stack" generalizes, but specific tool choices (Workers, Access, AI Gateway) are Cloudflare-native.
  • No incident/post-mortem content — adoption metrics only.
