PATTERN Cited by 4 sources
AI Gateway provider abstraction¶
AI Gateway provider abstraction is the pattern of routing all application LLM calls through a single proxy endpoint that owns provider/model selection, secret injection, retry/fallback policy, rate limiting, logging, and cost accounting. The application only knows "I point at the gateway"; everything that could change about provider choice is reconfigured at the gateway, not in application code or deploy pipelines.
Mechanics¶
- The application is configured with a single base URL (`ANTHROPIC_BASE_URL`, `OPENAI_API_BASE`, or similar) pointing at the gateway, using the provider's native API shape for the call.
- The gateway authenticates the caller via a separate substrate (API-gateway key, SSO JWT, workload identity).
- On each request the gateway:
  - Resolves which upstream provider/model to use (static config, per-tenant config, or a fallback chain).
  - Injects the real upstream API key server-side (concepts/byok-bring-your-own-key).
  - Emits logs / metrics / audit records.
  - Forwards the request and streams the response back.
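The per-request steps above can be sketched as a pure routing function. This is a minimal sketch, assuming a static per-tenant routing table; the config shape, header names, and URLs are illustrative, not any specific gateway's API:

```typescript
// One route per tenant: where to forward, which secret to inject,
// which model to resolve. In a real gateway this would come from a
// control plane and could be a fallback chain rather than one entry.
type Route = { baseUrl: string; apiKey: string; model: string };

const routes: Record<string, Route> = {
  "team-a": { baseUrl: "https://api.anthropic.example", apiKey: "sk-real-a", model: "claude-x" },
  "team-b": { baseUrl: "https://api.openai.example", apiKey: "sk-real-b", model: "gpt-y" },
};

interface UpstreamCall {
  url: string;                      // where the gateway forwards to
  headers: Record<string, string>;  // with the real key injected server-side
  audit: string;                    // one record per call for logging/cost accounting
}

function routeRequest(tenant: string, path: string): UpstreamCall {
  const route = routes[tenant];
  if (!route) throw new Error(`unknown tenant: ${tenant}`);
  return {
    url: route.baseUrl + path,              // the app only ever saw the gateway URL
    headers: { "x-api-key": route.apiKey }, // secret never leaves the gateway
    audit: `${tenant} -> ${route.model}`,   // spend/latency attribution in one place
  };
}
```

Swapping `team-a` to a different provider is then an edit to `routes`, with no application redeploy.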
Why this pattern¶
- Zero-code model swaps. New model releases, price shifts, or provider availability incidents become gateway-config changes. The application never redeploys.
- Centralised observability. One view of LLM spend, latency, and error rates across heterogeneous providers — the app doesn't need to stitch provider-specific telemetry.
- Rotation and revocation centralised. Keys live in one secrets store; rotation doesn't touch the application.
- Uniform rate-limiting and policy enforcement. Per-tenant quotas, per-user quotas, token budgets all enforceable in one place.
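The last point, token budgets enforced at the proxy, can be sketched as follows. This assumes an in-process counter for brevity; a real gateway would back the counter with a shared store so all gateway instances see the same usage:

```typescript
// Per-tenant token budget enforced at the gateway, before any
// upstream provider is called.
class TokenBudget {
  private used = new Map<string, number>();
  constructor(private limitPerTenant: number) {}

  // Returns true and charges the tokens if the tenant is under budget;
  // returns false (request rejected at the proxy) otherwise.
  charge(tenant: string, tokens: number): boolean {
    const current = this.used.get(tenant) ?? 0;
    if (current + tokens > this.limitPerTenant) return false;
    this.used.set(tenant, current + tokens);
    return true;
  }
}
```

Because every call already flows through the gateway, the same hook point serves per-user quotas and rate limits without touching any application.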
Contrast¶
Related to patterns/middleware-worker-adapter (same concern-ownership philosophy) and patterns/protocol-compatible-drop-in-proxy (the AI-Gateway variant is usually protocol-compatible: the gateway speaks the upstream provider's API shape), but specialised to the LLM provider category where the combinatorics of {provider × model × key × quota × fallback} specifically motivate centralisation.
Seen in¶
- sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents — canonical unified-catalog + unified-binding realisation. Cloudflare's 2026-04-16 AI Platform post sharpens this pattern along two axes: (a) the SDK surface collapses from "many SDKs, many base URLs, one gateway" to one binding (`env.AI.run(model_string, ...)`), with the provider selector inside the model string — see patterns/unified-inference-binding; (b) the gateway gains two new reliability primitives — automatic provider failover (patterns/automatic-provider-failover) across providers that share a model, and buffered resumable streaming (patterns/buffered-resumable-inference-stream) that survives caller disconnects. The catalog extends to 70+ models across 12+ providers (including image, video, and speech alongside text), plus BYO-model via Cog containers (patterns/byo-model-via-container). The pattern's scope widens from "LLM proxy" to "general inference broker".
- sources/2026-01-29-cloudflare-moltworker-self-hosted-ai-agent — canonical minimal-application instance. Moltbot's LLM calls are redirected through AI Gateway by setting `ANTHROPIC_BASE_URL` alone; BYOK or Unified Billing then handles the key; model/provider fallback becomes a gateway-config operation.
- sources/2026-04-20-cloudflare-internal-ai-engineering-stack — enterprise-scale instance. Every LLM request from Cloudflare's internal agent tooling flows through a single Hono Worker in front of AI Gateway; the Worker validates the Zero Trust Access JWT, strips client auth, injects real provider keys, and tags requests with anonymous per-user UUIDs. Reported scale: 20.18M requests/month, 241.37B tokens, 91% frontier labs / 9% Workers AI.
- sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway — coding-agent + MCP specialisation. Databricks' Unity AI Gateway productises the same pattern for the specific category of developer coding tools (Cursor, Codex CLI, Gemini CLI, Claude Code) and their MCP integrations. The post names a three-pillar framing (centralised audit + single bill + Lakehouse observability) — concepts/centralized-ai-governance. It extends the pattern along two axes: coding-tool clients as first-class citizens, and MCP-server governance as a peer concern to LLM-call governance. Pairs with new sibling patterns patterns/unified-billing-across-providers and patterns/telemetry-to-lakehouse.
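The auth-swap step described in the internal-stack instance above (validate the caller's identity, strip client auth, inject the real provider key, tag with an anonymous per-user ID) can be sketched as a header transform. This is an illustrative sketch, not Cloudflare's actual Worker: real JWT validation checks the signature and claims, and the hash-based ID here stands in for their per-user UUID scheme:

```typescript
import { createHash } from "node:crypto";

// Rewrite inbound headers at the gateway boundary: reject
// unauthenticated callers, keep client credentials out of the
// upstream request, and attribute usage without exposing identity.
function swapAuth(
  headers: Record<string, string>,
  upstreamKey: string,
): Record<string, string> {
  const jwt = headers["cf-access-jwt-assertion"];
  if (!jwt) throw new Error("unauthenticated"); // reject before any provider call

  const out = { ...headers };
  delete out["cf-access-jwt-assertion"];        // client auth never reaches the provider
  out["x-api-key"] = upstreamKey;               // real key lives only gateway-side
  // Anonymous, stable per-caller tag for audit/spend attribution.
  out["x-user-id"] = createHash("sha256").update(jwt).digest("hex").slice(0, 16);
  return out;
}
```

The application sends only its own identity token; provider credentials and usage attribution are entirely gateway concerns.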
Related¶
- systems/cloudflare-ai-gateway — the Cloudflare instance of this pattern.
- concepts/byok-bring-your-own-key — the secrets posture this pattern relies on.
- patterns/middleware-worker-adapter — the broader Worker-as-ownership-boundary pattern AI Gateway integrations usually pair with.
- patterns/protocol-compatible-drop-in-proxy — the general protocol-preserving proxy pattern this specialises.
- companies/cloudflare — operator of the canonical instance.