PATTERN Cited by 4 sources
AI Gateway provider abstraction¶
AI Gateway provider abstraction is the pattern of routing all application LLM calls through a single proxy endpoint that owns provider/model selection, secret injection, retry/fallback policy, rate limiting, logging, and cost accounting. The application only knows "I point at the gateway"; everything that could change about provider choice is reconfigured at the gateway, not in application code or deploy pipelines.
Mechanics¶
- The application is configured with a single base URL (`ANTHROPIC_BASE_URL`, `OPENAI_API_BASE`, or similar) pointing at the gateway, using the provider's native API shape for the call.
- The gateway authenticates the caller via a separate substrate (API-gateway key, SSO JWT, workload identity).
- On each request the gateway:
  - Resolves which upstream provider/model to use (static config, per-tenant config, or a fallback chain).
  - Injects the real upstream API key server-side (concepts/byok-bring-your-own-key).
  - Emits logs / metrics / audit records.
  - Forwards the request and streams the response back.
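The per-request steps above can be sketched as a pure routing function. This is a minimal sketch, assuming a static per-tenant routing table; the config shape, header names, and URLs are illustrative, not any specific gateway's API:

```typescript
// One route per tenant: where to forward, which secret to inject,
// which model to resolve. In a real gateway this would come from a
// control plane and could be a fallback chain rather than one entry.
type Route = { baseUrl: string; apiKey: string; model: string };

const routes: Record<string, Route> = {
  "team-a": { baseUrl: "https://api.anthropic.example", apiKey: "sk-real-a", model: "claude-x" },
  "team-b": { baseUrl: "https://api.openai.example", apiKey: "sk-real-b", model: "gpt-y" },
};

interface UpstreamCall {
  url: string;                      // where the gateway forwards to
  headers: Record<string, string>;  // with the real key injected server-side
  audit: string;                    // one record per call for logging/cost accounting
}

function routeRequest(tenant: string, path: string): UpstreamCall {
  const route = routes[tenant];
  if (!route) throw new Error(`unknown tenant: ${tenant}`);
  return {
    url: route.baseUrl + path,              // the app only ever saw the gateway URL
    headers: { "x-api-key": route.apiKey }, // secret never leaves the gateway
    audit: `${tenant} -> ${route.model}`,   // spend/latency attribution in one place
  };
}
```

Swapping `team-a` to a different provider is then an edit to `routes`, with no application redeploy.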
Why this pattern¶
- Zero-code model swaps. New model releases, price shifts, or provider availability incidents become gateway-config changes. The application never redeploys.
- Centralised observability. One view of LLM spend, latency, and error rates across heterogeneous providers — the app doesn't need to stitch provider-specific telemetry.
- Rotation and revocation centralised. Keys live in one secrets store; rotation doesn't touch the application.
- Uniform rate-limiting and policy enforcement. Per-tenant quotas, per-user quotas, token budgets all enforceable in one place.
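The last point, token budgets enforced at the proxy, can be sketched as follows. This assumes an in-process counter for brevity; a real gateway would back the counter with a shared store so all gateway instances see the same usage:

```typescript
// Per-tenant token budget enforced at the gateway, before any
// upstream provider is called.
class TokenBudget {
  private used = new Map<string, number>();
  constructor(private limitPerTenant: number) {}

  // Returns true and charges the tokens if the tenant is under budget;
  // returns false (request rejected at the proxy) otherwise.
  charge(tenant: string, tokens: number): boolean {
    const current = this.used.get(tenant) ?? 0;
    if (current + tokens > this.limitPerTenant) return false;
    this.used.set(tenant, current + tokens);
    return true;
  }
}
```

Because every call already flows through the gateway, the same hook point serves per-user quotas and rate limits without touching any application.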
Contrast¶
Related to patterns/middleware-worker-adapter (same concern-ownership philosophy) and patterns/protocol-compatible-drop-in-proxy (the AI-Gateway variant is usually protocol-compatible: the gateway speaks the upstream provider's API shape), but specialised to the LLM provider category where the combinatorics of {provider × model × key × quota × fallback} specifically motivate centralisation.
Seen in¶
- sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents — canonical unified-catalog + unified-binding realisation. Cloudflare's 2026-04-16 AI Platform post sharpens this pattern along two axes: (a) the SDK surface collapses from "many SDKs, many base URLs, one gateway" to one binding (`env.AI.run(model_string, ...)`), with the provider selector inside the model string — see patterns/unified-inference-binding; (b) the gateway gains two new reliability primitives — automatic provider failover (patterns/automatic-provider-failover) across providers that share a model, and buffered resumable streaming (patterns/buffered-resumable-inference-stream) that survives caller disconnects. The catalog extends to 70+ models across 12+ providers (including image, video, and speech alongside text), plus BYO-model via Cog containers (patterns/byo-model-via-container). The pattern's scope widens from "LLM proxy" to "general inference broker".
- sources/2026-01-29-cloudflare-moltworker-self-hosted-ai-agent — canonical minimal-application instance. Moltbot's LLM calls are redirected through AI Gateway by setting `ANTHROPIC_BASE_URL` alone; BYOK or Unified Billing then handles the key; model/provider fallback becomes a gateway-config operation.
- sources/2026-04-20-cloudflare-internal-ai-engineering-stack — enterprise-scale instance. Every LLM request from Cloudflare's internal agent tooling flows through a single Hono Worker in front of AI Gateway; the Worker validates the Zero Trust Access JWT, strips client auth, injects real provider keys, and tags requests with anonymous per-user UUIDs. Reported scale: 20.18M requests/month, 241.37B tokens, 91% frontier labs / 9% Workers AI.
- sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway — coding-agent + MCP specialisation. Databricks' Unity AI Gateway productises the same pattern for the specific category of developer coding tools (Cursor, Codex CLI, Gemini CLI, Claude Code) and their MCP integrations. The post names a three-pillar framing (centralised audit + single bill + Lakehouse observability) — concepts/centralized-ai-governance. It extends the pattern along two axes: coding-tool clients as first-class citizens, and MCP-server governance as a peer concern to LLM-call governance. Pairs with new sibling patterns patterns/unified-billing-across-providers and patterns/telemetry-to-lakehouse.
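The auth-swap step described in the internal-stack instance above (validate the caller's identity, strip client auth, inject the real provider key, tag with an anonymous per-user ID) can be sketched as a header transform. This is an illustrative sketch, not Cloudflare's actual Worker: real JWT validation checks the signature and claims, and the hash-based ID here stands in for their per-user UUID scheme:

```typescript
import { createHash } from "node:crypto";

// Rewrite inbound headers at the gateway boundary: reject
// unauthenticated callers, keep client credentials out of the
// upstream request, and attribute usage without exposing identity.
function swapAuth(
  headers: Record<string, string>,
  upstreamKey: string,
): Record<string, string> {
  const jwt = headers["cf-access-jwt-assertion"];
  if (!jwt) throw new Error("unauthenticated"); // reject before any provider call

  const out = { ...headers };
  delete out["cf-access-jwt-assertion"];        // client auth never reaches the provider
  out["x-api-key"] = upstreamKey;               // real key lives only gateway-side
  // Anonymous, stable per-caller tag for audit/spend attribution.
  out["x-user-id"] = createHash("sha256").update(jwt).digest("hex").slice(0, 16);
  return out;
}
```

The application sends only its own identity token; provider credentials and usage attribution are entirely gateway concerns.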
Related¶
- systems/cloudflare-ai-gateway — the Cloudflare instance of this pattern.
- concepts/byok-bring-your-own-key — the secrets posture this pattern relies on.
- patterns/middleware-worker-adapter — the broader Worker-as-ownership-boundary pattern AI Gateway integrations usually pair with.
- patterns/protocol-compatible-drop-in-proxy — the general protocol-preserving proxy pattern this specialises.
- companies/cloudflare — operator of the canonical instance.