PATTERN Cited by 3 sources

Central proxy choke point

Central proxy choke point is the organisational-scale posture of forcing all AI / LLM / agent traffic in an enterprise through one proxy before it reaches any provider, any MCP server, or any inference endpoint. The choke point is where identity, auth, key injection, audit logging, rate limiting, budget enforcement, fallback routing, telemetry collection, and policy enforcement all live — not in the clients.

Why "choke point" instead of just "proxy"

The discipline is architectural: there is no second path to AI capability. Clients have no direct provider credentials. Developers can't bypass the gateway by pasting an API key on a laptop — their laptop doesn't have the API key. The gateway owns every LLM call by virtue of being the only entity in the network topology that possesses the upstream credentials (concepts/byok-bring-your-own-key).
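A minimal sketch of what this means on the client side, with illustrative names (the gateway URL and token field are assumptions, not from the sources): the only credential a laptop ever holds is an enterprise identity token, and the only endpoint it knows is the gateway.

```typescript
// Hypothetical client configuration under the choke-point posture.
// Note what is absent: there is no provider API key field at all —
// the laptop cannot reach an upstream provider even if it wants to.

interface ClientConfig {
  baseURL: string;     // the single gateway ingress, not a provider URL
  bearerToken: string; // enterprise identity (SSO / Zero Trust JWT)
}

function buildClientConfig(ssoToken: string): ClientConfig {
  return {
    baseURL: "https://ai-gateway.internal.example.com/v1", // illustrative
    bearerToken: ssoToken, // the gateway maps this to quotas and audit records
  };
}
```

The asymmetry is the point: the client config can be copied, leaked, or committed to a repo without exposing any upstream credential.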

Mechanics

  • Single ingress. All coding tools / agents / applications dial one URL (the gateway).
  • Decoupled client auth. Clients authenticate via a different substrate than the upstream provider — SSO, Zero Trust JWT, workload identity, enterprise credential — so provider keys never leave the gateway.
  • Server-side key injection. Gateway re-authenticates to upstream providers with its own stored keys. Client's identity is translated into gateway-internal metadata (per-user UUID, per-tenant quota cell, cost-centre tag).
  • All policy lives here. Rate limits, budgets, audit records, fallback chains, model routing, guardrails, telemetry — single surface.
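The key-injection and identity-translation steps above can be sketched as a pure header transform, assuming the client's JWT has already been verified. All names here are illustrative; the real instances (a Hono Worker, Unity AI Gateway) do this inside their request pipelines.

```typescript
// Sketch of server-side key injection: strip client auth, inject the
// gateway's own provider key, and translate identity into gateway-internal
// metadata. Header names are hypothetical placeholders.

type HeaderMap = Record<string, string>;

interface Identity {
  userId: string;     // gateway-internal per-user UUID
  costCentre: string; // tenant / budget-cell tag
}

function rewriteForUpstream(
  clientHeaders: HeaderMap,
  identity: Identity,
  providerKey: string, // stored server-side only; clients never see it
): HeaderMap {
  const upstream: HeaderMap = { ...clientHeaders };
  // Replace the client's SSO/JWT credential with the real provider key.
  upstream["authorization"] = `Bearer ${providerKey}`;
  // Attach gateway-internal identity metadata for quotas and audit.
  upstream["x-gateway-user"] = identity.userId;
  upstream["x-gateway-cost-centre"] = identity.costCentre;
  return upstream;
}
```

Because the transform is total — every request passes through it — the upstream provider only ever sees gateway-held credentials and gateway-assigned identity tags.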

Two ingested production instances

  • Cloudflare internal stack (sources/2026-04-20-cloudflare-internal-ai-engineering-stack) — every LLM request from Cloudflare's internal tooling flows through a single Hono Worker in front of AI Gateway. Worker validates the Cloudflare Access JWT, strips client auth, injects real provider keys (cf-aig-authorization), tags requests with an anonymous per-user UUID (cf-aig-metadata) resolved from D1 + KV. Reported scale: 20.18M requests/month, 241.37B tokens, 91% frontier labs / 9% Workers AI across 3,683 users.
  • Databricks Unity AI Gateway (sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway) — all coding-agent + MCP traffic from Cursor / Codex / Claude Code / Gemini CLI funnels through Unity AI Gateway. Single SSO ("one Databricks credential for all tools, GitHub, Atlassian, others"), single audit plane (Unity Catalog), single bill (Foundation Model API + BYO external capacity), single telemetry store (OpenTelemetry → Delta tables).

Both instances describe the same architectural stance. The Databricks post frames it explicitly as an admin-productivity win: "stop switching tabs between admin consoles to control rate limits and budgets for every single coding tool."

Why the pattern scales

  • Linear review cost. Adopting a new tool is a connect-to-gateway change, not an N-vendor security review.
  • Model-swap at config-change speed. New frontier model drops weekly — gateway config change, no redeploy of any client.
  • Per-identity quotas become portable. Budgets follow the developer, not the tool (patterns/unified-billing-across-providers).
  • Telemetry cardinality is bounded. One schema, one store, one dashboard — joinable with business data (patterns/telemetry-to-lakehouse).
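The "model-swap at config-change speed" point can be made concrete with a sketch: clients request a stable alias, and the gateway resolves it against a routing table that operators edit without redeploying any client. Table contents and alias names below are illustrative assumptions.

```typescript
// Sketch of alias-based model routing. Shipping a new frontier model is a
// one-line edit to this table; no client ever names a provider directly.

interface Route {
  provider: string;
  model: string;
}

const routingTable: Record<string, Route> = {
  "default-chat": { provider: "frontier-lab-a", model: "model-v3" },
  "cheap-batch": { provider: "workers-ai", model: "small-model" },
};

function resolveModel(alias: string): Route {
  const route = routingTable[alias];
  if (!route) throw new Error(`unknown model alias: ${alias}`);
  return route;
}

// When a new model drops:
// routingTable["default-chat"] = { provider: "frontier-lab-b", model: "model-v4" };
```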

Costs / risks

  • Single point of failure. Gateway-down = AI-down for the whole org. Mitigation: fallback chains inside the gateway, HA deployment, regional failover.
  • Gateway capacity concern. The gateway itself becomes a hot-path component with org-scale throughput requirements.
  • Client-side escape hatches. Any client that can route around the gateway (e.g. personal-key in a dev-laptop env file) breaks the choke-point invariant. The pattern only works when clients cannot reach upstream providers directly.
  • Governance-centralisation downsides. One policy surface means one team owns LLM policy. Teams that want low-friction experimentation usually want less gateway, not more.
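The first mitigation — fallback chains inside the gateway — can be sketched as trying each upstream in order and returning the first success. Providers are modeled here as plain functions; a real gateway would also handle timeouts, retries, and partial streams.

```typescript
// Sketch of an in-gateway fallback chain: walk the configured upstreams
// in priority order, collecting failures, and surface an error only when
// every upstream has failed.

type Provider = (prompt: string) => string; // throws on failure

function callWithFallback(chain: Provider[], prompt: string): string {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return provider(prompt); // first healthy upstream wins
    } catch (err) {
      errors.push(String(err)); // record failure, try the next upstream
    }
  }
  throw new Error(`all upstreams failed: ${errors.join("; ")}`);
}
```

Because the chain lives in the gateway, a provider outage degrades gracefully for every client at once — no client needs its own failover logic.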

Relation to other patterns

Seen in
