PATTERN Cited by 7 sources
Central proxy choke point
Central proxy choke point is the organisational-scale posture of forcing all AI / LLM / agent traffic in an enterprise through one proxy before it reaches any provider, any MCP server, or any inference endpoint. The choke point is where identity, auth, key injection, audit logging, rate limiting, budget enforcement, fallback routing, telemetry collection, and policy enforcement all live — not in the clients.
Why "choke point" instead of just "proxy"
The discipline is architectural: there is no second path to AI capability. Clients have no direct provider credentials. Developers can't bypass the gateway by pasting an API key on a laptop — their laptop doesn't have the API key. The gateway owns every LLM call by virtue of being the only entity in the network topology that possesses the upstream credentials (concepts/byok-bring-your-own-key).
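The "no second path" rule is ultimately enforced in the network, not in client code. Below is a minimal sketch of that invariant as an egress check. It assumes a hypothetical policy keyed by workload identity. The workload names, hostnames, and egress_allowed are illustrative and are not taken from any of the sources.

```python
# Hypothetical egress policy expressing the choke-point invariant.
# Workload names and hostnames below are illustrative assumptions.
GATEWAY_WORKLOAD = "ai-gateway"
GATEWAY_HOST = "ai-gateway.internal.example"
PROVIDER_HOSTS = {"api.provider-a.example", "api.provider-b.example"}


def egress_allowed(workload: str, dest_host: str) -> bool:
    """Decide whether `workload` may open a connection to an AI-related host."""
    if dest_host in PROVIDER_HOSTS:
        # Only the gateway holds upstream credentials, so only it may dial providers.
        return workload == GATEWAY_WORKLOAD
    if dest_host == GATEWAY_HOST:
        # Every client reaches AI capability through the single ingress.
        return True
    # Unknown AI endpoint: deny by default so no second path can appear.
    return False
```

The deny-by-default branch matters as much as the allow rules. Without it, a newly launched provider endpoint becomes a second path as soon as it goes live.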
Mechanics
- Single ingress. All coding tools / agents / applications dial one URL (the gateway).
- Client auth via a different substrate than the upstream provider — SSO, Zero Trust JWT, workload identity, enterprise credential — so provider keys never leave the gateway.
- Server-side key injection. Gateway re-authenticates to upstream providers with its own stored keys. Client's identity is translated into gateway-internal metadata (per-user UUID, per-tenant quota cell, cost-centre tag).
- All policy lives here. Rate limits, budgets, audit records, fallback chains, model routing, guardrails, telemetry — single surface.
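The four mechanics compose into a single request-rewrite step. Below is a minimal sketch. It assumes the client presents an SSO-issued bearer token that the gateway can look up, standing in for real JWT or workload-identity validation. SESSIONS, PROVIDER_KEYS, and rewrite_for_upstream are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical gateway-side state. In practice the keys live in a secret store
# and identities come from validating an SSO / Zero Trust JWT.
PROVIDER_KEYS = {"provider-a": "sk-gateway-only-secret"}
SESSIONS = {"sso-token-123": {"user_uuid": "u-7f3a", "cost_centre": "eng-tools"}}


@dataclass
class Request:
    provider: str
    headers: dict = field(default_factory=dict)
    body: str = ""


class Unauthorized(Exception):
    pass


def rewrite_for_upstream(req: Request) -> tuple[Request, dict]:
    """Authenticate the client, swap its credential for the provider key,
    and return (upstream request, gateway-internal metadata)."""
    auth = next((v for k, v in req.headers.items() if k.lower() == "authorization"), "")
    identity = SESSIONS.get(auth.removeprefix("Bearer "))
    if identity is None:
        raise Unauthorized("unknown client credential")
    # Strip the client credential so it never reaches the provider.
    headers = {k: v for k, v in req.headers.items() if k.lower() != "authorization"}
    # Server-side key injection: the upstream key exists only inside the gateway.
    headers["Authorization"] = f"Bearer {PROVIDER_KEYS[req.provider]}"
    # Identity becomes metadata for quotas, budgets, and audit; it is not forwarded.
    metadata = {**identity, "provider": req.provider}
    return Request(req.provider, headers, req.body), metadata
```

The metadata stays out of the forwarded headers, so the provider never learns who made the call. The gateway's own budget and audit layers key on it instead.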
Two ingested production instances
- Cloudflare internal stack (sources/2026-04-20-cloudflare-internal-ai-engineering-stack): every LLM request from Cloudflare's internal tooling flows through a single Hono Worker in front of AI Gateway. The Worker validates the Cloudflare Access JWT, strips client auth, injects real provider keys (cf-aig-authorization), and tags requests with an anonymous per-user UUID (cf-aig-metadata) resolved from D1 + KV. Reported scale: 20.18M requests/month, 241.37B tokens, 91% frontier labs / 9% Workers AI across 3,683 users.
- Databricks Unity AI Gateway (sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway): all coding-agent + MCP traffic from Cursor / Codex / Claude Code / Gemini CLI funnels through Unity AI Gateway. Single SSO ("one Databricks credential for all tools, GitHub, Atlassian, others"), single audit plane (Unity Catalog), single bill (Foundation Model API + BYO external capacity), single telemetry store (OpenTelemetry → Delta tables).
Both instances describe the same architectural stance. The Databricks post frames it explicitly as an admin-productivity win: "stop switching tabs between admin consoles to control rate limits and budgets for every single coding tool."
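In the Cloudflare instance, the per-user UUID in cf-aig-metadata is resolved from D1 + KV. The source does not spell out how the two stores divide the work. Below is a minimal sketch that assumes D1 is the durable identity-to-UUID table and KV is a read-through cache in front of it. Both stores are modelled as plain dicts, and AnonymousIdResolver is a hypothetical name, not a Cloudflare API.

```python
import uuid


class AnonymousIdResolver:
    """Maps an authenticated identity (e.g. an email from the Access JWT) to a
    stable, anonymous per-user UUID. `durable` stands in for the D1 table and
    `cache` for KV; both are plain dicts in this sketch."""

    def __init__(self) -> None:
        self.durable: dict[str, str] = {}
        self.cache: dict[str, str] = {}

    def resolve(self, email: str) -> str:
        cached = self.cache.get(email)
        if cached is not None:  # fast path: KV hit
            return cached
        user_id = self.durable.get(email)
        if user_id is None:
            # First request from this user: mint a random UUID, not a hash of the email.
            user_id = str(uuid.uuid4())
            self.durable[email] = user_id
        self.cache[email] = user_id  # read-through: populate the cache for next time
        return user_id
```

A random UUID keeps the tag anonymous. Anyone with a staff directory can recompute a hash of a known email address, whereas a random UUID can only be mapped back through the durable table.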
Why the pattern scales
- Linear review cost. Adopting a new tool is a connect-to-gateway change, not an N-vendor security review.
- Model-swap at config-change speed. New frontier models drop weekly; adopting one is a gateway config change, with no redeploy of any client.
- Per-identity quotas become portable. Budgets follow the developer, not the tool (patterns/unified-billing-across-providers).
- Telemetry cardinality is bounded. One schema, one store, one dashboard — joinable with business data (patterns/telemetry-to-lakehouse).
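Model-swap at config-change speed works because clients name a logical alias and the gateway resolves it at request time. Below is a minimal sketch that assumes a hypothetical routing table of ordered (provider, model) targets, each with an enabled flag. The alias, provider, and model names are illustrative.

```python
# Hypothetical routing config, reloaded by the gateway at runtime.
ROUTES = {
    "frontier-default": [
        {"provider": "provider-a", "model": "model-a-latest", "enabled": True},
        {"provider": "provider-b", "model": "model-b-large", "enabled": True},
    ],
}


def resolve_route(alias: str, routes: dict) -> tuple[str, str]:
    """Return the first enabled (provider, model) target for a client-facing alias."""
    for target in routes.get(alias, []):
        if target["enabled"]:
            return target["provider"], target["model"]
    raise LookupError(f"no enabled target for alias {alias!r}")
```

Clients always request "frontier-default". Swapping the default model or disabling a provider means editing ROUTES, so no client has to ship a new build.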
Costs / risks
- Single point of failure. Gateway-down = AI-down for the whole org. Mitigation: fallback chains inside the gateway, HA deployment, regional failover.
- Gateway capex concern. The gateway itself becomes a hot-path component that must be provisioned for org-scale throughput.
- Client-side escape hatches. Any client that can route around the gateway (e.g. a personal API key in an env file on a developer laptop) breaks the choke-point invariant. The pattern only works when clients cannot reach upstream providers directly.
- Governance-centralisation downsides. One policy surface means one team owns LLM policy. Teams that want low-friction experimentation usually want less gateway, not more.
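One way to implement the single-point-of-failure mitigation ("fallback chains inside the gateway") is an ordered provider chain guarded by per-provider circuit breakers. With breakers in place, a failing provider stops receiving traffic for a cooldown period. Below is a minimal sketch that assumes a consecutive-failure breaker and a caller-supplied send function. CircuitBreaker, call_with_fallback, and the thresholds are hypothetical.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Opens after `threshold` consecutive failures and stays open for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: close provisionally; a single further failure re-opens it.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


def call_with_fallback(chain, breakers, send):
    """Try providers in order, skipping any whose breaker is open."""
    errors = []
    for provider in chain:
        breaker = breakers[provider]
        if not breaker.allow():
            continue
        try:
            result = send(provider)
        except Exception as exc:  # sketch: any exception counts as a provider failure
            breaker.record(False)
            errors.append((provider, exc))
            continue
        breaker.record(True)
        return provider, result
    raise RuntimeError(f"all providers in the chain failed or are open: {errors}")
```

Injecting the clock keeps the breaker testable. It also lets a multi-instance gateway swap in shared state later without changing the chain logic.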
Relation to other patterns
- Specialisation: patterns/ai-gateway-provider-abstraction is the same pattern viewed from the provider axis (proxy abstracts over many upstream providers); central proxy choke point is the same pattern viewed from the client axis (proxy abstracts over many downstream clients / tools / applications).
- Pairs with concepts/byok-bring-your-own-key (key posture) and concepts/centralized-ai-governance (the three-pillar organisational framing).
Seen in
- sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog — Generalises the choke-point shape from coding-agent scope (2026-04-17) to org-wide agent populations. The 2026-05-20 Databricks four-pillars post extends Unity AI Gateway from "every coding tool" to "every model call, every tool invocation, every agent interaction" across dev / analytics / sales-ops / support / marketing / finance. The choke-point posture acquires four named layers attached to the same proxy: Service Policies (per-tool-call admission via UC functions), Guardrails (inline content scanning, fail-closed), Inference Tables (full-payload audit writes to lakehouse), Budgets (per-user / per-group thresholds with alerts). Re-frames the choke-point property as the enabling discipline for governance-travels-with-resources — agents that bypass the gateway escape every layer of policy. Also canonicalises patterns/three-layer-agent-control (permissions / Service Policies / Guardrails) as the load-bearing composition the choke-point hosts.
- sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents: catalog-scale choke point. The 2026-04-16 AI Platform post productises one env.AI.run() binding against 70+ models across 12+ providers; the gateway is the only vantage point that sees aggregate spend, per-tenant attribution, and cross-provider failover. Extends the pattern with buffered resumable streaming (patterns/buffered-resumable-inference-stream): the choke point now also owns stream lifetime, not just request routing.
- sources/2026-04-20-cloudflare-internal-ai-engineering-stack: Hono Worker in front of AI Gateway; scale numbers disclosed.
- sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway — Unity AI Gateway specialised for coding-agent + MCP governance.
- sources/2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale: CI-embedded choke point for AI code review. Every LLM call from Cloudflare's AI Code Review system (the coordinator, all seven sub-reviewers, and re-review retries) flows through AI Gateway under one set of keys, one unified spend view, and one failback topology. The reviewer-config KV Worker is the choke point's control-plane surface: flip a provider's enabled flag and every running CI job re-routes within 5 seconds without a code deploy. Adds the CI-integration variant to the pattern: choke point + per-role remote routing + per-tier circuit breakers.
- sources/2026-03-19-pinterest-building-an-mcp-ecosystem-at-pinterest: enterprise-internal MCP choke point. Pinterest's MCP Registry + Envoy mesh validation together form the choke point for MCP traffic inside Pinterest. "Envoy validates the JWT, maps it to X-Forwarded-User, X-Forwarded-Groups, and related headers, and enforces coarse-grained security policies." Differs from the Cloudflare / Databricks instances on two axes: (a) it carries internal-employee traffic, not external LLM-API traffic, so the choke point's upstream-credential job is absent (Pinterest isn't injecting API keys); (b) its policy surface is Envoy config driven by registry-backed security-review outcomes, not a dashboard flag. Shares with the other instances: "there is no second path to AI capability"; MCP tools are reachable only through the Envoy + registry substrate. Canonical wiki instance of the enterprise-SSO-piggyback choke-point shape (patterns/layered-jwt-plus-mesh-auth).
- sources/2026-04-14-redpanda-openclaw-is-not-for-enterprise-scale: agent-workforce-scale choke point with a kill switch as a first-class primitive. Redpanda's 2026-04-14 "Openclaw is not for enterprise scale" post canonicalises the choke point as component #1 of the four-component agent production stack (Gateway + Audit log + Token vault + Sandboxed compute). Verbatim: "The gateway boils down to having a single choke point for all agentic access to external systems and information. It allows you to have full observability into what your agents are doing [...] A gateway is also a centralized place for enforcing rate limits and applying guardrails." The post adds a kill switch to the canonical choke-point capability set: "the kill switch goes to turn off a rogue agent leaking your Salesforce customer data. No need to hunt down API access for 27 different services and systems, just turn it off for a single service or set of services for your entire digital workforce at once." Distinct from the Cloudflare / Databricks / Pinterest instances on two axes: (a) agent traffic is the primary abstraction (not LLM-inference or MCP traffic specifically, but the agent's full egress surface); (b) the gateway is composed with concepts/token-vault as a separate architectural component, so it doesn't hold credentials and instead asks the vault per call. The post is written at essay altitude (no scale numbers, no mechanism depth), but its "Gateway + Audit trail + Token vault + Sandboxed compute = Agents in production" formula canonicalises the choke point as the keystone of a four-component production minimum.
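The four layers that the 2026-05-20 Databricks post attaches to the proxy can be composed into one admission function. Below is a minimal sketch with the following stand-ins:
- the per-user tool allowlist stands in for Service Policies;
- scan stands in for Guardrails and is treated as fail-closed, so a scanner error blocks the call;
- the audit list stands in for Inference Tables;
- per-user budgets raise an alert rather than block, matching the post's "thresholds with alerts" framing.

LayeredGateway and its fields are hypothetical and are not Unity AI Gateway APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LayeredGateway:
    allowed_tools: dict[str, set[str]]   # Service Policies stand-in: user -> permitted tools
    scan: Callable[[str], bool]          # Guardrails stand-in: True means content is safe
    budget: dict[str, float]             # Budgets stand-in: user -> alert threshold
    spent: dict[str, float] = field(default_factory=dict)
    audit: list[dict] = field(default_factory=list)   # Inference Tables stand-in
    alerts: list[str] = field(default_factory=list)

    def admit(self, user: str, tool: str, payload: str, cost: float) -> bool:
        if tool not in self.allowed_tools.get(user, set()):
            decision = "denied: service policy"
        else:
            try:
                safe = self.scan(payload)
            except Exception:
                safe = False  # fail closed: a scanner error blocks the call
            decision = "allowed" if safe else "denied: guardrail"
        if decision == "allowed":
            before = self.spent.get(user, 0.0)
            self.spent[user] = before + cost
            limit = self.budget.get(user)
            # Alert once, on the call that crosses the threshold; do not block.
            if limit is not None and before < limit <= self.spent[user]:
                self.alerts.append(user)
        # Every decision, allowed or denied, lands in the audit store with the full payload.
        self.audit.append({"user": user, "tool": tool, "payload": payload, "decision": decision})
        return decision == "allowed"
```

All four layers hang off one call path, so an agent that bypasses the gateway escapes all of them at once. This is the governance-travels-with-resources point made in that entry.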
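Buffered resumable streaming, from the 2026-04-16 Cloudflare AI Platform instance, moves stream lifetime into the gateway. The gateway keeps buffering upstream output even if the client disconnects, and a reconnecting client resumes from the last chunk it saw. Below is a minimal in-memory sketch that assumes chunk-index offsets. ResumableStreamBuffer is a hypothetical class, and the actual persistence and eviction mechanics are not described in this entry.

```python
class ResumableStreamBuffer:
    """Gateway-side buffer that outlives any single client connection."""

    def __init__(self) -> None:
        self.chunks: dict[str, list[str]] = {}
        self.finished: set[str] = set()

    def append(self, stream_id: str, chunk: str) -> None:
        # Called as upstream output arrives, whether or not a client is attached.
        self.chunks.setdefault(stream_id, []).append(chunk)

    def finish(self, stream_id: str) -> None:
        self.finished.add(stream_id)

    def read_from(self, stream_id: str, offset: int) -> tuple[list[str], bool]:
        """Return the chunks at index >= offset and whether upstream has finished."""
        return self.chunks.get(stream_id, [])[offset:], stream_id in self.finished
```

Suppose a client received one chunk before dropping. It reconnects with offset 1 and gets everything after that chunk, including output produced while it was away.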
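The Redpanda instance pairs the choke point with a kill switch and a separate token vault. Below is a minimal sketch of that composition, built on hypothetical TokenVault and AgentGateway classes. On every call, the gateway checks the kill switch and then asks the vault for a credential, rather than holding credentials itself.

```python
from typing import Callable


class TokenVault:
    """Hypothetical vault holding scoped credentials; the gateway stores none."""

    def __init__(self, grants: dict[tuple[str, str], str]) -> None:
        self._grants = grants  # (agent, service) -> credential

    def credential_for(self, agent: str, service: str) -> str:
        try:
            return self._grants[(agent, service)]
        except KeyError:
            raise PermissionError(f"{agent} has no grant for {service}") from None


class AgentGateway:
    def __init__(self, vault: TokenVault) -> None:
        self.vault = vault
        self.killed_agents: set[str] = set()
        self.killed_services: set[str] = set()

    def kill_agent(self, agent: str) -> None:
        self.killed_agents.add(agent)

    def kill_service(self, service: str) -> None:
        # One switch cuts every agent off from this service at once.
        self.killed_services.add(service)

    def call(self, agent: str, service: str, send: Callable[[str, str], str]) -> str:
        # The kill switch is checked before any credential is fetched.
        if agent in self.killed_agents or service in self.killed_services:
            raise PermissionError(f"kill switch active for {agent} -> {service}")
        token = self.vault.credential_for(agent, service)  # fetched per call, never cached
        return send(service, token)
```

Because no credential is cached in the gateway, flipping the switch takes effect on the very next call. There is no need to chase down and revoke keys held across many services.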