PATTERN Cited by 5 sources

Unified billing across providers

Unified billing across providers is the cost-management pattern of routing all LLM / AI traffic — first-party inference capacity, BYO external provider keys, multiple model families, multiple tool surfaces — through one gateway, so the organisation sees one bill and can enforce one budget model across the whole fleet.

Mechanics

  • The gateway (see patterns/central-proxy-choke-point) owns all upstream credentials, first-party and third-party alike.
  • First-party inference (Databricks Foundation Model API; Cloudflare Workers AI) is the default path: the inference capacity of the vendor that hosts the gateway.
  • External capacity (OpenAI, Anthropic, Gemini, custom endpoints) plugs in via BYO keys (concepts/byok-bring-your-own-key).
  • Because the gateway sees every request, per-identity, per-team, and per-project cost attribution is computed from metered traffic rather than estimated.
  • Budgets enforced at the gateway are portable across whichever tool a user picks: one budget per developer, not N budgets per (developer × tool).
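
The metering and attribution steps above can be sketched as follows. This is a minimal illustration, not either vendor's implementation: the record fields, the upstream and model names, and the prices in `PRICE_PER_MTOK` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pricing table: USD per 1M (input, output) tokens,
# keyed by (upstream, model). Real gateways maintain their own table.
PRICE_PER_MTOK = {
    ("first-party", "model-a"): (0.50, 1.50),
    ("byo-external", "model-b"): (3.00, 15.00),
}


@dataclass(frozen=True)
class RequestRecord:
    """One metered request as a gateway might log it (illustrative fields)."""
    user: str
    team: str
    project: str
    upstream: str
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(rec: RequestRecord) -> float:
    """Convert token usage to USD using the single pricing table."""
    in_price, out_price = PRICE_PER_MTOK[(rec.upstream, rec.model)]
    return (rec.input_tokens * in_price + rec.output_tokens * out_price) / 1_000_000


def attribute(records, dimension: str) -> dict:
    """Sum cost per value of `dimension` ("user", "team" or "project")."""
    totals = defaultdict(float)
    for rec in records:
        totals[getattr(rec, dimension)] += cost_usd(rec)
    return dict(totals)


records = [
    RequestRecord("alice", "platform", "search", "first-party", "model-a", 2_000_000, 0),
    RequestRecord("bob", "platform", "agents", "byo-external", "model-b", 0, 1_000_000),
]
by_user = attribute(records, "user")
by_team = attribute(records, "team")
```

Because every record passes through the same `cost_usd` function, first-party and BYO traffic end up in the same unit and can be rolled up along any logged dimension.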

The budget-portability flip

Databricks names this explicitly in the 2026-04-17 post: "With our centralized Gateway, admins can stop switching tabs between admin consoles to control rate limits and budgets for every single coding tool. Instead, organizations can give developers a single budget across all coding tools to burn down on their agent of choice!"

This is a structural shift: the budget primitive moves from (user, tool) → $ to user → $, so the tool axis collapses. This works because:

  • The gateway is the only entity metering the traffic.
  • All tools go through it (enforced by patterns/central-proxy-choke-point).
  • All costs, regardless of upstream provider, convert to one currency via the gateway's pricing table.
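
A minimal sketch of the user → $ budget primitive, assuming costs have already been converted to integer cents by the gateway. The class and method names are hypothetical; the point is only that the tool is recorded for reporting but never keys the limit.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a charge would push a developer past their limit."""


class PortableBudget:
    """Per-developer budget in integer cents, shared across all tools."""

    def __init__(self, limits_cents: dict):
        self.limits = dict(limits_cents)
        self.spent = defaultdict(int)    # user -> cents (this is what is enforced)
        self.by_tool = defaultdict(int)  # (user, tool) -> cents (reporting only)

    def charge(self, user: str, tool: str, cents: int) -> None:
        # The check uses the user's total spend, whichever tool made the request.
        if self.spent[user] + cents > self.limits[user]:
            raise BudgetExceeded(f"{user} would exceed {self.limits[user]} cents")
        self.spent[user] += cents
        self.by_tool[(user, tool)] += cents

    def remaining(self, user: str) -> int:
        return self.limits[user] - self.spent[user]


budget = PortableBudget({"dev-1": 1000})
budget.charge("dev-1", "tool-a", 600)
budget.charge("dev-1", "tool-b", 300)
```

After these two charges `dev-1` has 100 cents left regardless of which tool is used next; a further 200-cent charge from any tool raises `BudgetExceeded`.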

Two ingested instances

  • Databricks Unity AI Gateway (sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway) — Foundation Model API provides first-party inference for OpenAI / Anthropic / Gemini / Qwen; external capacity BYO. "One all-in bill from Databricks." Budget per developer, not per tool.
  • Cloudflare internal AI engineering stack (sources/2026-04-20-cloudflare-internal-ai-engineering-stack) — Workers AI for cost-sensitive at-scale inference (9% of requests, 1.3M/month), frontier labs for complex work (91%, 47.95M/30 days). BYOK or Unified Billing via AI Gateway. Reported cost savings: one internal security agent on Kimi K2.5 at 77% cheaper than a mid-tier proprietary model (~$2.4M/yr saved).

Why first-party-plus-BYO

Pure first-party: the admin can only use the gateway vendor's inference, so gateway lock-in is maximal. Pure BYO: the admin carries N provider relationships and loses the single-bill benefit. The hybrid model (first-party as default, BYO as escape hatch) gives the organisation the "single bill" ergonomics while preserving the "use any frontier model" flexibility.
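
One way to picture the hybrid model is as a route-resolution step inside the gateway. The sketch below is an assumption about how such a step could look, not a description of either product: the model catalogue, the `Route` type, and the key references are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical catalogue of models the gateway vendor serves on its own capacity.
FIRST_PARTY_MODELS = {"fp-small", "fp-large"}


@dataclass(frozen=True)
class Route:
    upstream: str    # "first-party" or the external provider's name
    credential: str  # which credential the gateway attaches to the upstream call


def resolve_route(model: str, provider: str, byo_keys: dict) -> Route:
    """Prefer first-party capacity; fall back to a BYO key; otherwise reject."""
    if model in FIRST_PARTY_MODELS:
        return Route("first-party", "gateway-managed")
    if provider in byo_keys:
        return Route(provider, byo_keys[provider])
    raise LookupError(f"no first-party capacity or BYO key for {provider}/{model}")


default_route = resolve_route("fp-small", "any-provider", {})
escape_route = resolve_route("ext-model", "provider-x", {"provider-x": "key-ref-1"})
```

Both routes still pass through the gateway, so either path remains visible to metering and budgeting.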

Costs / caveats

  • First-party inference must be price-competitive; otherwise admins simply BYO everything and the single-bill promise weakens to "gateway aggregates invoices". Databricks specifically calls out "day one launches for every frontier LLM model", so first-party freshness is load-bearing for the pattern.
  • BYO-key accounting depends on the gateway proxying the upstream call. If an admin's BYO traffic bypasses the gateway, the single-bill and budget invariants break.
  • Budget portability assumes fungibility across tools. If an org explicitly wants "Claude-Code gets $X, Cursor gets $Y", the gateway must still support the per-tool split. Per-developer is the new default, not the only option.
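
The last caveat, a per-developer default with an optional per-tool split, could be expressed as a two-level check. This is a sketch under assumed names (`allow_charge`, `tool_caps`) with amounts in integer cents; the sources do not describe a specific mechanism.

```python
def allow_charge(spent_by_tool: dict, user_limit: int, tool_caps: dict,
                 tool: str, cost: int) -> bool:
    """Return True if `cost` fits the developer's overall limit and, when one
    is configured, the cap for this specific tool."""
    total_spent = sum(spent_by_tool.values())
    if total_spent + cost > user_limit:
        return False  # the developer-wide budget always applies
    cap = tool_caps.get(tool)
    if cap is not None and spent_by_tool.get(tool, 0) + cost > cap:
        return False  # optional per-tool split, only where an admin set one
    return True


spent = {"tool-a": 400, "tool-b": 100}
caps = {"tool-a": 500}  # tool-b has no cap, so only the overall limit applies
```

With an empty `tool_caps` the check reduces to the plain per-developer budget, which matches "per-developer is the new default, not the only option".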

Relation to other patterns

Seen in

  • sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents: canonical per-request custom-metadata attribution instance. The 2026-04-16 post productises spend-by-attribute via a metadata: { teamId, userId, ... } field on every env.AI.run() call. "With AI Gateway, you'll get one centralized place to monitor and manage AI spend"; spend breaks down by "free vs. paid users, by individual customers, or by specific workflows in your app". Cites AIDB Intel's pulse survey: the average organisation calls 3.5 models across multiple providers; the gateway is the only entity with holistic visibility. 70+ models + 12+ providers all billable through one credit pool.
  • sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway — Foundation Model API + BYO external capacity; per-developer budget portability explicitly framed.
  • sources/2026-04-20-cloudflare-internal-ai-engineering-stack — Workers AI + BYOK / Unified Billing; $2.4M/yr savings quantified on one internal agent by model-substitution at the gateway layer.
  • sources/2026-04-16-cloudflare-deploy-postgres-and-mysql-databases-with-planetscale-workers: storage-tier instance. Cloudflare's 2026-04-16 PlanetScale-on-Workers post extends the pattern one tier below inference: customers provision PlanetScale Postgres / MySQL from the Cloudflare dashboard and (from "next month") are billed via their Cloudflare account, with Cloudflare credits (startup programme + committed spend) redeemable against PlanetScale usage. The full PlanetScale feature + SKU + pricing surface is preserved: Cloudflare is a provisioning + billing aggregator, not a repackager. Paired with patterns/partner-managed-service-as-native-binding for the runtime half of the integration. Confirms the pattern generalises across primitive tiers (inference → storage) on the Cloudflare developer platform.
  • sources/2024-02-15-flyio-globally-distributed-object-storage-with-tigris: object-storage tier instance on a different platform. Fly.io's Tigris partnership rolls object-storage usage into the Fly.io bill alongside Supabase (databases) / PlanetScale (databases) / Upstash (Redis / Kafka): "to make one bill for your computer, your block storage, your databases, your networking, and your object storage, we've wrapped everything under one bill. You don't have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month." This is the non-AI / non-Cloudflare instance of the pattern: it shows that the unified-billing shape is not specific to AI gateway infrastructure but generalises to any developer-platform partnership model spanning multiple primitive tiers (compute, block storage, databases, networking, object storage).