CONCEPT

Cost tracking per team

Definition

Cost tracking per team is the platform-engineering discipline of attributing every inference call (and its dollar cost) back to the consuming internal team, enabling chargeback accounting, budget allocation, capacity planning, and runaway-cost detection. For LLM platforms specifically, it's typically implemented at the AI-Gateway layer — every LLM call passes through a single proxy that logs team + job + model + token count + cost.
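A minimal sketch of what that gateway-layer logging might look like, assuming hypothetical per-1K-token prices and field names (the real Gateway's schema and price table are not described in the source):

```python
import time
from dataclasses import dataclass, asdict

# Hypothetical per-model pricing (USD per 1K tokens); a real gateway
# would load this from the provider's published price list.
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}

@dataclass
class UsageRecord:
    """One attributed inference call: team + job + model + tokens + cost."""
    team: str
    job: str
    model: str
    tokens: int
    cost_usd: float
    timestamp: float

def record_call(team: str, job: str, model: str, tokens: int) -> UsageRecord:
    """Attribute one call's token usage (and dollar cost) to the consuming team."""
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS[model]
    return UsageRecord(team, job, model, tokens, round(cost, 6), time.time())

rec = record_call("catalog", "extraction-pipeline", "gpt-4o", 12_000)
print(asdict(rec)["cost_usd"])  # 0.06
```

Because every call passes through the proxy, one function like this is the only place cost attribution needs to live.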

Why per-team, not per-call

Per-call metrics are necessary but not sufficient. Per-team attribution is what lets a platform org answer questions like:

  • Which team is driving our 10× month-over-month cost growth?
  • Is Catalog's new extraction pipeline eating the quarterly LLM budget?
  • What's the cost-per-customer-transaction for each LLM-powered feature?
  • Are we giving each team a rate limit commensurate with their budget?

Without per-team tags, the answer is always "let me grep through logs for three days." With per-team tags, it's a dashboard.
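The "dashboard" half of that claim is just a group-by over the gateway's usage log. A sketch, with made-up log rows (team, model, cost) standing in for whatever schema the gateway actually emits:

```python
from collections import defaultdict

# Hypothetical per-call usage rows as a gateway might emit them:
# (team, model, cost_usd)
log = [
    ("catalog", "gpt-4o", 0.50),
    ("search", "gpt-4o-mini", 0.25),
    ("catalog", "gpt-4o", 0.25),
]

def cost_by_team(rows):
    """Roll per-call cost records up into a per-team total."""
    totals = defaultdict(float)
    for team, _model, cost in rows:
        totals[team] += cost
    return dict(totals)

print(cost_by_team(log))  # {'catalog': 0.75, 'search': 0.25}
```

The same roll-up keyed on (team, month) is what answers the "who drove the 10× growth" question.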

Canonical wiki instance

Instacart's Cost Tracker integrates with the AI Gateway; every LLM call from any internal platform (including Maple's batch jobs) is logged with per-team attribution. Maple's own feature list names "tracks detailed cost usage for each team" as one of its outputs — the tracking surface is exposed upward from the Gateway-layer facility. (Source: sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple)

Sibling instances on the wiki:

  • systems/cloudflare-ai-gateway — external equivalent; same per-team / per-application attribution.
  • systems/unity-ai-gateway (Databricks) — three-pillar framing (audit + single-bill + Lakehouse observability) explicitly names cost tracking alongside audit as the governance pillars.

Place in the governance stack

Per-team cost tracking is one leg of concepts/centralized-ai-governance alongside audit, policy enforcement, and rate limiting. All four concerns tend to co-locate at the AI Gateway layer because they all require every LLM call to pass through one point — the gateway is the only place in a system topology where that's true by construction.

Generalisation

The pattern is reusable anywhere teams share a metered external resource:

  • Cloud provider costs — AWS Cost & Usage Reports with team tagging.
  • Database queries — per-application query accounting on a shared query gateway.
  • Third-party API calls — per-team attribution on a shared API proxy.

The operational preconditions:

  1. Single choke point — all calls pass through one tier.
  2. Team identity in requests — the caller's team is either authenticated or stamped onto the request by the gateway itself.
  3. Attribution storage — durable record with enough cardinality to query by team × time × model × job.
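The three preconditions can be sketched together: a gateway that rejects calls carrying no team identity, and a durable store queryable along the team × time × model × job dimensions. The header names and schema below are assumptions for illustration, not the source's actual design:

```python
import sqlite3
import time

def init_store(conn: sqlite3.Connection) -> None:
    """Attribution storage: one row per call, with enough cardinality
    to query by team x time x model x job."""
    conn.execute("""CREATE TABLE IF NOT EXISTS llm_usage (
        ts REAL, team TEXT, job TEXT, model TEXT,
        tokens INTEGER, cost_usd REAL)""")

def attribute(conn, headers: dict, model: str, tokens: int, cost_usd: float):
    """The single choke point: every call lands here, and calls without
    a team identity (hypothetical X-Team header) are refused."""
    team = headers.get("X-Team")
    if not team:
        raise PermissionError("gateway rejects calls without a team identity")
    job = headers.get("X-Job", "unknown")
    conn.execute("INSERT INTO llm_usage VALUES (?,?,?,?,?,?)",
                 (time.time(), team, job, model, tokens, cost_usd))

conn = sqlite3.connect(":memory:")
init_store(conn)
attribute(conn, {"X-Team": "catalog", "X-Job": "extraction"}, "gpt-4o", 9000, 0.045)
row = conn.execute(
    "SELECT team, SUM(cost_usd) FROM llm_usage GROUP BY team").fetchone()
print(row)  # ('catalog', 0.045)
```

Rejecting unattributed calls at the choke point is what makes the stored data trustworthy: there is no path by which cost can accrue without a team on the record.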
