PATTERN
Tenant-per-application mapping¶
Problem¶
In a large multi-tenant internal platform (observability, storage, compute), each tenant needs to map to some unit of ownership so that:
- Write/read guardrails can be attributed and enforced per tenant.
- Resource consumption can be tracked for capacity planning and eventual chargeback.
- Growth in usage can be traced back to the specific software that caused it.
What should the tenant actually be? Teams? Products? Applications? Services? The answer isn't obvious, and picking the wrong unit creates long-term pain.
Options considered¶
Tenant-per-team (rejected). Team ownership of applications changes frequently — reorgs, product-area migrations, team splits, and hiring shifts mean the mapping is unstable. Every reorg becomes a platform-config migration.
Tenant-per-product (problematic). Products span many services; the signal you get ("product X is bursty") isn't actionable at the infrastructure layer.
Tenant-per-application / per-service (chosen). Applications / services are a more logical and stable grouping:
- Services have long-lived, well-defined owners (even if which team owns them changes).
- Metric growth attributes precisely to specific applications.
- Per-tenant guardrails (ingest rate, series limit, rule count, evaluation interval) map cleanly to per-service limits.
- It lays the foundation for future chargeback — costs can be billed to the service, not to the owning team's headcount.
Airbnb runs this model with ~1,000 services, each a separate tenant, in their in-house metrics storage system. (Source: sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system)
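The mapping can be sketched as a registry keyed by service name, so attribution and enforcement share one key. A minimal sketch — the class, field names, and limit values below are illustrative assumptions, not Airbnb's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantLimits:
    """Per-service guardrails; fields mirror the knobs listed above."""
    max_active_series: int
    max_rules: int
    rule_eval_interval_s: int

# Hypothetical registry: the tenant ID *is* the service identifier.
TENANT_LIMITS: dict[str, TenantLimits] = {
    "checkout-service": TenantLimits(2_000_000, 200, 60),
    "search-indexer":   TenantLimits(500_000, 50, 60),
}

def enforce_series_limit(service: str, active_series: int) -> bool:
    """Reject writes once a service exceeds its own series budget."""
    limits = TENANT_LIMITS.get(service)
    if limits is None:
        return False  # unknown tenant: fail closed
    return active_series < limits.max_active_series
```

Because the key is a stable service identifier rather than a team, a reorg changes only who gets paged, not the registry itself.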
Guardrails to expose vs. derive¶
A related sub-decision: when a tenant approaches or exceeds a limit, which knobs do operators expose, and which are derived?
Airbnb's choice — expose series limits, derive everything else (ingestion rate, ingestion burst size). Exposing fewer knobs means:
- Less operator confusion when a tenant hits a limit ("which limit should I bump?").
- Fewer combinations of parameters to support.
- Clear downstream invariant: if a tenant needs more series, they get more ingest capacity proportionally.
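The "derive, don't expose" invariant can be sketched as one function: operators bump the single series limit, and the ingest knobs scale proportionally. The scrape interval, headroom, and burst factor below are hypothetical parameters, not Airbnb's values:

```python
def derive_ingest_limits(max_series: int,
                         scrape_interval_s: int = 30,
                         headroom: float = 1.5,
                         burst_factor: float = 2.0) -> tuple[float, int]:
    """Derive ingestion rate and burst size from the one exposed knob.

    Assumes each active series emits one sample per scrape interval.
    """
    rate = max_series * headroom / scrape_interval_s  # samples/sec
    burst = int(rate * burst_factor)                  # token-bucket burst
    return rate, burst
```

Doubling a tenant's series limit doubles both derived values, so "which limit should I bump?" always has the same one-word answer.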
Onboarding automation¶
Tenant-per-application only works if onboarding a new tenant is cheap. Airbnb built a consolidated control plane that:
- Monitors new service creation → auto-enrolls the service as a tenant.
- Picks up tenant-config changes via a single deployment (no per-component code-change-and-deploy chains).
Before this control plane, tenant onboarding was "numerous manual steps across multiple components, a series of code changes and deployments, often consuming a lot of time" — which in practice means fewer tenants get onboarded, and the multi-tenant model degrades into effective single-tenancy.
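The control-plane flow above can be sketched in two hooks, assuming a service-creation event and a single tenant-config store that one deployment propagates; all names and the default tier are hypothetical:

```python
# Hypothetical default tier for newly enrolled tenants.
DEFAULT_LIMITS = {"max_active_series": 500_000}

# Single source of truth the control plane deploys to every component.
tenants: dict[str, dict] = {}

def on_service_created(service_name: str) -> None:
    """Service-creation hook: auto-enroll the service as a tenant."""
    tenants.setdefault(service_name, dict(DEFAULT_LIMITS))

def apply_config_change(service_name: str, overrides: dict) -> None:
    """One config edit, one deployment — no per-component code changes."""
    tenants.setdefault(service_name, dict(DEFAULT_LIMITS)).update(overrides)
```

Cheap onboarding is the load-bearing property: if enrollment stays manual, teams route around it and the platform drifts back toward effective single-tenancy.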
When this pattern fits¶
- Internal platform serving dozens-to-thousands of internal services.
- Stable service identifiers (service name, repo name, or similar).
- Cost-attribution / chargeback is a goal — or will be.
- Guardrails are meaningful at the service level (series / QPS / storage per service is a useful bound).
When it doesn't fit¶
- Customer-facing SaaS — tenant is the customer, not an internal service.
- Very few services / one monolith — nothing to attribute to.
- Shared infrastructure services where many teams write to the same logical component (the "one big service everyone dumps metrics into" anti-pattern); here the service isn't a meaningful unit of attribution.
Seen in¶
- sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system — Airbnb's observability storage system uses tenant-per-service across ~1,000 services; exposes series limits, derives ingest rate; auto-onboards new tenants via a consolidated control plane.