CONCEPT Cited by 1 source
Capacity vs rate-limit quota
Two structurally different quota kinds, commonly conflated:
Capacity quota
Answers "how much of a pooled resource can this caller hold at once?". Enforced by a scheduler on resource admission:
- Physical resources: memory, vcore, GPU, storage bytes.
- Logical resources: max concurrent applications, max open database connections, max queued jobs.
Enforcement happens at scheduling time: a new task either fits inside the current capacity envelope or waits. The cost of a wrong decision is measured in job-scheduling intervals (seconds to minutes).
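The fit-or-wait check described above can be sketched as follows. This is a minimal illustration, not how any particular scheduler implements admission; `CapacityQuota`, `Usage`, `Task`, `try_admit`, and `release` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class CapacityQuota:
    """Hypothetical capacity envelope for one caller (illustrative fields)."""
    max_memory_gb: int
    max_vcores: int
    max_concurrent_apps: int


@dataclass
class Usage:
    """What the caller currently holds."""
    memory_gb: int = 0
    vcores: int = 0
    running_apps: int = 0


@dataclass
class Task:
    memory_gb: int
    vcores: int


def try_admit(quota: CapacityQuota, usage: Usage, task: Task) -> bool:
    """Admit the task only if every dimension stays inside the envelope.

    Returns False when the task does not fit; the caller is expected to
    queue it and retry on a later scheduling pass.
    """
    fits = (
        usage.memory_gb + task.memory_gb <= quota.max_memory_gb
        and usage.vcores + task.vcores <= quota.max_vcores
        and usage.running_apps + 1 <= quota.max_concurrent_apps
    )
    if fits:
        usage.memory_gb += task.memory_gb
        usage.vcores += task.vcores
        usage.running_apps += 1
    return fits


def release(usage: Usage, task: Task) -> None:
    """Return a finished task's resources to the pool."""
    usage.memory_gb -= task.memory_gb
    usage.vcores -= task.vcores
    usage.running_apps -= 1
```

Note that the check covers both physical dimensions (memory, vcores) and a logical one (concurrent applications), and that a rejected task is deferred rather than failed.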
Canonical example: Pinterest's Moka platform stores, per project, guaranteed and maximum resources (memory and vcores) plus a max-concurrent-applications limit; the Yunikorn Config Updater reads these from Piqama and rewrites the Yunikorn queue configuration. "These values are not static but dynamically managed, encompassing: Guaranteed Resources … Maximum Resources … Max Concurrent Applications." (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
Rate-limit quota
Answers "how many operations per second (or per minute) can this caller issue?". Enforced inline in the data path on every request:
- Requests-per-second (RPS / QPS).
- Bytes-per-second (bandwidth).
- Tokens-per-minute (LLM APIs).
Enforcement happens per request — each individual request either consumes a token from the caller's bucket or is rejected. Decision latency adds directly to the request's critical path.
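A token bucket is one common way to make this per-request decision. The sketch below is a generic, single-threaded illustration and is not taken from the source; the `TokenBucket` class and its parameters are assumptions, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket (illustrative, not thread-safe)."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec      # steady-state refill rate
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = burst           # start full
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = self.clock()
        # Refill for the elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because `allow` runs inline on every request, its cost (here a clock read and a few arithmetic operations) sits on the request's critical path. The `cost` argument lets the same bucket meter bytes or LLM tokens instead of request counts.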
The rate-limit variant raises an additional architectural question: where is the decision made? A local decision (in-process library, fleet-local counters) is fast and aware of local health, but enforces global limits only approximately. A global decision (centralised rate-limit service) enforces exact limits but adds a network round trip (RTT) and a runtime dependency.
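To make "approximate" concrete, the sketch below assumes one simple local scheme: each instance enforces an equal static share of the global limit. That scheme is an assumption chosen for illustration, not something the source describes, and `admitted_rps` is a hypothetical helper.

```python
from typing import List, Tuple


def admitted_rps(
    per_instance_demand: List[float], global_limit_rps: float
) -> Tuple[float, float]:
    """Compare admitted traffic under two decision placements.

    Returns (local, exact):
    - local: each instance independently caps its traffic at an equal
      share of the global limit (assumed scheme, no coordination).
    - exact: a single global counter caps the total across instances.
    """
    share = global_limit_rps / len(per_instance_demand)
    local = sum(min(demand, share) for demand in per_instance_demand)
    exact = min(sum(per_instance_demand), global_limit_rps)
    return local, exact
```

With a global limit of 90 RPS over three instances and demand of `[90.0, 5.0, 5.0]`, the local scheme admits 40 RPS while an exact global limit would admit 90. With evenly spread demand of `[30.0, 30.0, 30.0]` the two agree at 90.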
Why treat them together
Traditionally these are handled by different teams with different systems: the scheduler team owns capacity; the networking / API team owns rate-limits; app teams bolt on their own special cases. Pinterest's Piqama insight:
Both kinds reduce to a (subject → quota-value-set) control-plane contract plus domain-specific enforcement in the data plane.
Separate the lifecycle (quota definition + ownership + validation + rightsizing + chargeback — see quota lifecycle management) from the enforcement mechanism (scheduler vs inline library). A generic quota management platform serves both kinds with one control plane and pluggable per-domain enforcement.
This is a control-plane / data-plane story — see control-plane / data-plane separation.
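The split between one control plane and pluggable enforcement can be sketched as below. This is a toy illustration under stated assumptions, not Piqama's actual API: `QuotaRecord`, `Enforcer`, `SchedulerEnforcer`, `RateLimitEnforcer`, and `QuotaControlPlane` are all hypothetical names, and the validation rule is deliberately trivial.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass
class QuotaRecord:
    """The (subject -> quota-value-set) contract, tagged with a quota kind."""
    subject: str
    kind: str
    values: Dict[str, float]


class Enforcer(Protocol):
    """Per-domain enforcement plug-in (hypothetical interface)."""
    def apply(self, record: QuotaRecord) -> None: ...


class SchedulerEnforcer:
    """Stand-in for a component that rewrites scheduler queue config."""
    def __init__(self) -> None:
        self.queues: Dict[str, Dict[str, float]] = {}

    def apply(self, record: QuotaRecord) -> None:
        self.queues[record.subject] = dict(record.values)


class RateLimitEnforcer:
    """Stand-in for a component that pushes limits to inline limiters."""
    def __init__(self) -> None:
        self.limits: Dict[str, float] = {}

    def apply(self, record: QuotaRecord) -> None:
        self.limits[record.subject] = record.values["rps"]


class QuotaControlPlane:
    """Shared lifecycle: validate and store once, then dispatch by kind."""
    def __init__(self) -> None:
        self._enforcers: Dict[str, Enforcer] = {}
        self._records: Dict[Tuple[str, str], QuotaRecord] = {}

    def register(self, kind: str, enforcer: Enforcer) -> None:
        self._enforcers[kind] = enforcer

    def set_quota(self, record: QuotaRecord) -> None:
        if record.kind not in self._enforcers:
            raise ValueError(f"no enforcer registered for kind {record.kind!r}")
        if any(value < 0 for value in record.values.values()):
            raise ValueError("quota values must be non-negative")
        self._records[(record.subject, record.kind)] = record
        self._enforcers[record.kind].apply(record)
```

Adding a new quota kind in this sketch means registering another enforcer; storage and validation stay in `QuotaControlPlane`.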
Third class: application-specific quotas
Piqama also accommodates "application-specific quota units" that are neither capacity nor rate-limit quotas (e.g. seats, entitlement counts, number-of-saved-items caps). Same lifecycle, different enforcement.
Seen in
- sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem — canonical wiki introduction. Piqama's thesis is that both kinds (plus app-specific) fit under one platform if the enforcement surface is pluggable.