
CONCEPT Cited by 1 source

Capacity vs rate-limit quota

Two structurally different quota kinds, commonly conflated:

Capacity quota

Answers "how much of a pooled resource can this caller hold at once?". Enforced by a scheduler on resource admission:

  • Physical resources: memory, vcore, GPU, storage bytes.
  • Logical resources: max concurrent applications, max open database connections, max queued jobs.

Enforcement happens at scheduling time — a new task either fits inside the current capacity envelope or waits. The latency of a wrong decision is measured in job-scheduling intervals (seconds to minutes).
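The "fits or waits" rule above can be sketched as a small admission check. This is a minimal illustration, not how any particular scheduler implements it: the names `CapacityQuota`, `Task`, and `Admitter`, and the FIFO waiting policy, are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CapacityQuota:
    # Hypothetical capacity envelope for one caller (e.g. one project).
    max_memory_mb: int
    max_vcores: int
    max_concurrent_apps: int


@dataclass
class Task:
    name: str
    memory_mb: int
    vcores: int


@dataclass
class Admitter:
    quota: CapacityQuota
    running: list = field(default_factory=list)
    waiting: deque = field(default_factory=deque)

    def _fits(self, task: Task) -> bool:
        # A task fits only if every dimension stays inside the envelope.
        used_mem = sum(t.memory_mb for t in self.running)
        used_cpu = sum(t.vcores for t in self.running)
        return (
            len(self.running) < self.quota.max_concurrent_apps
            and used_mem + task.memory_mb <= self.quota.max_memory_mb
            and used_cpu + task.vcores <= self.quota.max_vcores
        )

    def submit(self, task: Task) -> str:
        # Admit immediately only if nothing is queued ahead (FIFO assumption).
        if not self.waiting and self._fits(task):
            self.running.append(task)
            return "admitted"
        self.waiting.append(task)
        return "waiting"

    def finish(self, name: str) -> None:
        # Releasing capacity may let queued tasks in, in arrival order.
        self.running = [t for t in self.running if t.name != name]
        while self.waiting and self._fits(self.waiting[0]):
            self.running.append(self.waiting.popleft())
```

Note that the check runs only when a task is submitted or finishes, never per request, which is why decision latency here is not on any request's critical path.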

Canonical example: Pinterest's Moka platform stores, per project, guaranteed and maximum values for memory and vcores plus a max-concurrent-apps limit; the Yunikorn Config Updater reads these from Piqama and rewrites the Yunikorn queue configuration. "These values are not static but dynamically managed, encompassing: Guaranteed Resources … Maximum Resources … Max Concurrent Applications." (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)

Rate-limit quota

Answers "how many operations per second (or per minute) can this caller issue?". Enforced inline in the data path on every request:

  • Requests-per-second (RPS / QPS).
  • Bytes-per-second (bandwidth).
  • Tokens-per-minute (LLM APIs).

Enforcement happens per request — each individual request either consumes a token from the caller's bucket or is rejected. Decision latency adds directly to the request's critical path.
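The per-request decision described above is commonly implemented as a token bucket. The sketch below is a generic single-process version, assuming a hypothetical `TokenBucket` class with an injectable clock so it can be exercised deterministically; it is not taken from the source.

```python
import time


class TokenBucket:
    """Generic token bucket: refills at a fixed rate up to a burst capacity."""

    def __init__(self, rate_per_sec: float, burst: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst          # start full
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at the burst capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Each request either consumes tokens or is rejected.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would invoke `allow()` inline before serving each request, so the cost of this call is added directly to request latency. Passing a larger `cost` covers byte-based or LLM-token-based limits with the same structure.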

The rate-limit variant raises an additional architectural question: where is the decision made? A local decision (in-process library, fleet-local counters) is fast and aware of local health, but enforces global limits only approximately. A global decision (centralised rate-limit service) enforces exact limits but adds a network round trip (RTT) and an extra dependency to the request path.
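To make "approximate" concrete, here is a small arithmetic sketch. It assumes the simplest possible local scheme, in which each instance enforces an equal static share of the global limit; the source does not say this is how fleet-local counters are sized, and the function names are hypothetical.

```python
def admitted_local(per_instance_demand_rps, global_limit_rps):
    # Assumption: each instance independently enforces limit / N.
    share = global_limit_rps / len(per_instance_demand_rps)
    return sum(min(demand, share) for demand in per_instance_demand_rps)


def admitted_global(per_instance_demand_rps, global_limit_rps):
    # A central service sees total demand and enforces the exact limit.
    return min(sum(per_instance_demand_rps), global_limit_rps)


# Skewed traffic: one instance receives most of the caller's requests.
demand = [90, 10, 0, 0]
print(admitted_local(demand, 100))   # 35.0
print(admitted_global(demand, 100))  # 100
```

With evenly spread traffic the two agree; with skew, the static local split under-admits even though the caller is within its global limit. That gap is the price paid for avoiding the extra round trip.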

Why treat them together

Traditionally these are handled by different teams with different systems: the scheduler team owns capacity; the networking / API team owns rate-limits; app teams bolt on their own special cases. Pinterest's Piqama insight:

Both kinds reduce to a (subject → quota-value-set) control-plane contract plus domain-specific enforcement in the data plane.

Separate the lifecycle (quota definition, ownership, validation, rightsizing, and chargeback; see quota lifecycle management) from the enforcement mechanism (scheduler vs inline library). A generic quota management platform can then serve both kinds with one control plane and pluggable per-domain enforcement.
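The split between a shared lifecycle and pluggable enforcement can be sketched as follows. This is an illustrative shape only, not Piqama's API: `QuotaRecord`, `QuotaControlPlane`, the `kind` strings, and the example subjects and keys are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class QuotaRecord:
    subject: str               # who the quota applies to (hypothetical naming)
    kind: str                  # e.g. "capacity" or "rate_limit"
    values: Dict[str, float]   # the quota-value-set


class QuotaControlPlane:
    """One store and one validation path; enforcement is delegated per kind."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], QuotaRecord] = {}
        self._enforcers: Dict[str, Callable[[QuotaRecord], None]] = {}

    def register_enforcer(self, kind: str, push: Callable[[QuotaRecord], None]) -> None:
        self._enforcers[kind] = push

    def set_quota(self, record: QuotaRecord) -> None:
        # Shared lifecycle step: validate before anything reaches the data plane.
        if record.kind not in self._enforcers:
            raise ValueError(f"no enforcer registered for kind {record.kind!r}")
        if any(v < 0 for v in record.values.values()):
            raise ValueError("quota values must be non-negative")
        self._records[(record.subject, record.kind)] = record
        # Domain-specific step: hand the values to that domain's enforcer.
        self._enforcers[record.kind](record)


# Stand-ins for a scheduler's queue config and an inline limiter's config.
scheduler_config: Dict[str, Dict[str, float]] = {}
limiter_config: Dict[str, Dict[str, float]] = {}

plane = QuotaControlPlane()
plane.register_enforcer("capacity", lambda r: scheduler_config.update({r.subject: r.values}))
plane.register_enforcer("rate_limit", lambda r: limiter_config.update({r.subject: r.values}))

plane.set_quota(QuotaRecord("project-a", "capacity", {"max_memory_mb": 4096, "max_vcores": 8}))
plane.set_quota(QuotaRecord("service-b", "rate_limit", {"rps": 500}))
```

Both quota kinds pass through the same `set_quota` path, so ownership checks, validation, and auditing would be written once, while each enforcer decides how and when the values take effect.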

This is a control-plane / data-plane story — see control-plane / data-plane separation.

Third class: application-specific quotas

Piqama also accommodates "application-specific quota units" that are neither capacity nor rate-limit quotas (e.g. seats, entitlement counts, number-of-saved-items caps). Same lifecycle, different enforcement.

Seen in
