Pinterest — Piqama: Pinterest Quota Management Ecosystem¶
Junkai Xue, Zheyu Zha (Big Data Processing Platform), Jia Zhan, and Alberto Ordonez Pereira (Online Systems) published the 2026-02-24 Pinterest Engineering post introducing Piqama — Pinterest's generic quota management ecosystem. Piqama is a single platform designed to manage both capacity-based quotas (memory, vcore, concurrent applications for the Big Data Processing Platform) and rate-limit quotas (QPS, bandwidth for Online Storage systems like TiDB and Key-Value stores), plus application-specific quota units. The post lays out the architecture: a centralized control plane (REST + Thrift management portal, lifecycle management, schema + validation + authorization + dispatch + enforcement), a governance/optimization layer (usage stats → Apache Iceberg on S3, auto-rightsizing service consuming historical data), and two worked integration examples — Moka (next-gen big-data processing on Yunikorn) for capacity quotas, and TiDB + KV stores for rate-limit quotas via a config-distribution (PinConf) + local-enforcement (SPF — Service-Protection Framework) shape.
Summary¶
The thesis: Pinterest found itself building separate quota systems per domain — capacity quotas for big-data compute, rate limits for storage services, ad-hoc app-specific quotas — and realised a single generic platform with pluggable schema / validation / dispatch / enforcement hooks could serve all of them. Piqama is that platform. Its architecture is a textbook control-plane / data-plane separation: the control plane owns the quota lifecycle (CRUD, authorization, validation, schema); the data plane is where application-specific enforcement happens (via integrated Piqama clients or via customer-supplied logic). A feedback loop closes over the data plane — applications emit enforcement + usage statistics that land in Apache Iceberg on S3 (with pre-aggregation for storage efficiency), and an auto-rightsizing service reads that history (via Presto or Iceberg directly) to recompute future quota values against organic-growth, burst, and underutilization strategies.
The capacity-quota use case (Moka): Piqama stores per-project guaranteed resources (memory + vcore floor), maximum resources (memory + vcore ceiling), and max concurrent applications. A Yunikorn Config Updater polls Piqama for updated values and rewrites Yunikorn scheduling-framework configs accordingly. Each Yunikorn queue backs a project; completed applications emit usage into an S3 file, which flows into a resource database feeding both (a) future quota calculations and (b) budget-enforcement — when a project's usage exceeds its allocated budget inside a time window, Piqama dynamically lowers that project's maximum resources to throttle its "burning speed" while preserving other projects' compliant access. Manual adjustment is retained as an escape hatch for firefighting.
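The quota-to-scheduler handoff described above can be pictured as a small transformation. The sketch below is illustrative only: the post does not disclose Piqama's actual schema or the real Yunikorn config keys, so all field names here are assumptions.

```python
# Hypothetical sketch of what the Yunikorn Config Updater computes: mapping a
# Piqama capacity-quota record for one project onto a Yunikorn-style queue
# entry. Field names are invented for illustration, not Pinterest's schema.

def quota_to_queue_config(project, quota):
    """Map guaranteed/max resources and the app cap onto one queue entry."""
    return {
        "name": project,
        "resources": {
            # Guaranteed = floor the scheduler must honor for the project.
            "guaranteed": {"memory": quota["guaranteed_memory_gb"],
                           "vcore": quota["guaranteed_vcore"]},
            # Max = ceiling; budget enforcement dynamically lowers this.
            "max": {"memory": quota["max_memory_gb"],
                    "vcore": quota["max_vcore"]},
        },
        "maxApplications": quota["max_concurrent_apps"],
    }

queue = quota_to_queue_config("ads-etl", {
    "guaranteed_memory_gb": 512, "guaranteed_vcore": 128,
    "max_memory_gb": 2048, "max_vcore": 512,
    "max_concurrent_apps": 40,
})
```

In the post's flow, the updater polls Piqama for changed values and rewrites the scheduling config, so each Yunikorn queue continuously mirrors its project's current quota.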
The rate-limit use case (Online Storage, TiDB + KV Stores): the existing rate limiting had three named limitations — non-declarative rules, manual/error-prone adjustment, static/non-adaptive thresholds. The new design makes Piqama the control plane while keeping rate-limit decisions local in the data path (via an in-house library = the Service-Protection Framework / SPF — throttling + concurrency control + rate limiting). Rule delivery to hosts uses PinConf, Pinterest's existing config-management platform (same substrate as feature flags). Rule adjustment is split: ad-hoc edits via UI/API + continuous adjustment via the Piqama rightsizing service aggregating request stats. The local-decision choice is deliberate: "fast rate limiting decisions (in contrast to relying on a global rate limiting service)" + flexibility to factor local service-health into rejection decisions (graceful rejection).
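A minimal sketch of the local-enforcement idea: a per-host token bucket whose effective refill rate is scaled by a local service-health signal, giving graceful rejection without any round-trip to a central service. The actual SPF algorithm is not disclosed; both the token bucket and the health scaling here are assumptions.

```python
import time

class LocalRateLimiter:
    """Illustrative in-process limiter in the spirit of SPF's local decisions."""

    def __init__(self, rate_per_sec, burst, now=None):
        self.rate = rate_per_sec          # steady-state admitted requests/sec
        self.burst = burst                # bucket capacity
        self.tokens = burst
        self.last = time.monotonic() if now is None else now

    def allow(self, health=1.0, now=None):
        """health in [0, 1] scales refill: a degraded host admits less."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate * health)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                      # reject locally, no remote call
```

The rule values themselves (rate, burst) arrive asynchronously from the control plane; only this admission check sits in the request path.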
The post also introduces Quota vs Budget: budgets allocate dollars; chargeback translates usage to dollars. Exceeding budget penalises resource allocation (the X% haircut described in Moka's case). Piqama integrates with Pinterest's Entitlement system (future work). Future roadmap: entitlement integration, advanced auto-rightsizing, distributed quota management across instances (cross-instance aggregation — not detailed in this post), and a unified client.
Key takeaways¶
- Piqama is a generic quota platform — not a storage-quota system or a rate-limit system. "Piqama is Pinterest's Quota Management Ecosystem, created to oversee quotas across diverse systems and quota types, while accommodating multiple platforms and scenarios." The platform's key design property is that schema management, validation, dispatching, and enforcement are all pluggable per application: "Piqama emphasizes customization, enabling different application systems to integrate their specific logic for schema management, validation, dispatching, and enforcement." A single piece of control-plane infrastructure (portal, REST + Thrift API, lifecycle, chargeback, rightsizing) serves both capacity-based and rate-limit-based quotas. Canonical wiki example of a generic quota management platform. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
- Capacity quotas and rate-limit quotas are structurally different but fit under one control plane. Capacity quotas answer "how much of a pooled resource can this project hold at once?" and are enforced by the scheduler (Yunikorn queues, memory+vcore floors/ceilings, concurrent-apps caps). Rate-limit quotas answer "how many requests per second can this caller issue?" and are enforced inline in the data path at request time. Piqama's insight: both reduce to a (subject → quota-value-set) control-plane contract + domain-specific enforcement. Applications pick which enforcement model they want (or bring their own); the control plane is the same. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
- Centralized control plane + local enforcement is the load-bearing architectural choice for the rate-limit variant. "Rate limit decisions should be made locally in the data path for scalability and performance reasons, with quota management happening in an async fashion. … This enables fast rate limiting decisions (in contrast to relying on a global rate limiting service), and also the flexibility to make local decisions based on service health information (e.g. to support graceful rejection based on service capacity)." The alternative — a synchronous lookup against a global rate-limit service on every request — would put an RTT in the hot path and couple every request's success to the central service's availability. Piqama's async-centralized-quota-local-enforcement shape pushes rules out via PinConf broadcast and keeps the request-time decision on the local host. Canonical wiki instance distinct from global-rate-limit-service designs. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
- Config delivery rides on PinConf — the same substrate as feature flags and dynamic service config. "We leverage Pinterest's config management platform (Pinconf) to deliver rate limiting rules on the subscribing hosts. This allows us to scale with Pinterest's config delivery infrastructure, similar to how we manage feature flags and other types of dynamic service configurations." Canonical wiki instance of quota-rules-as-dynamic-configuration — quota rules are not a novel distribution problem; they're isomorphic to feature flags + dynamic config and should ride that infra instead of inventing a new one. Reduces operational surface (one pub/sub tier to monitor) and inherits the config system's existing guarantees (rollout, rollback, auditing). (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
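One way to picture the rules-as-dynamic-config shape (an assumed sketch; the real PinConf client API is not shown in the post): the config client invokes a callback on each rule push, and the request path reads only a local snapshot, never the control plane.

```python
import threading

class LocalRuleCache:
    """Per-host rule snapshot, swapped atomically on each config push."""

    def __init__(self):
        self._rules = {}
        self._lock = threading.Lock()

    def on_config_update(self, new_rules):
        # Invoked asynchronously by the config-delivery client (assumed hook).
        with self._lock:
            self._rules = dict(new_rules)

    def qps_limit(self, subject, default=float("inf")):
        # Hot path: a local dictionary read, no network round-trip.
        with self._lock:
            return self._rules.get(subject, default)

cache = LocalRuleCache()
cache.on_config_update({"ads-serving/user_profiles": 500})
```

Because the same substrate carries feature flags, quota rules inherit its rollout, rollback, and auditing behaviour for free.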
- Auto-rightsizing closes the loop — usage statistics feed back into future quota values. "Piqama's framework allows a separate auto-rightsizing service to continuously consume historical data from various sources, including Presto, Iceberg, and user-defined data sources. This service applies rightsizing strategies designed to predict needs based on organic usage growth, traffic bursts, and underutilization detection." The data path: Piqama clients transparently collect enforcement + usage stats → Iceberg on S3 (pre-aggregated) → separate auto-rightsizing service reads from Iceberg / Presto / user-defined sources → writes updated quota values back via the Piqama API. Canonical wiki instance of historical-usage auto-rightsizing as a decoupled service consuming the platform's own telemetry. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
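A toy version of what such a strategy might compute. The post names the goals (organic growth, bursts, underutilization) but not the algorithm, so the headroom factor and underutilization threshold below are invented:

```python
def rightsize(daily_peak_usage, current_quota,
              headroom=1.25, underutil_threshold=0.5):
    """Recompute one quota value from a window of daily peak usage."""
    observed_peak = max(daily_peak_usage)
    if observed_peak < underutil_threshold * current_quota:
        # Underutilization detected: shrink toward need, keep burst headroom.
        return observed_peak * headroom
    # Organic growth / bursts: follow the peak upward, never auto-shrink here.
    return max(current_quota, observed_peak * headroom)

# A project that peaked at 120 vcores against a 500-vcore quota shrinks to 150.
assert rightsize([100, 120, 110], current_quota=500) == 150.0
```

The real service reads the history from Iceberg or Presto and writes the recomputed value back through the Piqama API, so this math never runs in the enforcement path.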
- Budget integration: exceeding budget automatically lowers maximum resources inside a time window. "When a project's resource usage exceeds its allocated budget within a defined time window, Piqama triggers an enforcement mechanism. The maximum resources available to that project are dynamically lowered. This proactive measure effectively controls the 'burning speed' of resources for the over-budget entity, ensuring that available resources are prioritized and allocated to projects that are operating within their defined budgets." Concretely for the Big Data Processing Platform, over-budget projects "may see a reduction of X% in their resources, depending on their tier." This is the canonical wiki instance of budget-enforced quota throttling — the feedback from the financial budget system into the scheduler-enforced quota, with a deliberate tier-weighted haircut rather than a binary cutoff. Teams can respond by securing more budget or re-prioritising workloads. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
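The tier-weighted haircut could look like the sketch below. The post gives only "X% ... depending on their tier"; the tier names and percentages here are invented for illustration.

```python
# Hypothetical tier table: the post does not enumerate the real tiers.
TIER_HAIRCUT = {"critical": 0.05, "standard": 0.25, "batch": 0.50}

def throttled_max(max_resources, spent, budget, tier):
    """Lower a project's resource ceiling while it is over budget in the window."""
    if spent <= budget:
        return dict(max_resources)       # within budget: ceiling untouched
    cut = TIER_HAIRCUT[tier]
    return {k: v * (1 - cut) for k, v in max_resources.items()}

limits = throttled_max({"memory_gb": 2048, "vcore": 512},
                       spent=13_000, budget=10_000, tier="standard")
```

Only the maximum ceiling is lowered in the post's description, which slows the over-budget project's "burning speed" rather than cutting it off outright.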
- Declarative rules are the named prerequisite for richer rate-limiting. "The existing rules are not declarative, hindering support for diverse and complex use cases, such as sophisticated queries or specific request properties." Pinterest's prior rate-limit framework used imperative, ad-hoc thresholds that couldn't express "limit user X's scan-heavy queries differently from user X's lookup queries." The redesign makes rules declarative so that Piqama can validate them, the rightsizing service can reason about them, and the control-plane UI can render them. Canonical wiki statement of the declarative-rule shift as a rate-limit prerequisite. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
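The difference is easiest to see in miniature: a declarative rule is data that generic machinery can match, validate, and render, while an imperative threshold is code. The rule shape below is hypothetical; the real schema is not disclosed.

```python
import json

# Assumed declarative rule shape: subject + match predicate + limit values.
RULE = json.loads("""
{
  "subject": {"client": "ads-serving", "table": "user_profiles"},
  "match":   {"op": "scan"},
  "limit":   {"qps": 50}
}
""")

def rule_applies(rule, request):
    """Generic matcher: a rule applies when every named field matches."""
    wanted = {**rule["subject"], **rule["match"]}
    return all(request.get(k) == v for k, v in wanted.items())

# Scan-heavy queries from this client can now be limited separately from point
# lookups by the same client, which an ad-hoc imperative threshold couldn't express.
scan = {"client": "ads-serving", "table": "user_profiles", "op": "scan"}
lookup = {"client": "ads-serving", "table": "user_profiles", "op": "get"}
```

Because the rule is plain data, the same structure can be validated by the control plane, rendered in a UI, and rewritten by the rightsizing service.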
- Pluggable validation includes a remote-service escape hatch for advanced invariants. "The platform provides a pluggable validation framework. Users can define custom validation rules for both schema and semantic levels, and even integrate with remote services for advanced validation (e.g., ensuring the sum of all quotas does not exceed cluster resource capacity)." The canonical advanced-validation example is the cluster-capacity sum check — the sum of all project quotas on a cluster cannot exceed the cluster's physical resources. That check requires a live lookup against cluster state outside Piqama's database, which is why remote-service validation is a first-class hook. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
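A sketch of the pluggable chain, with the remote cluster-capacity sum check as the advanced hook. All function names and the callback shape are assumptions:

```python
def validate_schema(quota):
    # Schema-level: required fields are present.
    return {"project", "guaranteed_vcore", "max_vcore"} <= quota.keys()

def validate_semantics(quota):
    # Semantic-level: the floor must not exceed the ceiling.
    return 0 < quota["guaranteed_vcore"] <= quota["max_vcore"]

def validate_cluster_sum(quota, existing_quotas, fetch_cluster_vcores):
    # Remote hook: total guarantees must fit the cluster's physical capacity,
    # which lives outside Piqama's own database and needs a live lookup.
    total = quota["guaranteed_vcore"] + sum(
        q["guaranteed_vcore"] for q in existing_quotas)
    return total <= fetch_cluster_vcores()

new = {"project": "ads-etl", "guaranteed_vcore": 300, "max_vcore": 600}
others = [{"guaranteed_vcore": 500}, {"guaranteed_vcore": 150}]
ok = (validate_schema(new) and validate_semantics(new)
      and validate_cluster_sum(new, others, lambda: 1000))
```

The first two checks run against the request alone; only the last one needs the remote-service escape hatch, which is why it is a separate pluggable stage.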
- Manual adjustment is retained as a firefighting escape hatch — not removed by auto-rightsizing. "Recognizing the need for immediate responsiveness, Piqama provides a mechanism for development teams to manually adjust quota values. This flexibility is particularly vital in critical situations such as 'firefighting' emergencies or for accommodating urgent, high-priority requests that necessitate immediate resource rebalancing." Canonical operational pattern: even fully-automated rightsizing must expose a manual-override path, because operator response time on a hot incident is shorter than any feedback loop. The manual path coexists with automation rather than replacing it. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
- Storage: Apache Iceberg on S3 with pre-aggregation. "Once applications provide data in the correct format, statistics are stored in Apache Iceberg on Amazon S3. These stored statistics are also pre-aggregated to optimize storage space." Piqama's governance substrate is a standard Iceberg-on-S3 lakehouse — not a bespoke time-series store — and queryable via Presto by the rightsizing service. Pre-aggregation at write time keeps the cumulative storage cost bounded despite fleet-wide telemetry volume. Canonical wiki instance of Iceberg-as-telemetry-and-billing-substrate complementing the existing Iceberg-as-table-format instances. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
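Pre-aggregation here plausibly means rolling raw events up at write time so the stats table grows with (projects × periods) rather than with raw event volume. A minimal sketch with an assumed hourly grain and an assumed event schema:

```python
from collections import defaultdict

def preaggregate(events, grain_seconds=3600):
    """Roll (project, epoch_ts, vcore_seconds) events up per project-hour."""
    rollup = defaultdict(float)
    for project, ts, vcore_seconds in events:
        # Bucket by hour: many raw events collapse into one stored row.
        rollup[(project, ts // grain_seconds)] += vcore_seconds
    return dict(rollup)

rows = preaggregate([
    ("ads-etl", 10,   5.0),   # hour 0
    ("ads-etl", 3599, 7.0),   # hour 0
    ("ads-etl", 3600, 1.0),   # hour 1
])
```

The rightsizing service then queries these compact rows (via Presto or Iceberg directly) instead of raw per-application records.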
- Integration footprint named: Moka (Big Data Processing Platform, fully managed; quota lifecycle entirely in Piqama now), TiDB, and Key-Value Stores (initial integration done). Future named adopters: PinCompute (Pinterest's Kubernetes-backed general-purpose compute platform), ML Training Platform, and LLM Serving Services. That LLM Serving callout matters: LLM serving has a distinctive mix of capacity (GPU minutes, concurrent context slots) and rate-limit (tokens/sec, requests/sec) quotas, which is exactly the use case Piqama's dual-model generic platform is positioned for. (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
Architectural numbers¶
| Datum | Value | Scope |
|---|---|---|
| Platform name | Piqama | Pinterest Quota Management Ecosystem |
| Control-plane API | REST + Thrift | Management portal + client SDK |
| Quota storage | Own store (unnamed in post) | Control plane |
| Usage-stats storage | Apache Iceberg on Amazon S3 | Governance / rightsizing feedback loop |
| Query engine for rightsizing | Presto + Iceberg + user-defined sources | Auto-rightsizing service |
| Config distribution (rate-limit) | PinConf | Same substrate as feature flags |
| Big-Data scheduler integration | Apache Yunikorn via Yunikorn Config Updater | Moka |
| Online-storage integrations (initial) | TiDB, Key-Value Stores | Rate-limit variant |
| Budget-overrun penalty | X% resource reduction, tier-dependent | Moka (Big Data Processing Platform) |
| Rate-limit enforcement locus | Local (in-process library = SPF) | Not a global service |
| Future named integrations | PinCompute, ML Training, LLM Serving | Roadmap |
| Post publication date | 2026-02-24 | Pinterest Engineering on Medium |
Systems introduced¶
- systems/pinterest-piqama — the Piqama platform itself. Generic quota management ecosystem; REST + Thrift management portal; pluggable schema / validation / dispatch / enforcement hooks; feedback loop via Apache Iceberg on S3 and a separate auto-rightsizing service.
- systems/pinterest-moka — Pinterest's next-generation massive-scale Big Data Processing platform (announced 2024 as "Moka", the canonical integration target for Piqama's capacity-quota variant). Built atop Apache Yunikorn; each project maps to a Yunikorn queue; Piqama owns the queue config; per-application usage flows back to the resource database and ultimately Iceberg.
- systems/apache-yunikorn — Apache open-source resource scheduling framework, used by Moka for memory + vcore + concurrent-application management on Kubernetes (per the Moka architecture). Yunikorn Config Updater is the bridge that reads Piqama quota values and writes Yunikorn queue configs.
- systems/pinterest-pinconf — Pinterest's config management platform. Canonical substrate for dynamic configuration — feature flags, service config, and now rate-limit rules. Piqama rides on PinConf for rule delivery to subscribing hosts in the rate-limit variant.
- systems/pinterest-spf — Pinterest's in-house Service-Protection Framework. In-process library integrated into application services; provides "general throttling and concurrency control" alongside rate limiting. Enables local rate-limit decisions (no global-service round-trip on hot path). Details deferred to a future post.
Systems reused / extended¶
- systems/tidb — Pinterest's TiDB deployment is a named initial integration target for Piqama's rate-limit variant. Complements the 2024-05-14 HBase-deprecation post which established TiDB as Pinterest's chosen post-HBase NewSQL substrate.
- systems/apache-iceberg — used as the pre-aggregated statistics store for Piqama's governance + rightsizing feedback loop. Extended with the telemetry-and-billing-substrate role.
- systems/presto — used by the auto-rightsizing service to query historical usage data. Extended with Pinterest-Piqama-rightsizing as a named consumer.
- systems/aws-s3 — object-storage substrate under the Iceberg governance store.
Concepts introduced¶
- concepts/quota-lifecycle-management — end-to-end management of a quota: schema definition, validation, authorization, dispatch, enforcement, usage feedback. Canonical wiki page distinguishing quota lifecycle from mere quota enforcement.
- concepts/capacity-vs-rate-limit-quota — two structurally distinct quota kinds unified under Piqama's generic control plane. Capacity quotas are enforced by schedulers; rate-limit quotas are enforced inline in the data path.
- concepts/quota-auto-rightsizing — periodic adjustment of quota values based on historical usage + strategies for organic growth / bursts / underutilization. First-class wiki concept distinct from generic autoscaling.
- concepts/declarative-quota-rule — quota rules expressed as structured declarative definitions (as opposed to imperative thresholds), enabling validation, tooling, UI rendering, and automated rightsizing.
- concepts/local-rate-limit-decision — architectural choice to make rate-limit decisions inline on the requesting host rather than via a synchronous lookup against a global service; enables fast decisions + local-health-aware rejection.
- concepts/entitlement-budget-quota-integration — budgets → entitlements → quotas as a chain: dollars allocate budget; entitlements translate budget into resource rights; quotas enforce those rights at scheduler / data-plane layer.
- concepts/pluggable-validation-framework — customer-supplied validation at schema + semantic levels, including remote-service hooks for advanced live-state validation.
Concepts reused / extended¶
- concepts/control-plane-data-plane-separation — extended with the quota-management instance: Piqama is control plane, applications (Moka + Yunikorn / TiDB + SPF / KV stores + SPF) are data plane. Pluggability of data-plane enforcement is the key degree of freedom.
Patterns introduced¶
- patterns/generic-quota-management-platform — one control plane with pluggable schema / validation / dispatch / enforcement serving multiple quota kinds (capacity + rate-limit + app-specific). Canonical wiki instance: Piqama.
- patterns/async-centralized-quota-local-enforcement — central control plane owns rule lifecycle; rules are pushed async to hosts (via a config-distribution substrate); request-time enforcement is local. Canonical wiki instance: Piqama + PinConf + SPF for Pinterest online-storage rate limits.
- patterns/historical-usage-auto-rightsizing — feedback loop where data-plane usage telemetry lands in a queryable store, a separate rightsizing service consumes it on a cadence, and writes recomputed quota values back through the control plane API. Canonical wiki instance: Piqama + Iceberg + Presto + auto-rightsizing service.
- patterns/config-distribution-for-quota-rules — treat quota rules as dynamic configuration and ride an existing config-distribution substrate (feature flags / dynamic config) rather than inventing a new pub-sub. Canonical wiki instance: Piqama + PinConf.
- patterns/budget-enforced-quota-throttle — on budget exceedance, dynamically lower a project's maximum-resource quota inside the exceedance window to control "burning speed"; tier-weighted haircut rather than binary cutoff; preserves compliant-project access. Canonical wiki instance: Piqama + Moka.
Patterns reused / extended¶
- patterns/chargeback-cost-attribution — extended with the Pinterest Big-Data-Processing-Platform instance: usage stats → Iceberg on S3 → chargeback → budget exceedance → X% tier-weighted quota haircut. Canonical closed-loop chargeback with automated scheduler-enforced throttling; complements the existing Mercedes-Benz data-mesh and Netflix cloud-efficiency instances on the attribution axis with the throttling-on-exceedance axis.
Caveats¶
- Architecture / design post — no scale numbers disclosed. The post is a platform-announcement piece. Concrete numbers are absent: no number of quota rules stored, no enforcement-decisions-per-second, no rule-propagation latency, no number of integrated services, no Iceberg data-volume or rightsizing-job cadence figures. The X% budget-overrun haircut is named but the tier rules are not enumerated.
- "Distributed Quota Management" is future work, not described. The roadmap bullet "Introducing advanced features for managing quotas across distributed instances to better support complex environments" suggests cross-instance quota aggregation (likely a hard problem: distributed counting, eventual consistency, quota over-commit) — but no architecture is disclosed in this post. Treat as a pending-post signal.
- SPF (Service-Protection Framework) details are deferred. "We'll defer the details of SPF in a future blog post." SPF is load-bearing for the rate-limit variant but opaque here — rate-limiting algorithm (token bucket / sliding window / concurrency-based?), how local service-health is factored, how rule updates land are not described.
- No cross-instance coordination semantics for rate limits in this iteration. The local-decision design explicitly trades global-rate-limit guarantees for speed; the exact rate a cluster will collectively enforce is ≈ sum-over-hosts(host-local-limit), which depends on fleet size. How Piqama accounts for this fleet-size dependence when setting per-host limits is not described.
- Dispatch flexibility understated in the concrete walkthroughs. The post says Piqama allows "Piqama clients facilitate receiving the latest quota updates, [or] the system is flexible, allowing users to utilize other dispatching mechanisms like Pinterest's config distribution system (PinConf) or their own custom dispatchers" — but the Moka case uses polling ("Yunikorn Config Updater regularly checks Piqama") and the storage case uses PinConf broadcast. No custom-dispatcher example is given.
- Validation framework's remote-service semantics. The remote-service validation hook for cluster-capacity sum-checks is named but not detailed — synchronous? Cached? Authoritative vs advisory? Failure mode when the remote is down?
- Quota vs Budget vs Entitlement triangulation is partially defined. The post sketches all three relationships but only partially defines the integration direction. The Entitlement integration is explicitly future work; the budget → quota direction is walked through for Moka; the reverse (quota usage → chargeback dollars → budget decrement) is implicit but not fully described.
- Auto-rightsizing strategies named but not specified. "Strategies designed to predict needs based on organic usage growth, traffic bursts, and underutilization detection" — the underlying forecasting model, cadence, and override behaviour when the strategy recommends a cut to a critical project are not described. Only "capacity-based quotas for a Big Data Processing Platform within an organization" has a rightsizing strategy deployed as of writing.
- Validation of the rightsizing service itself. How does Pinterest validate that an auto-generated quota value is safe before rolling it out? Is there a staging window? A-B? Operator approval for large changes? Not disclosed.
- Governance-store implementation choices partially disclosed. Iceberg on S3 is named; the Iceberg catalog implementation (Hive Metastore? Pinterest's own?) and the Presto connector specifics are not.
Source¶
- Original: https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5?source=rss----4c5a5f6279b6---4
- Raw markdown: raw/pinterest/2026-02-24-piqama-pinterest-quota-management-ecosystem-639d672d.md
Related¶
- companies/pinterest
- systems/pinterest-piqama
- systems/pinterest-moka
- systems/apache-yunikorn
- systems/pinterest-pinconf
- systems/pinterest-spf
- systems/tidb
- systems/apache-iceberg
- systems/presto
- concepts/quota-lifecycle-management
- concepts/capacity-vs-rate-limit-quota
- concepts/quota-auto-rightsizing
- concepts/declarative-quota-rule
- concepts/local-rate-limit-decision
- concepts/entitlement-budget-quota-integration
- concepts/pluggable-validation-framework
- concepts/control-plane-data-plane-separation
- patterns/chargeback-cost-attribution
- patterns/generic-quota-management-platform
- patterns/async-centralized-quota-local-enforcement
- patterns/historical-usage-auto-rightsizing
- patterns/config-distribution-for-quota-rules
- patterns/budget-enforced-quota-throttle