PATTERN
Async-centralized quota + local enforcement¶
Pattern: centralise the quota lifecycle in a control-plane service (CRUD, authorization, validation, audit, rightsizing) — but keep the request-time enforcement decision local to the caller's host, fed by rule updates that arrive asynchronously via an existing configuration-distribution substrate.
Shape¶
┌─────────────────────────┐
│ Control plane (CRUD) │
│ rules authored here │
└──────────┬──────────────┘
│
(async publish)
│
▼
┌─────────────────────────┐
│ Config-distribution │
│ (feature-flag / conf │
│ push substrate) │
└──────────┬──────────────┘
│
(push / pull)
│
▼
┌──────────────────────────────────────┐
│ Service host │
│ ┌──────────────────────────────┐ │
│ │ Local rule cache │ │
│ │ Local counters │ │
│ │ In-process enforcement lib │ │
│ └──────────────────────────────┘ │
│ ▲ │
│ │ request │
│ ─────┴──────▶ serve or reject │
└──────────────────────────────────────┘
Rule updates are eventually consistent — a new rule takes effect only after PinConf (or an equivalent substrate) pushes it out, typically within seconds. Enforcement decisions are local and immediate: no request-time RTT to the control plane.
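A minimal sketch of the host-local enforcement path, assuming a token-bucket rule shape. All names here (`Rule`, `LocalEnforcer`, `on_config_push`) are hypothetical; the actual SPF API is not described in the source.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Rule:
    """A quota rule as authored in the control plane (hypothetical shape)."""
    key: str     # e.g. "tidb:writes:service-A"
    rate: float  # tokens refilled per second
    burst: float # bucket capacity

@dataclass
class TokenBucket:
    rule: Rule
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.rule.burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill from elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.rule.burst,
                          self.tokens + (now - self.last) * self.rule.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class LocalEnforcer:
    """In-process enforcement: rules arrive async, decisions stay local."""
    def __init__(self) -> None:
        self._buckets: dict[str, TokenBucket] = {}

    def on_config_push(self, rules: list[Rule]) -> None:
        # Invoked by the config-distribution client, NOT on the request path.
        self._buckets = {r.key: TokenBucket(r) for r in rules}

    def check(self, key: str) -> bool:
        bucket = self._buckets.get(key)
        # No rule cached for this key -> fail open (no limit configured).
        return True if bucket is None else bucket.allow()
```

The request path touches only in-memory state; the only coupling to the control plane is the asynchronous `on_config_push` callback.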
Three architectural contracts¶
- Who owns the rule? The control plane. CRUD, authorization, audit, cross-rule validation, rightsizing-driven updates all happen centrally. Single source of truth.
- How does the rule get to the host? An existing configuration-distribution substrate (Pinterest's PinConf — same tier as feature flags and dynamic service config). Canonical config-distribution-for-quota-rules move.
- Who makes the request-time decision? The host, via an in-process library (Pinterest's Service-Protection Framework / SPF). Fast, no RTT, local-health-aware. See local rate-limit decision.
Why this shape¶
Speed¶
Every request-time decision avoids an RTT to a central service. Crucial when the service protected by the rate limit has its own tight latency SLO.
Availability¶
The enforcement path doesn't depend on the control plane being up. Control-plane unavailability freezes rule updates but does not break enforcement. Without this separation, the rate-limit service becomes a hard availability dependency of every service it protects.
Local health awareness¶
The enforcing process sees its own saturation, latency, connection pool depth. A global service can't reasonably synthesize the health of N caller hosts into a rate decision. Local decisions can include graceful rejection based on local service capacity on top of the rule-based limit.
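A sketch of what "local health awareness on top of the rule-based limit" can mean in practice. The signal names and thresholds are hypothetical illustrations, not the SPF interface:

```python
from dataclasses import dataclass

@dataclass
class LocalHealth:
    """Signals only the enforcing process itself can observe (hypothetical)."""
    inflight_requests: int
    pool_in_use: int
    pool_size: int
    p99_latency_ms: float

def should_serve(rule_allows: bool, health: LocalHealth,
                 max_inflight: int = 500,
                 latency_slo_ms: float = 200.0) -> bool:
    """Layer local-capacity shedding on top of the rule-based decision."""
    if not rule_allows:
        return False  # over quota: reject regardless of health
    if health.inflight_requests >= max_inflight:
        return False  # saturated: shed even though the quota allows it
    if health.pool_in_use >= health.pool_size:
        return False  # no free connections to do the work anyway
    if health.p99_latency_ms > latency_slo_ms:
        return False  # local latency SLO at risk
    return True
```

A central service would have to collect and age these signals from every caller host before each decision; the in-process library reads them for free.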
Pinterest's named reasoning¶
"Rate limits lifecycle management should be automated and streamlined.
Rate limit decisions should be made locally in the data path for scalability and performance reasons, with quota management happening in an async fashion." (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
Trade-offs¶
- Approximate global limits. Each host enforces its own share; the true fleet-wide rate is ~= sum(host-local limits). This depends on fleet size: fleet autoscaling changes the effective global limit implicitly unless the per-host limit is recomputed.
- Rule staleness window. Between rule authoring and the last host applying it, different parts of the fleet enforce different rules. The window is usually tens of seconds to minutes; acceptable for most use cases, but not for pricing or compliance cases.
- Harder to reason about. "Did a request get rejected?" requires looking at the host's local log, not a central rate-limit service log — though the usage telemetry still flows back for reporting (via the feedback loop to auto-rightsizing).
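The approximate-global-limits trade-off is simple arithmetic, but worth making concrete (hypothetical numbers):

```python
def per_host_limit(global_limit_rps: float, fleet_size: int) -> float:
    """Even split of a fleet-wide cap across hosts.

    If this is NOT recomputed when the fleet autoscales, the effective
    global cap drifts: effective ~= per_host_limit * current_fleet_size.
    """
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    return global_limit_rps / fleet_size

# With a 10_000 rps global cap and 20 hosts, each host enforces 500 rps.
# If the fleet autoscales to 40 hosts and each keeps its 500 rps share,
# the effective global cap silently doubles to 20_000 rps.
```

This is why the pattern pairs naturally with a rightsizing loop that republishes per-host shares when fleet size changes.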
Contrast: synchronous global rate-limit service¶
Inline lookup against a central service on every request. Gives exact global limits but adds RTT + hard availability dependency. Valid when limits must be exact (e.g. per-tenant billing caps) and request volume is low enough that RTT cost is acceptable.
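For contrast, the synchronous shape reduces to an inline call per request. A sketch with a hypothetical client object; the point is the per-request RTT and the forced failure-policy choice:

```python
def check_quota_sync(client, tenant: str, timeout_s: float = 0.05) -> bool:
    """Inline lookup against a central rate-limit service on every request.

    Exact global accounting, but adds one RTT per request and puts the
    central service's availability on the hot path.
    """
    try:
        return client.consume(tenant=tenant, tokens=1, timeout=timeout_s)
    except TimeoutError:
        # Fail-open trades exactness for availability; fail-closed does
        # the reverse. Either way, the dependency cannot be avoided.
        return True
```

The async-centralized pattern removes both costs at the price of approximate limits and a staleness window.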
Seen in¶
- sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem — canonical wiki instance: Piqama (control plane) + PinConf (async distribution) + SPF (local enforcement) for Pinterest's online-storage rate limits; deployed against TiDB and Key-Value Stores.