CONCEPT
Local rate-limit decision¶
A local rate-limit decision is an architectural choice: the request-time "accept or reject" call happens inside the requesting host's process (via an embedded library reading locally-distributed rules and local counters), rather than via a synchronous lookup against a global rate-limit service.
The two shapes¶
Global rate-limit service¶
[client] ──request──▶ [server]
                          │
                          ├──synchronous RPC──▶ [global rate-limit svc] ──▶ [counters]
                          │                                │
                          │◀─────────allow/deny────────────┘
                          ▼
                    serve or reject
Pros: exact global limits (one source of truth across fleet); easy to reason about numerically.
Cons: RTT added to every request (directly on the hot path); the service is a hard availability dependency of every caller — if the rate-limit service is down, the calling service has to choose between fail-open (no protection) and fail-closed (unavailability).
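The fail-open/fail-closed dilemma above can be sketched in-process. This is an illustration only: `GlobalRateLimitService` and `handle_request` are hypothetical names simulating the remote service with a local object, not any real client API.

```python
# A minimal sketch of the global shape: every request pays a synchronous
# "RPC" to a central counter, and the caller must pick a failure policy.
# All names here are illustrative, not a real rate-limit client library.

class GlobalRateLimitService:
    """Stands in for the remote service: one counter per key, one limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts: dict[str, int] = {}
        self.available = True  # toggled to simulate an outage

    def allow(self, key: str) -> bool:
        if not self.available:
            raise ConnectionError("rate-limit service unreachable")
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        return n <= self.limit


def handle_request(svc: GlobalRateLimitService, key: str, fail_open: bool) -> bool:
    # The hot path blocks on the remote decision (one RTT per request).
    try:
        return svc.allow(key)
    except ConnectionError:
        # Hard dependency: on outage, either fail-open (no protection)
        # or fail-closed (reject everything).
        return fail_open


svc = GlobalRateLimitService(limit=2)
print([handle_request(svc, "tenant-a", fail_open=True) for _ in range(3)])
# → [True, True, False]
svc.available = False
print(handle_request(svc, "tenant-a", fail_open=True))   # → True  (no protection)
print(handle_request(svc, "tenant-a", fail_open=False))  # → False (unavailability)
```

The outage branch is the whole trade-off in two lines: neither return value is good, which is why removing the dependency entirely is attractive.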
Local rate-limit decision¶
Pros: zero hot-path RTT; no run-time dependency on a rate-limit service (rules are pushed asynchronously); local service-health signals (saturation, local latency) can be folded into the decision for graceful rejection.
Cons: limits are approximate at the fleet level (each host enforces its local share; true fleet rate ≈ sum(host-local-limits)); rule rollouts are eventually consistent (different hosts may briefly see different rules).
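A minimal sketch of the local shape, assuming a per-host token bucket and an even split of the fleet limit (the split policy and all names are illustrative, not SPF's actual implementation):

```python
import time

# Each host enforces its share of the fleet limit with an in-process
# token bucket; no remote call on the hot path. The even per-host split
# below is an illustrative assumption.

class LocalTokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec    # refill rate (tokens/sec)
        self.capacity = burst       # max tokens the bucket can hold
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill lazily based on elapsed time, then spend one token if possible.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Fleet limit of 1000 req/s split evenly over 10 hosts → 100 req/s each.
# The fleet-level limit is only approximate: true fleet rate ≈ the sum of
# host-local limits, and drifts as hosts join or leave.
fleet_limit, hosts = 1000.0, 10
bucket = LocalTokenBucket(rate_per_sec=fleet_limit / hosts, burst=fleet_limit / hosts)
```

The decision is a few arithmetic operations on local state, which is where the "zero hot-path RTT" pro comes from.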
Pinterest's named reasoning¶
"Rate limit decisions should be made locally in the data path for scalability and performance reasons, with quota management happening in an async fashion. … Currently this is done by integrating an in-house rate limiting library into the application service. This enables fast rate limiting decisions (in contrast to relying on a global rate limiting service), and also the flexibility to make local decisions based on service health information (e.g. to support graceful rejection based on service capacity)." (Source: sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem)
The in-house library here is the Service-Protection Framework (SPF); rules are distributed via PinConf and authored in Piqama. This is the canonical wiki instance of async-centralized quota + local enforcement.
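The "local decisions based on service health information" part of the quote can be sketched as a two-signal check. The shedding policy, thresholds, and all names below are illustrative assumptions, not SPF's actual logic:

```python
# Fold local health into the decision, in the spirit of the quoted
# "graceful rejection based on service capacity". Thresholds and names
# are hypothetical illustrations.

def decide(rule_allows: bool, cpu_util: float, queue_depth: int,
           shed_cpu: float = 0.9, shed_queue: int = 1000) -> bool:
    """Accept only if the locally-distributed rule allows the request AND
    this host is not saturated. Both signals are readable in-process, so
    the check costs no RTT; a global service could not see them cheaply."""
    if not rule_allows:
        return False  # over the configured rate limit
    if cpu_util > shed_cpu or queue_depth > shed_queue:
        return False  # graceful rejection: shed load before the host tips over
    return True


print(decide(True, cpu_util=0.5, queue_depth=10))    # → True
print(decide(True, cpu_util=0.95, queue_depth=10))   # → False (host saturated)
print(decide(False, cpu_util=0.5, queue_depth=10))   # → False (rule rejects)
```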
When local beats global¶
- Hot-path sensitive to tail latency. The RTT to a rate-limit service is a direct adder to p99 of every rate-limited endpoint.
- Availability coupling. A hard synchronous dependency caps the calling service's availability at the rate-limit service's own; the caller cannot meet an availability target higher than what the rate-limit service sustains, so adding the dependency degrades both.
- Local health matters. This host is overloaded; a global service can't know that cheaply.
- Fan-out. A global service sees every decision from every client; its own scaling becomes the bottleneck.
When global beats local¶
- Exact cross-host limits required (e.g. per-tenant pricing caps).
- Low request volume so RTT cost is acceptable.
- Short-lived hosts / serverless where fleet-local counters fragment into too-small buckets.
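The fragmentation point in the last bullet is just arithmetic: splitting a fixed fleet limit across many ephemeral workers leaves each bucket too small to absorb normal burstiness. The numbers below are illustrative:

```python
# Per-host share shrinks linearly with fleet size. At serverless scale the
# local bucket becomes so small that legitimate bursts are spuriously
# rejected even when the fleet as a whole is far under its limit.

fleet_limit = 100.0  # req/s across the whole fleet (illustrative)
for hosts in (10, 1000, 10000):
    per_host = fleet_limit / hosts
    print(f"{hosts} hosts -> {per_host:g} req/s per host")
# → 10 hosts -> 10 req/s per host
# → 1000 hosts -> 0.1 req/s per host
# → 10000 hosts -> 0.01 req/s per host  (one token every 100 s)
```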
See also¶
- Hot path — the latency-critical request path that the local-decision shape protects.
Seen in¶
- sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem — canonical wiki introduction. Piqama's rate-limit variant chose local decisions for the named speed + local-health-awareness reasons; explicitly contrasts "fast rate limiting decisions (in contrast to relying on a global rate limiting service)".