CONCEPT Cited by 1 source

Per-tenant rate limit

Definition

A per-tenant rate limit is a rate-limiting budget that a shared service applies independently per tenant rather than globally across all tenants. In a multi-tenant architecture, this is the primary mechanism that prevents one tenant's traffic from starving another's: each tenant gets its own quota, refilled independently, so a hot tenant exhausting its budget doesn't reduce capacity available to other tenants.
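As a minimal sketch of the definition above, the following keeps one token bucket per tenant, each refilled independently. The token-bucket algorithm, the class names (`TokenBucket`, `PerTenantRateLimiter`), and the capacity and refill values are illustrative assumptions; the source does not say which algorithm the vendor uses.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size, in requests (assumed value)
    refill_per_sec: float  # steady-state rate, in requests per second (assumed value)
    tokens: float
    updated_at: float

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerTenantRateLimiter:
    """Hypothetical limiter: every tenant gets its own independently refilled bucket."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # A tenant seen for the first time starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec,
                                 tokens=self.capacity, updated_at=now)
            self._buckets[tenant_id] = bucket
        return bucket.try_acquire(now)


limiter = PerTenantRateLimiter(capacity=3, refill_per_sec=1.0)
hot = [limiter.allow("tenant-a", now=0.0) for _ in range(4)]
quiet = limiter.allow("tenant-b", now=0.0)
```

Here `hot` ends with a rejection because tenant-a has drained its own bucket, while `quiet` is still allowed: the hot tenant's exhaustion does not touch tenant-b's budget.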

When the shared service is a third-party API, the platform's own design has to bend around the vendor's per-tenant rate-limit shape — which is the 2026-05-14 Instacart source's central architectural pressure.

Canonical instance: Instacart's third-party marketing provider

Verbatim from the 2026-05-14 source (sources/2026-05-14-instacart-scaling-personalized-marketing-for-multi-tenant-commerce-platforms):

"Our third-party provider imposes strict API constraints. For example, requests are rate-limited per retailer, and individual send APIs support batches of up to 50 users per call. Processing users individually would have made large-scale campaign delivery both slower and more expensive."

Two concrete consequences shape the upstream design:

  1. Maximum-batch-size dictation. The vendor's batch API accepts up to 50 users per call. The platform's stream consumer therefore rebatches per-user events into groups of up to 50 (see patterns/stream-rebatch-for-downstream-batch-api). Without rebatching, sending one user per call would consume up to 50× as many requests against the retailer's quota.
  2. Per-retailer quota awareness. Even with rebatching, the CRM Service has to track each retailer's per-workspace quota so that no single retailer's campaign starves the platform's overall throughput. "Throttling controls to protect shared resources under load" (per the source) operate at the per-retailer grain.
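The rebatching step in item 1 can be sketched roughly as follows. The only value taken from the source is the 50-user batch limit; the `(retailer_id, user_id)` event shape and the `rebatch` function are hypothetical, and a real stream consumer would likely also flush partial batches on a timer rather than only at end of input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

MAX_BATCH_SIZE = 50  # from the source: send APIs accept up to 50 users per call


def rebatch(
    events: Iterable[Tuple[str, str]],
    max_batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[Tuple[str, List[str]]]:
    """Group per-user events into per-retailer batches of at most max_batch_size."""
    pending: Dict[str, List[str]] = defaultdict(list)
    for retailer_id, user_id in events:
        batch = pending[retailer_id]
        batch.append(user_id)
        if len(batch) == max_batch_size:
            yield retailer_id, batch
            pending[retailer_id] = []  # start a fresh batch for this retailer
    # Flush whatever is left once the input is exhausted.
    for retailer_id, batch in pending.items():
        if batch:
            yield retailer_id, batch


events = [("r1", f"u{i}") for i in range(120)] + [("r2", f"v{i}") for i in range(3)]
batches = list(rebatch(events))
batch_sizes = [(retailer, len(users)) for retailer, users in batches]
# batch_sizes == [("r1", 50), ("r1", 50), ("r1", 20), ("r2", 3)]
```

In this example, 123 per-user events become 4 vendor calls instead of 123, and batches never mix retailers, so each call is charged against the right per-retailer quota.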

Why per-tenant rate limits exist

A single global rate limit on a multi-tenant API would create a noisy-neighbor failure mode at the rate-limit layer:

  • Tenant A launches a one-million-message campaign.
  • Tenant A's bursty workload exhausts the global quota in seconds.
  • Tenants B–Z's normal-rate sends fail with HTTP 429.
  • Recovery: wait for the global quota window to refresh — while every tenant's sends stay degraded.

Per-tenant rate limits eliminate this failure mode by giving each tenant its own bucket. Tenant A's burst hits its own quota cap and gets throttled; tenants B–Z see no impact.
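A toy simulation of a single quota window makes the contrast concrete. The fixed-window counter, the limit of 100, and the request counts are all assumed for illustration; the per-tenant case also grants more total capacity than the global one, which is part of what a vendor pays for when it isolates tenants.

```python
from collections import Counter
from typing import Iterable


def simulate(requests: Iterable[str], limit: int, per_tenant: bool) -> Counter:
    """Count rejected (HTTP 429) requests per tenant within one quota window."""
    used: Counter = Counter()
    rejected: Counter = Counter()
    for tenant in requests:
        # One shared counter for a global limit, one counter per tenant otherwise.
        key = tenant if per_tenant else "__global__"
        if used[key] < limit:
            used[key] += 1
        else:
            rejected[tenant] += 1
    return rejected


# Tenant A bursts first; tenants B and C send at a normal rate afterwards.
requests = ["A"] * 1000 + ["B"] * 10 + ["C"] * 10

global_rejected = simulate(requests, limit=100, per_tenant=False)
tenant_rejected = simulate(requests, limit=100, per_tenant=True)
# global_rejected: A=900, B=10, C=10  (B and C are starved by A's burst)
# tenant_rejected: A=900, B=0,  C=0   (only A is throttled)
```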

This is why per-tenant rate-limiting is a defining property of mature multi-tenant SaaS APIs.

A noisy-neighbor sub-shape

Per-tenant rate limits prevent the most obvious form of noisy-neighbor — but they create their own variant of the problem:

  • Within a tenant, all of that tenant's workloads share one rate-limit budget. If tenant A runs both a winback campaign and a promotional campaign concurrently, they contend for tenant A's budget.
  • For the platform that mediates between tenants and the vendor, each tenant's budget is effectively a per-tenant capacity signal that the platform has to translate into back-pressure or scheduling decisions upstream.

So the platform has to operate at two rate-limit granularities: the vendor's per-tenant budget (outside the platform's control) and the platform's own decisions about how to allocate that budget across each tenant's concurrent campaigns.
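One way the second granularity could work is a max-min fair split of a tenant's per-window budget across its concurrent campaigns. This is only a sketch under stated assumptions: the source does not disclose how the platform allocates budget, and `allocate_budget`, the campaign names, and the numbers are hypothetical.

```python
from typing import Dict


def allocate_budget(tenant_budget: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Max-min fair split of one tenant's request budget across its campaigns.

    Each round offers every unsatisfied campaign an equal share of what is left;
    budget that small campaigns do not need is redistributed to larger ones.
    """
    allocation = {name: 0 for name in demands}
    remaining = tenant_budget
    unsatisfied = {name: d for name, d in demands.items() if d > 0}
    while remaining > 0 and unsatisfied:
        share = max(1, remaining // len(unsatisfied))
        for name in sorted(unsatisfied):  # sorted for deterministic tie-breaking
            grant = min(share, unsatisfied[name], remaining)
            allocation[name] += grant
            unsatisfied[name] -= grant
            remaining -= grant
            if remaining == 0:
                break
        unsatisfied = {name: d for name, d in unsatisfied.items() if d > 0}
    return allocation


# Tenant A has 100 requests this window and two concurrent campaigns.
plan = allocate_budget(100, {"winback": 30, "promo": 200})
# plan == {"winback": 30, "promo": 70}
```

The small winback campaign gets everything it asked for, and the promotional campaign absorbs the rest of the tenant's budget instead of crowding winback out. A weighted or priority-based split would be an equally plausible design choice.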

Caveats

  • The 2026-05-14 source does not disclose the per-retailer rate-limit values or the platform's quota-management internals.
  • Per-tenant rate limits at the vendor are only a partial protection: they guard against the worst cross-tenant noisy-neighbor scenarios but don't prevent within-tenant contention.
  • The trade-off a vendor makes between a low per-tenant cap with many workspaces and a high per-tenant cap with few workspaces is opaque to the platform, yet it changes the platform's capacity-planning calculus.
