PATTERN Cited by 1 source

Gateway throttling by dimension

Admission control at a query/request gateway with a separate limit per dimension: per-user, per-source, per-IP, and global. When the gateway is under heavy load, it rejects requests (or queues them with hard bounds). Each limit is calibrated so that a single misbehaving actor cannot exhaust the gateway's capacity for everyone else.
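The "queue with hard bounds" option can be illustrated with a minimal sketch. The class name `BoundedAdmission` and the `max_pending` parameter are hypothetical and not taken from the source; the only point is that the queue has a fixed capacity and anything beyond it is rejected immediately rather than buffered.

```python
import queue
from typing import Optional


class BoundedAdmission:
    """Illustrative gateway queue with a hard upper bound (hypothetical names)."""

    def __init__(self, max_pending: int) -> None:
        if max_pending <= 0:
            # queue.Queue treats maxsize <= 0 as unbounded, which would defeat the bound.
            raise ValueError("max_pending must be positive")
        self._pending: "queue.Queue[str]" = queue.Queue(maxsize=max_pending)

    def submit(self, query: str) -> bool:
        """Return True if the query was queued, False if it was rejected."""
        try:
            self._pending.put_nowait(query)  # never blocks the caller
            return True
        except queue.Full:
            return False

    def next_query(self) -> Optional[str]:
        """Return the oldest queued query, or None if the queue is empty."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None
```

Rejecting at the bound keeps memory use and queueing delay predictable under overload; an unbounded queue would only postpone the failure.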

Why multi-dimensional

A purely global throttle degrades every caller when one caller misbehaves — the classic one-service-kills-the-gateway failure mode. Per-dimension throttling localises the blast radius:

Per-user / per-source / per-IP: caps how much one actor can drive the gateway, so an unintentional client-side bug that generates millions of queries throttles the offender rather than the shared infrastructure.
Global: still necessary for overall gateway protection when aggregate load exceeds gateway capacity.

The combination means the gateway can protect itself against both single-caller spikes and aggregate load, without over-throttling well-behaved traffic.
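As a rough sketch of how the per-dimension and global limits might combine, the example below counts queries in one-second windows and rejects a request as soon as any dimension is at its limit. The class `DimensionThrottle`, the limit values, and the fixed-window counting are assumptions made for illustration; the source does not describe the algorithm the gateway actually uses.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DimensionThrottle:
    """Fixed one-second-window query counter per dimension (illustrative only)."""

    # Hypothetical per-second limits keyed by "user", "source", "ip", "global".
    limits: Dict[str, int]
    clock: Callable[[], float] = time.monotonic
    _window: int = -1
    _counts: Dict[Tuple[str, str], int] = field(
        default_factory=lambda: defaultdict(int)
    )

    def admit(self, user: str, source: str, ip: str) -> Optional[str]:
        """Return None if admitted, else the dimension that rejected the request."""
        now = int(self.clock())
        if now != self._window:
            # A new one-second window has started: reset every counter.
            self._window = now
            self._counts.clear()
        keys = [("user", user), ("source", source), ("ip", ip), ("global", "*")]
        # Check all dimensions before counting, so a rejected request
        # consumes no budget in any dimension, including the global one.
        for dim, value in keys:
            if self._counts[(dim, value)] >= self.limits[dim]:
                return dim
        for key in keys:
            self._counts[key] += 1
        return None
```

A short usage example with a fake clock, again with made-up limits:

```python
t = [0.0]
throttle = DimensionThrottle(
    limits={"user": 2, "source": 100, "ip": 100, "global": 3},
    clock=lambda: t[0],
)
throttle.admit("noisy", "cli", "10.0.0.1")  # admitted
throttle.admit("noisy", "cli", "10.0.0.1")  # admitted
throttle.admit("noisy", "cli", "10.0.0.1")  # rejected by the "user" limit
throttle.admit("quiet", "ui", "10.0.0.2")   # still admitted
```

Because the noisy user's rejected requests never count against the global limit, the quiet user keeps its access until aggregate admitted load reaches the global cap.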

Context: unintentional DDoS

The source names the failure mode that motivated this pattern at Meta scale explicitly:

"one service unintentionally bombarding the Gateway with millions of queries in a short span, resulting in the Gateway processes crashing and unable to route any queries."

Throttling is the admission-control half of the response; horizontal autoscaling is the elasticity half. Together they produce a gateway that "can withstand adverse unpredictable traffic patterns."

Seen in

sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale — Meta Presto Gateway. Throttling is activated based on query count per second across per-user, per-source, per-IP, and global dimensions. Paired with gateway autoscaling to give the gateway both admission control and elasticity against unintended DDoS-style traffic spikes.
