PATTERN Cited by 1 source
Gateway throttling by dimension
Admission control at a query/request gateway with per-dimension knobs: per-user, per-source, per-IP, and global. Reject requests (or queue with hard bounds) when the gateway is under heavy load, calibrated per dimension so a single misbehaving actor cannot exhaust the gateway's capacity for everyone else.
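The "queue with hard bounds" alternative to outright rejection can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the class name `BoundedAdmission`, its methods, and the `max_pending` parameter are hypothetical. It relies only on Python's standard `queue.Queue`, whose `put_nowait` raises `queue.Full` once `maxsize` items are pending.

```python
import queue
from typing import Any, Optional


class BoundedAdmission:
    """Hypothetical sketch: hold requests in a queue with a hard upper bound."""

    def __init__(self, max_pending: int) -> None:
        if max_pending <= 0:
            # queue.Queue treats maxsize <= 0 as "unbounded", which would
            # defeat the purpose of a hard bound.
            raise ValueError("max_pending must be positive")
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending)

    def submit(self, request: Any) -> bool:
        """Enqueue the request, or return False immediately if the queue is full."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def next_request(self) -> Optional[Any]:
        """Pop the oldest pending request, or None if nothing is waiting."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None
```

Because `submit` never blocks, a full queue turns into a fast rejection the caller can see, rather than unbounded memory growth inside the gateway process.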
Why multi-dimensional
A purely global throttle degrades every caller when one caller misbehaves — the classic one-service-kills-the-gateway failure mode. Per-dimension throttling localises the blast radius:
- Per-user / per-source / per-IP: caps how much load one actor can place on the gateway, so an unintentional client-side bug that generates millions of queries throttles the offender rather than the shared infrastructure.
- Global: still necessary for overall gateway protection when aggregate load exceeds gateway capacity.
The combination means the gateway can protect itself against both single-caller spikes and aggregate load, without over-throttling well-behaved traffic.
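A minimal sketch of how per-dimension and global limits might be combined is shown below. It assumes a fixed one-second counting window and hypothetical names (`DimensionThrottle`, `admit`, the `limits` mapping); the limit values used with it are illustrative, and the source does not describe the actual algorithm beyond counting queries per second per dimension.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DimensionThrottle:
    """Hypothetical sketch: fixed one-second counters per (dimension, key)."""

    # Queries per second allowed for each dimension; must contain the keys
    # "user", "source", "ip", and "global". Values are assumptions.
    limits: Dict[str, int]
    # Injectable clock so the window can be controlled in tests.
    clock: Callable[[], float] = time.monotonic
    _window: int = field(default=-1, init=False)
    _counts: Dict[Tuple[str, str], int] = field(
        default_factory=lambda: defaultdict(int), init=False
    )

    def admit(self, user: str, source: str, ip: str) -> Optional[str]:
        """Return None if admitted, else the name of the dimension that rejected."""
        now = int(self.clock())
        if now != self._window:
            # A new one-second window has started: reset every counter.
            self._window = now
            self._counts.clear()

        # Per-actor dimensions are checked before the global one, so a
        # misbehaving caller is rejected on its own budget first.
        keys = [("user", user), ("source", source), ("ip", ip), ("global", "*")]
        for dim, key in keys:
            if self._counts[(dim, key)] >= self.limits[dim]:
                return dim

        # Count the request only once every dimension has accepted it, so
        # rejected requests do not consume the shared global budget.
        for dim, key in keys:
            self._counts[(dim, key)] += 1
        return None
```

In this sketch a caller that exceeds its per-user limit is rejected with `"user"` while other callers continue to be admitted until the global limit is reached. A production gateway would more likely use a sliding window or token bucket to avoid bursts at window boundaries; the fixed window is used here only to keep the example short.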
Context: unintentional DDoS
The source explicitly names the failure mode that motivated this pattern at Meta:
"one service unintentionally bombarding the Gateway with millions of queries in a short span, resulting in the Gateway processes crashing and unable to route any queries."
Throttling is the admission-control half of the response; horizontal autoscaling is the elasticity half. Together they produce a gateway that "can withstand adverse unpredictable traffic patterns."
Seen in
- sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale — Meta Presto Gateway. Throttling is activated based on query count per second across per-user, per-source, per-IP, and global dimensions. Paired with gateway autoscaling to give the gateway both admission control and elasticity against unintended DDoS-style traffic spikes.