PATTERN Cited by 1 source
Gateway autoscaling¶
Horizontally scale the gateway (routing/admission tier) itself, not just the backends it fronts: the number of gateway instances grows under load and shrinks when idle. A gateway that is every query's single hop is a single point of failure if it cannot absorb traffic spikes; a bare-minimum fleet maxes out on CPU/memory under unexpected load and crashes, denying the entire downstream service to all callers.
Why the gateway tier needs elasticity¶
The instinct with gateways is to deploy "enough" instances and leave them alone. That static sizing breaks down in two common cases:
- Unintentional DDoS: a misbehaving client generates millions of queries in a short span.
- Legitimate traffic spikes: a periodic analytics job fires fleet-wide, a product launch triggers query fanout, etc.
If the gateway tier is statically sized, either the deployment over-provisions permanently (cost) or runs close to capacity and crashes on spikes. Horizontal elasticity sidesteps both.
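The cost/risk tradeoff can be made concrete with a little arithmetic. A minimal sketch; all numbers below are illustrative assumptions, not figures from the source:

```python
# Illustrative: statically sizing a gateway fleet for peak load vs.
# autoscaling it to demand. All numbers are made-up assumptions.

avg_rps = 20_000          # typical aggregate query rate
peak_rps = 200_000        # misbehaving client or fleet-wide job firing
per_instance_rps = 5_000  # what one gateway instance can sustain

# Static fleet must be sized for the worst spike, around the clock.
static_fleet = peak_rps // per_instance_rps      # 40 instances, always on

# Elastic fleet runs at average load and grows only during the spike.
elastic_steady = -(-avg_rps // per_instance_rps)  # ceil division: 4 instances

# The static fleet pays for 40 instances 24/7 to survive a spike that
# an elastic fleet handles by briefly scaling 4 -> 40 and back down.
```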
Machinery required¶
- Stateless gateway instances, or gateway state held in a shared store outside the instance, so instances can be added/removed freely.
- External scaling signal: CPU, memory, queue depth, or request-rate thresholds drive scale-up/scale-down decisions.
- Integration with a fleet-wide autoscaling service that knows how to provision instances within the organisation's infrastructure without bespoke per-service work.
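The machinery above amounts to a control loop: read external signals, compute a target instance count, and hand that target to the fleet-wide scaler. A minimal sketch of the decision step; the `Metrics` shape and the scaling thresholds are hypothetical stand-ins for whatever the organisation's autoscaling service exposes:

```python
# Sketch of the scale-up/scale-down decision for a stateless gateway tier.
# Metric names and default thresholds are illustrative assumptions.

import math
from dataclasses import dataclass

@dataclass
class Metrics:
    cpu_utilization: float   # 0.0-1.0, averaged across gateway instances
    requests_per_sec: float  # aggregate query rate hitting the gateway tier

def desired_replicas(current: int, m: Metrics,
                     cpu_target: float = 0.6,
                     rps_per_instance: float = 5_000,
                     min_replicas: int = 3,
                     max_replicas: int = 200) -> int:
    """Pick the replica count that brings every signal back under target."""
    # How many instances would dilute current CPU usage to the target level.
    by_cpu = current * m.cpu_utilization / cpu_target
    # How many instances the raw request rate needs.
    by_rps = m.requests_per_sec / rps_per_instance
    # Scale for the hotter signal, then clamp to sane fleet bounds.
    want = math.ceil(max(by_cpu, by_rps))
    return max(min_replicas, min(max_replicas, want))
```

Because gateway instances are stateless, the scaler can act on this number directly: add instances when the target exceeds the current count, drain and remove them when it falls below.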
Seen in¶
- sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale — Meta Presto Gateway. The number of Gateway instances is dynamic, leaning on a Meta-wide service that supports scaling jobs up and down. Under heavy load the Gateway scales up to absorb the additional traffic rather than maxing out on CPU/memory, preventing the crash scenario where a single service bombards the Gateway with millions of queries. Paired with dimensional throttling, so the gateway has both admission control and elasticity.