

Gateway autoscaling

Horizontally scale the gateway (routing / admission) tier itself, not just the backend it fronts: the number of gateway instances grows under load and shrinks when idle. A gateway that is every query's single hop becomes a single point of failure if it cannot absorb traffic spikes elastically. A bare-minimum fleet of gateway instances will max out on CPU/memory under unexpected load and crash, denying the entire downstream service to all callers.

Why the gateway tier needs elasticity

The instinct with gateways is to deploy "enough" instances and leave them. This breaks under:

  • Unintentional DDoS: a misbehaving client generates millions of queries in a short span.
  • Legitimate traffic spikes: a periodic analytics job fires fleet-wide, a product launch triggers query fanout, etc.

If the gateway tier is statically sized, the deployment either over-provisions permanently (paying for idle capacity) or runs close to capacity and falls over on spikes. Horizontal elasticity sidesteps both.
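A back-of-envelope sketch of the static-sizing dilemma; the per-instance capacity and traffic numbers below are hypothetical, chosen only to show the gap between baseline and spike sizing:

```python
import math

def instances_needed(qps: float, per_instance_qps: float) -> int:
    """Instances required to serve a given aggregate query rate."""
    return math.ceil(qps / per_instance_qps)

PER_INSTANCE_QPS = 1_000   # hypothetical capacity of one gateway instance
BASELINE_QPS = 5_000       # typical steady-state traffic
SPIKE_QPS = 50_000         # misbehaving client or fleet-wide analytics job

baseline_fleet = instances_needed(BASELINE_QPS, PER_INSTANCE_QPS)  # 5
spike_fleet = instances_needed(SPIKE_QPS, PER_INSTANCE_QPS)        # 50

# Statically sizing for the spike means 10x over-provisioning at baseline;
# sizing for the baseline leaves the tier 10x under capacity during the spike.
print(baseline_fleet, spike_fleet)
```

An elastic tier runs near the baseline figure most of the time and only approaches the spike figure while the spike lasts.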

Machinery required

  • Stateless gateway instances, or gateway state held in a shared store outside the instance, so instances can be added/removed freely.
  • External scaling signal: CPU, memory, queue depth, or request-rate thresholds drive scale-up/scale-down decisions.
  • Integration with a fleet-wide autoscaling service that knows how to provision instances within the organisation's infrastructure without bespoke per-service work.
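The scaling decision driven by such a signal can be sketched with the common proportional rule (the same shape used by autoscalers such as the Kubernetes HPA: desired = ceil(current × observed / target)); the function name, thresholds, and bounds below are illustrative, not part of any specific autoscaling service:

```python
import math

def desired_instances(current: int,
                      observed_metric: float,
                      target_metric: float,
                      min_instances: int = 2,
                      max_instances: int = 100) -> int:
    """Proportional scaling rule: resize the fleet so the per-instance
    metric (CPU utilisation, queue depth, request rate) moves toward the
    target, clamped to fleet-size bounds so the tier never scales to zero
    or runs away."""
    desired = math.ceil(current * observed_metric / target_metric)
    return max(min_instances, min(max_instances, desired))

# 10 instances at 90% CPU against a 60% target -> scale up to 15.
print(desired_instances(10, observed_metric=0.90, target_metric=0.60))
# 10 instances at 20% CPU -> scale down toward 4, never below the floor.
print(desired_instances(10, observed_metric=0.20, target_metric=0.60))
```

The min/max clamp is what keeps an unintentional DDoS from turning into an unbounded provisioning bill, while the floor preserves redundancy when traffic is idle.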
