PATTERN Cited by 1 source

Two-axis horizontal + vertical autoscaling

Pattern

Combine request-based horizontal scaling (add/remove replicas within seconds as traffic shifts) with resource-based vertical scaling (adjust per-replica concurrency as the workload's resource profile becomes clear) in a single controller where the vertical axis feeds the horizontal formula.

desired_replicas = ceil(active_requests / target_concurrency)

where target_concurrency is set by the vertical axis.
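
As a minimal sketch of the formula above, the horizontal calculation can be written as a single function. The function name and the validation check are illustrative assumptions, not taken from any of the systems listed on this page.

```python
import math


def desired_replicas(active_requests: int, target_concurrency: float) -> int:
    """Replicas needed so that each one serves about target_concurrency requests.

    target_concurrency is the value the vertical axis produces; here it is
    simply passed in as an argument (an assumption made for illustration).
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    return math.ceil(active_requests / target_concurrency)


# Same traffic, two different vertical-axis outputs.
print(desired_replicas(37, 4))  # 37 / 4 = 9.25, rounded up to 10
print(desired_replicas(37, 8))  # 37 / 8 = 4.625, rounded up to 5
```

The two calls show the coupling: traffic is unchanged, but once the vertical axis decides each replica can handle 8 concurrent requests instead of 4, the horizontal axis asks for half as many replicas.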

The horizontal axis handles availability and latency (the fast decision). The vertical axis handles efficiency (the steady-state optimization). Together they avoid the false choice between fast-but-wasteful (request-only scaling) and efficient-but-slow (resource-only scaling).

Key design properties

  1. Coupled, not independent — vertical output is horizontal input.
  2. Different cadences — horizontal is fast (seconds), vertical is slow (tens of seconds) to prevent interference.
  3. Asymmetric in both axes — scale-up is aggressive, scale-down is conservative (see patterns/asymmetric-aggressive-up-conservative-down-autoscaling).
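
The three properties can be sketched together in one small controller. This is an illustrative toy under stated assumptions, not the implementation of any system named on this page: the class name, the utilization set point, the step sizes, and the reading of "asymmetric" on the vertical axis (cut per-replica concurrency quickly when overloaded, raise it slowly when there is headroom) are all hypothetical. The 5 s and 30 s periods are borrowed from the cadences listed under Known uses.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoAxisAutoscaler:
    # All defaults below are hypothetical values chosen for illustration.
    target_concurrency: float = 4.0    # owned by the vertical axis
    replicas: int = 1                  # owned by the horizontal axis
    min_replicas: int = 1
    horizontal_period_s: float = 5.0   # fast cadence
    vertical_period_s: float = 30.0    # slow cadence
    util_target: float = 0.7           # assumed utilization set point
    concurrency_step_up: float = 0.5   # slow increase when there is headroom
    max_step_down: int = 1             # replicas removed per horizontal tick
    _last_h: float = float("-inf")
    _last_v: float = float("-inf")

    def tick(self, now_s: float, active_requests: int, utilization: float) -> int:
        """Run whichever loops are due at time now_s; return the replica count."""
        # Property 2: each axis runs on its own cadence.
        if now_s - self._last_v >= self.vertical_period_s:
            self._last_v = now_s
            self._vertical_step(utilization)
        if now_s - self._last_h >= self.horizontal_period_s:
            self._last_h = now_s
            self._horizontal_step(active_requests)
        return self.replicas

    def _vertical_step(self, utilization: float) -> None:
        if utilization > self.util_target:
            # Overloaded: shrink per-replica concurrency proportionally (fast).
            scaled = self.target_concurrency * self.util_target / utilization
            self.target_concurrency = max(1.0, scaled)
        else:
            # Headroom: pack more requests per replica, one small step at a time.
            self.target_concurrency += self.concurrency_step_up

    def _horizontal_step(self, active_requests: int) -> None:
        # Property 1: the vertical output is the divisor of the horizontal formula.
        desired = max(self.min_replicas,
                      math.ceil(active_requests / self.target_concurrency))
        if desired >= self.replicas:
            self.replicas = desired  # Property 3: scale up in one jump
        else:
            # Property 3: scale down by a bounded step per tick.
            self.replicas = max(desired, self.replicas - self.max_step_down)
```

Driving `tick` with a traffic spike followed by a drop shows the asymmetry: the replica count jumps to the computed value on the next horizontal tick after the spike, then decreases by at most `max_step_down` per tick once traffic falls.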

Known uses

| System | Horizontal signal | Vertical signal | Context |
|---|---|---|---|
| APA | Active concurrent requests (5s) | CPU/GPU util + latency + GPU FLOPS (30s) | Custom Model Serving |
| systems/databricks-serverless-autoscaler | Query load | OOM detection → larger VM | Serverless Spark |

Seen in
