# Two-axis horizontal + vertical autoscaling
## Pattern
Combine request-based horizontal scaling (add or remove replicas within seconds as traffic shifts) with resource-based vertical scaling (adjust per-replica concurrency as the workload's resource profile becomes clear) in a single controller where the vertical axis feeds the horizontal formula: the per-replica concurrency chosen by the vertical axis is an input to the horizontal replica-count calculation.
The horizontal axis handles availability and latency (the fast decision). The vertical axis handles efficiency (the steady-state optimization). Together they avoid the false choice between fast-but-wasteful (request-only scaling) and efficient-but-slow (resource-only scaling).
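A minimal Python sketch of the coupling follows. It makes two assumptions. First, the vertical axis reduces its inputs to a single utilization number in [0, 1]; APA's actual vertical signal combines CPU/GPU utilization, latency, and GPU FLOPS. Second, the horizontal formula is active requests divided by the per-replica concurrency target. `vertical_step`, `horizontal_step`, the 0.75 utilization goal, and the 1 to 64 clamp are hypothetical names and values chosen for illustration.

```python
import math


def vertical_step(concurrency_target: float, utilization: float,
                  target_utilization: float = 0.75,
                  min_target: float = 1.0, max_target: float = 64.0) -> float:
    """Hypothetical slow axis: rescale the per-replica concurrency target so that
    observed utilization (0..1) moves toward target_utilization."""
    if utilization <= 0:
        return concurrency_target  # no usable signal; keep the current target
    proposed = concurrency_target * target_utilization / utilization
    return max(min_target, min(max_target, proposed))


def horizontal_step(active_requests: float, concurrency_target: float,
                    min_replicas: int = 1) -> int:
    """Fast axis: enough replicas that each carries about concurrency_target requests."""
    return max(min_replicas, math.ceil(active_requests / concurrency_target))


# Replicas run at 50% utilization against a 75% goal, so each can take more requests.
target = vertical_step(8.0, utilization=0.5)   # 12.0
replicas = horizontal_step(100, target)        # 9
```

The sketch highlights the data flow. `horizontal_step` never reads utilization directly. It only sees the concurrency target that `vertical_step` produced, so a better-understood resource profile translates straight into fewer (or more) replicas for the same load.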
## Key design properties
- Coupled, not independent — vertical output is horizontal input.
- Different cadences — horizontal is fast (seconds), vertical is slow (tens of seconds) to prevent interference.
- Asymmetric in both axes — scale-up is aggressive, scale-down is conservative (see patterns/asymmetric-aggressive-up-conservative-down-autoscaling).
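The three properties above can be combined in one controller loop. The sketch below is illustrative only. The 5 s and 30 s periods mirror the APA row in the table. The class name `TwoAxisController`, the six-tick scale-down window, and the 25% cap on raising the per-replica concurrency target are hypothetical choices. The sketch also treats lowering the concurrency target (more resources per request) as the vertical axis's "scale-up" direction, which is an interpretation rather than something the source specifies.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TwoAxisController:
    """Hypothetical two-axis autoscaler; every parameter is illustrative."""
    horizontal_period_s: float = 5.0    # fast axis cadence
    vertical_period_s: float = 30.0     # slow axis cadence
    concurrency_target: float = 8.0     # vertical output, horizontal input
    target_utilization: float = 0.75
    max_target_growth: float = 1.25     # cap on raising the target per vertical tick
    scale_down_window: int = 6          # horizontal ticks a lower count must persist
    replicas: int = 1
    _last_h: float = field(default=-math.inf, repr=False)
    _last_v: float = field(default=-math.inf, repr=False)
    _recent: deque = field(default_factory=deque, repr=False)

    def tick(self, now: float, active_requests: float, utilization: float) -> int:
        # Slow vertical axis: revisit per-replica concurrency at most every vertical_period_s.
        if now - self._last_v >= self.vertical_period_s and utilization > 0:
            self._last_v = now
            ratio = self.target_utilization / utilization
            # Asymmetric: lowering the target (more headroom per request) is applied
            # in full; raising it is capped at max_target_growth per tick.
            ratio = min(ratio, self.max_target_growth)
            self.concurrency_target = max(1.0, self.concurrency_target * ratio)

        # Fast horizontal axis: divides load by the vertical axis's output.
        if now - self._last_h >= self.horizontal_period_s:
            self._last_h = now
            desired = max(1, math.ceil(active_requests / self.concurrency_target))
            self._recent.append(desired)
            if len(self._recent) > self.scale_down_window:
                self._recent.popleft()
            if desired > self.replicas:
                self.replicas = desired  # asymmetric: scale up on this tick
            else:
                # Scale down only to the highest recommendation still in the window.
                self.replicas = min(self.replicas, max(self._recent))
        return self.replicas


# A burst of 80 requests, then load falls to 8 for the next 30 s.
ctl = TwoAxisController()
history = [ctl.tick(t, 80 if t == 0 else 8, 0.75) for t in range(0, 35, 5)]
# history == [10, 10, 10, 10, 10, 10, 1]
```

Scale-up takes effect on the next fast tick. A lower replica count is applied only once every recommendation in the window agrees with it, which is similar in spirit to the scale-down stabilization window in the Kubernetes Horizontal Pod Autoscaler.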
## Known uses
| System | Horizontal signal | Vertical signal | Context |
|---|---|---|---|
| APA | Active concurrent requests (5s) | CPU/GPU util + latency + GPU FLOPS (30s) | Custom Model Serving |
| systems/databricks-serverless-autoscaler | Query load | OOM detection → larger VM | Serverless Spark |