Vertical and horizontal autoscaling
Vertical and horizontal autoscaling is the practice of scaling compute capacity along both axes as a single adaptive primitive: adding and removing workers (horizontal) and resizing the VM instances that host them (vertical), both driven by the same control loop.
Most autoscalers in production are single-axis:
- Kubernetes HPA / cloud Auto Scaling Groups — horizontal only: add pods/instances, never resize them
- Kubernetes VPA — vertical only: resize pods, never add them
- Classical Spark autoscaling — horizontal only: varies worker count at a fixed worker-VM size chosen up front by the user
Two-axis autoscaling is rare because the dependencies between axes make the control loop harder: a vertical scale-up changes the per-worker capacity, which changes whether horizontal scaling is needed, which changes the cost envelope, and so on.
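A minimal reconcile-step sketch of that coupling, under assumed names and sizing rules (this is not the Databricks implementation): the vertical decision runs first, and the horizontal decision is then recomputed against the new per-worker capacity.

```python
# Hypothetical two-axis reconcile step: illustrative only, not the
# Databricks implementation. Assumes positive sizes throughout.
from dataclasses import dataclass

@dataclass
class ClusterState:
    worker_mem_gb: int     # vertical axis: memory of each worker VM
    num_workers: int       # horizontal axis: number of worker VMs
    peak_task_mem_gb: int  # largest observed task memory footprint
    pending_tasks: int     # queued tasks awaiting a slot

def reconcile(state: ClusterState) -> ClusterState:
    # 1. Vertical first: a task bigger than a worker can only be fixed
    #    by upsizing; no number of small workers helps.
    while state.worker_mem_gb < state.peak_task_mem_gb:
        state.worker_mem_gb *= 2
    # 2. Horizontal second, against the *new* per-worker capacity:
    #    bigger workers pack more tasks each, so the same queue may
    #    need fewer of them.
    tasks_per_worker = max(1, state.worker_mem_gb // max(1, state.peak_task_mem_gb))
    state.num_workers = -(-state.pending_tasks // tasks_per_worker)  # ceil division
    return state
```

Ordering matters here: sizing the horizontal axis first, against the old worker size, would provision workers that the vertical resize then makes redundant.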
Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):
"It dynamically adjusts compute capacity by scaling horizontally and vertically as needed, preventing out-of-memory failures and maintaining stability as workloads grow."
The Databricks Serverless Autoscaler is this wiki's first canonical instance of a production two-axis adaptive autoscaler.
Why each axis matters
Horizontal axis
Handles throughput / utilisation / parallelism:
- More concurrent queries → add more workers
- Cluster idle → remove workers (scale to zero if supported)
- Shuffle-heavy stage → add capacity for the stage, then shrink
Horizontal scaling is cheap per signal (add a pod, add an instance) but bounded by worker-VM size.
Vertical axis
Handles per-task memory envelope / per-instance headroom:
- Task OOMs → upsize the VM and retry (see concepts/adaptive-oom-recovery)
- Task consistently running hot on CPU → upsize the VM for more cores
- Worker under-utilising memory → downsize VM for cost
Vertical scaling is expensive per signal (VM restart or replacement required) but necessary for OOM recovery — you can't add more small workers to fix a task that doesn't fit in a small worker.
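A sketch of that upsize-and-retry path, following concepts/adaptive-oom-recovery; the executor stub, the doubling policy, and the 512 GB ceiling are assumptions, not disclosed behaviour.

```python
# Illustrative vertical-axis trigger: on OOM, replace the worker with a
# bigger VM and retry the task. Doubling policy and ceiling are assumed.
MAX_VM_MEM_GB = 512  # assumed largest available VM size

def execute(task: dict, vm_mem_gb: int) -> str:
    # Stand-in for running the task on a worker of the given size.
    if task["mem_gb"] > vm_mem_gb:
        raise MemoryError(f"needs {task['mem_gb']} GB, worker has {vm_mem_gb} GB")
    return f"ok on {vm_mem_gb} GB worker"

def run_with_oom_recovery(task: dict, vm_mem_gb: int = 16) -> str:
    while True:
        try:
            return execute(task, vm_mem_gb)
        except MemoryError:
            if vm_mem_gb >= MAX_VM_MEM_GB:
                raise  # doesn't fit the largest VM: surface the failure
            # the expensive step: VM restart/replacement at double the size
            vm_mem_gb = min(vm_mem_gb * 2, MAX_VM_MEM_GB)
```

In this sketch, `run_with_oom_recovery({"mem_gb": 100})` retries at 32 and 64 GB and succeeds at 128 GB instead of failing the job.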
Why combining axes is load-bearing
Single-axis horizontal autoscalers fail on memory skew: the common failure mode is a task that doesn't fit in the default worker size. Adding more default-sized workers doesn't help, so the user is forced to pre-provision for the worst-case memory footprint, sacrificing the cost benefits of autoscaling.
Single-axis vertical autoscalers fail on parallelism skew: a workload with many tasks that each fit a small VM would be over-provisioned by upsizing instead of parallelising.
A two-axis autoscaler combines both: parallelise where you can, upsize where you must.
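That rule reduces to a single predicate (illustrative, not a disclosed algorithm):

```python
# The combining rule as a predicate: scale out when the task fits the
# current worker size, scale up when it doesn't.
def choose_axis(task_mem_gb: int, worker_mem_gb: int) -> str:
    if task_mem_gb > worker_mem_gb:
        return "vertical"    # no number of small workers fixes this
    return "horizontal"      # the task fits: adding workers is cheaper
```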
Trigger signals
A two-axis autoscaler needs signals that tell it which axis to act on:
| Signal | Axis | Response |
|---|---|---|
| Task OOM | Vertical | Upsize VM, retry task |
| Task CPU saturated | Vertical or horizontal | Depends on workload shape |
| Stage queue depth | Horizontal | Add workers |
| Idle workers | Horizontal | Remove workers |
| Low cluster CPU util, tasks small | Vertical (down) | Smaller VM |
| Sustained high CPU util across all workers | Horizontal | Add workers |
Databricks' disclosure is limited to the OOM signal → vertical upsize trigger; the other signals are implied but not specified in the blog post.
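The table rendered as a signal → (axis, response) dispatch map; the signal names are this page's taxonomy, not a Databricks API:

```python
# Dispatch map mirroring the trigger-signal table above.
SIGNAL_POLICY: dict[str, tuple[str, str]] = {
    "task_oom":             ("vertical",               "upsize VM, retry task"),
    "task_cpu_saturated":   ("vertical or horizontal", "depends on workload shape"),
    "stage_queue_depth":    ("horizontal",             "add workers"),
    "idle_workers":         ("horizontal",             "remove workers"),
    "low_util_small_tasks": ("vertical (down)",        "smaller VM"),
    "sustained_high_cpu":   ("horizontal",             "add workers"),
}

def respond(signal: str) -> tuple[str, str]:
    return SIGNAL_POLICY[signal]
```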
Two user-exposed modes
Databricks Serverless exposes two modes that bias the autoscaler's axis preference:
- Standard — biased toward smaller VMs + tighter horizontal scaling (lower cost, higher latency variance)
- Performance-Optimized — biased toward larger VMs + looser horizontal scaling (higher cost, faster startup/execution)
This is a declarative-intent knob — the user says what they want (cheap vs fast), not how the autoscaler should achieve it.
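One plausible way such a knob could translate into axis bias internally; the parameter names and values below are invented for illustration, since the post does not document the encoding:

```python
# Hypothetical encoding of the two modes as autoscaler bias presets.
from dataclasses import dataclass

@dataclass(frozen=True)
class AxisBias:
    preferred_vm_mem_gb: int  # vertical starting point
    spare_capacity: float     # horizontal slack kept for bursts (0..1)

MODES = {
    # smaller VMs, capacity tracks demand tightly: cheaper, more variance
    "standard": AxisBias(preferred_vm_mem_gb=16, spare_capacity=0.1),
    # larger VMs, slack held for bursts: costlier, faster startup/execution
    "performance_optimized": AxisBias(preferred_vm_mem_gb=64, spare_capacity=0.5),
}
```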
Interaction with reactive vs predictive autoscaling
Two-axis autoscaling can be combined with reactive or predictive strategies. The Databricks post frames the Serverless Autoscaler as signal-driven (continuously analysing workload patterns) rather than threshold-based — implicitly more predictive than reactive, though the post doesn't claim machine-learning-driven prediction.
Contrast: "Traditional autoscaling approaches rely on static rules and reactive thresholds, which often fail to capture these nuances."
Operational outcomes
Named production outcomes when two-axis adaptive autoscaling is combined with adaptive OOM recovery (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):
- CKDelta: 20 min vs 4–5 hr job completion (12–15× speedup)
- Unilever: 2–5× faster pipelines, 25% lower ops cost
- HP: 32%+ cloud savings, 36% runtime reduction
Seen in
- sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance — First canonical wiki instance of production vertical + horizontal autoscaling as a single adaptive primitive. The Databricks Serverless Autoscaler is the implementation. Combines with OOM-aware VM restart as the canonical vertical-axis trigger. Customer outcomes (CKDelta / Unilever / HP) establish the 12–15× / 25–36% impact envelope.
Related
- concepts/adaptive-oom-recovery — the canonical vertical-axis trigger
- concepts/vertical-scaling — the vertical axis in isolation
- concepts/reactive-autoscaling / concepts/predictive-autoscaling — orthogonal strategy axis
- concepts/static-stability — the anti-pattern (pre-provision for worst case) this replaces
- concepts/memory-overcommit-risk — the failure mode two-axis autoscaling absorbs
- systems/databricks-serverless-autoscaler — canonical production instance