Vertical and horizontal autoscaling

Vertical + horizontal autoscaling is the practice of scaling compute capacity along both axes as a single adaptive primitive: add/remove workers (horizontal) and resize the VM instances that host workers (vertical), driven by the same control loop.

Most autoscalers in production are single-axis:

  • Kubernetes HPA / cloud Auto Scaling Groups — horizontal only: add pods/instances, never resize them
  • Kubernetes VPA — vertical only: resize pods, never add them
  • Classical Spark autoscaling — horizontal only: varies worker count at a fixed worker-VM size chosen upfront by the user

Two-axis autoscaling is rare because the dependencies between axes make the control loop harder: a vertical scale-up changes the per-worker capacity, which changes whether horizontal scaling is needed, which changes the cost envelope, and so on.
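
A minimal sketch of that coupling, assuming a hypothetical size ladder and signal shape (`VM_SIZES_GIB`, `Signals`, and `control_tick` are all illustrative names, not Databricks' API): the vertical decision is made first, and the horizontal decision is then computed against the new per-worker capacity.

```python
from dataclasses import dataclass

# Hypothetical VM size ladder (GiB of memory per worker); illustrative only.
VM_SIZES_GIB = [16, 32, 64, 128]

@dataclass
class Signals:
    max_task_mem_gib: float  # largest observed per-task memory footprint
    pending_tasks: int       # queued tasks waiting for a worker slot

def control_tick(s: Signals) -> tuple[int, int]:
    """One tick of a two-axis loop: decide vertical first, then horizontal."""
    # Vertical: smallest VM that fits the largest task (prevents OOM).
    vm_gib = next((v for v in VM_SIZES_GIB if v >= s.max_task_mem_gib),
                  VM_SIZES_GIB[-1])
    # Horizontal depends on the vertical outcome: a bigger VM hosts more
    # task slots, so the same queue depth needs a different worker count.
    slots_per_worker = max(1, int(vm_gib // s.max_task_mem_gib))
    workers = max(1, -(-s.pending_tasks // slots_per_worker))  # ceiling division
    return vm_gib, workers

print(control_tick(Signals(max_task_mem_gib=4, pending_tasks=20)))   # (16, 5): parallelise
print(control_tick(Signals(max_task_mem_gib=48, pending_tasks=20)))  # (64, 20): upsize, re-plan
```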

Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

"It dynamically adjusts compute capacity by scaling horizontally and vertically as needed, preventing out-of-memory failures and maintaining stability as workloads grow."

The Databricks Serverless Autoscaler is this wiki's first canonical instance of a production two-axis adaptive autoscaler.

Why each axis matters

Horizontal axis

Handles throughput / utilisation / parallelism:

  • More concurrent queries → add more workers
  • Cluster idle → remove workers (scale to zero if supported)
  • Shuffle-heavy stage → add capacity for the stage, then shrink

Horizontal scaling is cheap per signal (add a pod, add an instance) but bounded by worker-VM size.

Vertical axis

Handles per-task memory envelope / per-instance headroom:

  • Task OOMs → upsize the VM and retry (see concepts/adaptive-oom-recovery)
  • Task consistently running hot on CPU → upsize the VM for more cores
  • Worker under-utilising memory → downsize VM for cost

Vertical scaling is expensive per signal (VM restart or replacement required) but necessary for OOM recovery — you can't add more small workers to fix a task that doesn't fit in a small worker.
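
A sketch of that upsize-and-retry path, assuming a discrete size ladder (`LADDER`, `TaskOOM`, and `run_task` are hypothetical; the source confirms only the behaviour, not any mechanism):

```python
# Hypothetical size ladder; the real instance families and steps are not disclosed.
LADDER = ["m-16gb", "m-32gb", "m-64gb", "m-128gb"]

class TaskOOM(Exception):
    """Raised when a task exceeds its worker's memory."""

def run_with_vertical_retry(task, run_task, start=0):
    """Walk the task up the VM size ladder until it fits or the ladder ends."""
    for size in LADDER[start:]:
        try:
            return run_task(task, vm_size=size)  # run_task is an illustrative callable
        except TaskOOM:
            continue  # vertical upsize: replace the worker VM and retry the task
    raise TaskOOM(f"task does not fit even on {LADDER[-1]}")
```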

Why combining axes is load-bearing

Single-axis horizontal autoscalers fail on memory skew: the common failure mode is a task that doesn't fit in the default worker size, and adding more default-sized workers doesn't help (a 48 GB task OOMs on 16 GB workers whether the cluster has four of them or four hundred). The user is forced to pre-provision for the worst-case memory footprint, sacrificing the cost benefits of autoscaling.

Single-axis vertical autoscalers fail on parallelism skew: a workload of many tasks that each fit a small VM is better served by adding workers; upsizing instead over-provisions a single machine and caps parallelism at its core count.

Two-axis combines both: parallelise where you can, upsize where you must.

Trigger signals

A two-axis autoscaler needs signals distinguishing the two axes:

Signal                                     | Axis                   | Response
-------------------------------------------|------------------------|---------------------------
Task OOM                                   | Vertical               | Upsize VM, retry task
Task CPU saturated                         | Vertical or horizontal | Depends on workload shape
Stage queue depth                          | Horizontal             | Add workers
Idle workers                               | Horizontal             | Remove workers
Low cluster CPU util, tasks small          | Vertical (down)        | Smaller VM
Sustained high CPU util across all workers | Horizontal             | Add workers

The Databricks disclosure is limited to the OOM signal → vertical upsize trigger; the other signals are implied but not specified in the blog post.
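
One way to encode that routing, with every name below invented for illustration (the source confirms only the Task OOM row):

```python
from enum import Enum, auto

class Action(Enum):
    UPSIZE_VM = auto()       # vertical, up
    DOWNSIZE_VM = auto()     # vertical, down
    ADD_WORKERS = auto()     # horizontal, out
    REMOVE_WORKERS = auto()  # horizontal, in

def route(signal: str, work_is_parallelisable: bool = True) -> Action:
    """Map a trigger signal to an axis, following the table above."""
    table = {
        "task_oom": Action.UPSIZE_VM,  # the only row confirmed by the source
        "stage_queue_depth": Action.ADD_WORKERS,
        "idle_workers": Action.REMOVE_WORKERS,
        "low_cpu_small_tasks": Action.DOWNSIZE_VM,
        "sustained_high_cpu": Action.ADD_WORKERS,
    }
    if signal == "task_cpu_saturated":
        # The ambiguous row: parallelisable work scales out, a single hot task scales up.
        return Action.ADD_WORKERS if work_is_parallelisable else Action.UPSIZE_VM
    return table[signal]
```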

Two user-exposed modes

Databricks Serverless exposes two modes that bias the autoscaler's axis preference:

  • Standard — biased toward smaller VMs + tighter horizontal scaling (lower cost, higher latency variance)
  • Performance-Optimized — biased toward larger VMs + looser horizontal scaling (higher cost, faster startup/execution)

This is a declarative-intent knob — the user says what they want (cheap vs fast), not how the autoscaler should achieve it.
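
Read as configuration, the two modes are a pair of bias parameters over the same loop; the knobs below (`preferred_vm_gib`, `scale_out_threshold`) are invented for illustration, since the internals are not documented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeBias:
    preferred_vm_gib: int       # where on the size ladder the vertical axis starts
    scale_out_threshold: float  # queue pressure required before adding workers

# Hypothetical encodings of the two user-exposed modes.
STANDARD = ModeBias(preferred_vm_gib=16, scale_out_threshold=0.8)               # cheaper, slower to react
PERFORMANCE_OPTIMIZED = ModeBias(preferred_vm_gib=64, scale_out_threshold=0.5)  # pricier, eager to scale
```

Either value drives the same control loop; only the objective it optimises for changes.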

Interaction with reactive vs predictive autoscaling

Two-axis autoscaling can be combined with reactive or predictive strategies. The Databricks post frames the Serverless Autoscaler as signal-driven (continuously analysing workload patterns) rather than threshold-based — implicitly more predictive than reactive, though the post doesn't claim machine-learning-driven prediction.

Contrast: "Traditional autoscaling approaches rely on static rules and reactive thresholds, which often fail to capture these nuances."

Operational outcomes

Named production outcomes when two-axis adaptive autoscaling is combined with adaptive OOM recovery (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

  • CKDelta: 20 min vs 4–5 hr job completion (12–15× speedup)
  • Unilever: 2–5× faster pipelines, 25% lower ops cost
  • HP: 32%+ cloud savings, 36% runtime reduction
