Vertical and horizontal autoscaling

Vertical + horizontal autoscaling is the practice of scaling compute capacity along both axes as a single adaptive primitive: add/remove workers (horizontal) and resize the VM instances that host workers (vertical), driven by the same control loop.

Most autoscalers in production are single-axis:

  • Kubernetes HPA / cloud Auto Scaling Groups — horizontal only: add pods/instances, never resize them
  • Kubernetes VPA — vertical only: resize pods, never add them
  • Classical Spark autoscaling — horizontal only: varies worker count at a fixed worker-VM size chosen upfront by the user

Two-axis autoscaling is rare because the dependencies between axes make the control loop harder: a vertical scale-up changes the per-worker capacity, which changes whether horizontal scaling is needed, which changes the cost envelope, and so on.
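
A minimal sketch of that coupling, assuming a hypothetical size ladder and signal shape (`VM_SIZES_GIB`, `Signals`, and `control_tick` are all illustrative names, not Databricks' API): the vertical decision is made first, and the horizontal decision is then computed against the new per-worker capacity.

```python
from dataclasses import dataclass

# Hypothetical VM size ladder (GiB of memory per worker); illustrative only.
VM_SIZES_GIB = [16, 32, 64, 128]

@dataclass
class Signals:
    max_task_mem_gib: float  # largest observed per-task memory footprint
    pending_tasks: int       # queued tasks waiting for a worker slot

def control_tick(s: Signals) -> tuple[int, int]:
    """One tick of a two-axis loop: decide vertical first, then horizontal."""
    # Vertical: smallest VM that fits the largest task (prevents OOM).
    vm_gib = next((v for v in VM_SIZES_GIB if v >= s.max_task_mem_gib),
                  VM_SIZES_GIB[-1])
    # Horizontal depends on the vertical outcome: a bigger VM hosts more
    # task slots, so the same queue depth needs a different worker count.
    slots_per_worker = max(1, int(vm_gib // s.max_task_mem_gib))
    workers = max(1, -(-s.pending_tasks // slots_per_worker))  # ceiling division
    return vm_gib, workers

print(control_tick(Signals(max_task_mem_gib=4, pending_tasks=20)))   # (16, 5): parallelise
print(control_tick(Signals(max_task_mem_gib=48, pending_tasks=20)))  # (64, 20): upsize, re-plan
```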

Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

"It dynamically adjusts compute capacity by scaling horizontally and vertically as needed, preventing out-of-memory failures and maintaining stability as workloads grow."

The Databricks Serverless Autoscaler is this wiki's first canonical instance of a production two-axis adaptive autoscaler.

Why each axis matters

Horizontal axis

Handles throughput / utilisation / parallelism:

  • More concurrent queries → add more workers
  • Cluster idle → remove workers (scale to zero if supported)
  • Shuffle-heavy stage → add capacity for the stage, then shrink

Horizontal scaling is cheap per signal (add a pod, add an instance) but bounded by worker-VM size.

Vertical axis

Handles per-task memory envelope / per-instance headroom:

  • Task OOMs → upsize the VM and retry (see concepts/adaptive-oom-recovery)
  • Task consistently running hot on CPU → upsize the VM for more cores
  • Worker under-utilising memory → downsize VM for cost

Vertical scaling is expensive per signal (VM restart or replacement required) but necessary for OOM recovery — you can't add more small workers to fix a task that doesn't fit in a small worker.
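
A sketch of that upsize-and-retry path, assuming a discrete size ladder (`LADDER`, `TaskOOM`, and `run_task` are hypothetical; the source confirms only the behaviour, not any mechanism):

```python
# Hypothetical size ladder; the real instance families and steps are not disclosed.
LADDER = ["m-16gb", "m-32gb", "m-64gb", "m-128gb"]

class TaskOOM(Exception):
    """Raised when a task exceeds its worker's memory."""

def run_with_vertical_retry(task, run_task, start=0):
    """Walk the task up the VM size ladder until it fits or the ladder ends."""
    for size in LADDER[start:]:
        try:
            return run_task(task, vm_size=size)  # run_task is an illustrative callable
        except TaskOOM:
            continue  # vertical upsize: replace the worker VM and retry the task
    raise TaskOOM(f"task does not fit even on {LADDER[-1]}")
```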

Why combining axes is load-bearing

Single-axis horizontal autoscalers fail on memory skew: the common failure mode is a task that doesn't fit in the default worker size, and adding more default-sized workers doesn't help (a 48 GB task OOMs on 16 GB workers whether the cluster has four of them or four hundred). The user is forced to pre-provision for the worst-case memory footprint, sacrificing the cost benefits of autoscaling.

Single-axis vertical autoscalers fail on parallelism skew: a workload of many tasks that each fit a small VM is better served by adding workers; upsizing instead over-provisions a single machine and caps parallelism at its core count.

Two-axis combines both: parallelise where you can, upsize where you must.

Trigger signals

A two-axis autoscaler needs signals distinguishing the two axes:

Signal                                     | Axis                   | Response
-------------------------------------------|------------------------|---------------------------
Task OOM                                   | Vertical               | Upsize VM, retry task
Task CPU saturated                         | Vertical or horizontal | Depends on workload shape
Stage queue depth                          | Horizontal             | Add workers
Idle workers                               | Horizontal             | Remove workers
Low cluster CPU util, tasks small          | Vertical (down)        | Smaller VM
Sustained high CPU util across all workers | Horizontal             | Add workers

The Databricks disclosure is limited to the OOM signal → vertical upsize trigger; the other signals are implied but not specified in the blog post.
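
One way to encode that routing, with every name below invented for illustration (the source confirms only the Task OOM row):

```python
from enum import Enum, auto

class Action(Enum):
    UPSIZE_VM = auto()       # vertical, up
    DOWNSIZE_VM = auto()     # vertical, down
    ADD_WORKERS = auto()     # horizontal, out
    REMOVE_WORKERS = auto()  # horizontal, in

def route(signal: str, work_is_parallelisable: bool = True) -> Action:
    """Map a trigger signal to an axis, following the table above."""
    table = {
        "task_oom": Action.UPSIZE_VM,  # the only row confirmed by the source
        "stage_queue_depth": Action.ADD_WORKERS,
        "idle_workers": Action.REMOVE_WORKERS,
        "low_cpu_small_tasks": Action.DOWNSIZE_VM,
        "sustained_high_cpu": Action.ADD_WORKERS,
    }
    if signal == "task_cpu_saturated":
        # The ambiguous row: parallelisable work scales out, a single hot task scales up.
        return Action.ADD_WORKERS if work_is_parallelisable else Action.UPSIZE_VM
    return table[signal]
```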

Two user-exposed modes

Databricks Serverless exposes two modes that bias the autoscaler's axis preference:

  • Standard — biased toward smaller VMs + tighter horizontal scaling (lower cost, higher latency variance)
  • Performance-Optimized — biased toward larger VMs + looser horizontal scaling (higher cost, faster startup/execution)

This is a declarative-intent knob — the user says what they want (cheap vs fast), not how the autoscaler should achieve it.
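
Read as configuration, the two modes are a pair of bias parameters over the same loop; the knobs below (`preferred_vm_gib`, `scale_out_threshold`) are invented for illustration, since the internals are not documented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeBias:
    preferred_vm_gib: int       # where on the size ladder the vertical axis starts
    scale_out_threshold: float  # queue pressure required before adding workers

# Hypothetical encodings of the two user-exposed modes.
STANDARD = ModeBias(preferred_vm_gib=16, scale_out_threshold=0.8)               # cheaper, slower to react
PERFORMANCE_OPTIMIZED = ModeBias(preferred_vm_gib=64, scale_out_threshold=0.5)  # pricier, eager to scale
```

Either value drives the same control loop; only the objective it optimises for changes.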

Interaction with reactive vs predictive autoscaling

Two-axis autoscaling can be combined with reactive or predictive strategies. The Databricks post frames the Serverless Autoscaler as signal-driven (continuously analysing workload patterns) rather than threshold-based — implicitly more predictive than reactive, though the post doesn't claim machine-learning-driven prediction.

Contrast: "Traditional autoscaling approaches rely on static rules and reactive thresholds, which often fail to capture these nuances."

Operational outcomes

Named production outcomes when two-axis adaptive autoscaling is combined with adaptive OOM recovery (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

  • CKDelta: 20 min vs 4–5 hr job completion (12–15× speedup)
  • Unilever: 2–5× faster pipelines, 25% lower ops cost
  • HP: 32%+ cloud savings, 36% runtime reduction
