CONCEPT Cited by 1 source

Warm node pool

Definition

A warm node pool is a set of pre-provisioned Kubernetes nodes that already have the base runtime image pulled. A predictive algorithm maintains the pool ahead of demand. When an autoscaler adds a replica, the replica is placed on a node from this pool: the node is already up and the base image is already downloaded, so the remaining cold-start work is downloading the model artifact and initializing the model.

Cold-start decomposition

A serving cold start has multiple phases:

  1. Node provisioning — eliminated by the warm pool
  2. Base image pull — eliminated by pre-loading onto warm nodes
  3. Model download — reduced via hot cache + parallel chunk download
  4. Model initialization — irreducible floor (grows with model size)

Warm pools eliminate phases 1 and 2 entirely. Combined with parallel model download from a hot cache layer, which shortens phase 3, the platform leaves phase 4 as the bulk of the remaining cold-start budget.
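The decomposition can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only: the `ColdStart` class, the `with_warm_pool` helper, the per-phase durations, and the `download_speedup` factor are all assumptions, not figures or APIs from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColdStart:
    """Seconds spent in each cold-start phase (values are made up for illustration)."""
    node_provisioning: float  # phase 1
    base_image_pull: float    # phase 2
    model_download: float     # phase 3
    model_init: float         # phase 4

    def total(self) -> float:
        return (self.node_provisioning + self.base_image_pull
                + self.model_download + self.model_init)


def with_warm_pool(cs: ColdStart, download_speedup: float = 1.0) -> ColdStart:
    """Zero out phases 1 and 2, shrink phase 3 by an assumed factor, keep phase 4."""
    if download_speedup < 1.0:
        raise ValueError("download_speedup must be >= 1.0")
    return ColdStart(
        node_provisioning=0.0,
        base_image_pull=0.0,
        model_download=cs.model_download / download_speedup,
        model_init=cs.model_init,
    )


# Hypothetical baseline: 120 s provisioning, 90 s image pull, 200 s download, 60 s init.
baseline = ColdStart(120.0, 90.0, 200.0, 60.0)
warm = with_warm_pool(baseline, download_speedup=4.0)

print(f"baseline: {baseline.total():.0f} s, warm pool: {warm.total():.0f} s")
print(f"phase 4 share after warm pool: {warm.model_init / warm.total():.0%}")
```

With these assumed numbers, phases 1 and 2 vanish, phase 3 shrinks, and phase 4 is unchanged, so initialization becomes the largest single term. No tuning of the pool can reduce it further.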

Databricks implementation

  • A predictive algorithm maintains the pool per cluster
  • Databricks doesn't charge customers for warm-pool capacity — "it's direct value they get from Databricks"
  • Complements provisioned concurrency (for endpoints that cannot tolerate any cold start) and zero-downtime updates (new pods ready before traffic moves)
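The source does not describe how the predictive algorithm works. Purely as an illustration of what "maintaining a pool ahead of demand" could look like, the sketch below sizes a pool from a moving average of recent node demand plus headroom. The `WarmPoolSizer` class, its parameters, and the sizing rule are hypothetical and are not Databricks' implementation.

```python
import math
from collections import deque


class WarmPoolSizer:
    """Hypothetical rule: mean of recent per-interval node demand, plus headroom."""

    def __init__(self, window: int = 6, headroom: float = 0.25, max_nodes: int = 50):
        self.history = deque(maxlen=window)  # most recent demand samples only
        self.headroom = headroom             # extra fraction kept warm above the forecast
        self.max_nodes = max_nodes           # cap on warm capacity

    def observe(self, nodes_requested: int) -> None:
        """Record how many nodes the autoscaler asked for in the last interval."""
        self.history.append(nodes_requested)

    def target_size(self) -> int:
        """Number of warm nodes to keep ready for the next interval."""
        if not self.history:
            return 0
        forecast = sum(self.history) / len(self.history)
        return min(self.max_nodes, math.ceil(forecast * (1 + self.headroom)))


sizer = WarmPoolSizer(window=3, headroom=0.25, max_nodes=50)
for demand in (2, 4, 6):
    sizer.observe(demand)
print(sizer.target_size())
```

A real predictor would likely account for time-of-day patterns and node types; the point here is only the shape of the loop: observe demand, forecast, and keep that many nodes warm before requests arrive.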

Production observation

"You cannot optimize cold starts away. [...] Physics has a floor: bringing a pod up takes time that grows with model size, minutes for large GPU models. Past that floor the only answer is keeping a min capacity fully ready."

Seen in
