CONCEPT Cited by 1 source
Warm node pool¶
Definition¶
A warm node pool is a set of pre-provisioned Kubernetes nodes with the base runtime image already pulled, maintained by a predictive algorithm ahead of demand. When an autoscaler adds a replica, it picks from this pool — the node is already up, the base image already downloaded, and the only remaining cold-start work is downloading the model artifact itself.
Cold-start decomposition¶
A serving cold start has multiple phases:
- Node provisioning — eliminated by the warm pool
- Base image pull — eliminated by pre-loading onto warm nodes
- Model download — reduced via hot cache + parallel chunk download
- Model initialization — irreducible floor (grows with model size)
Warm pools attack phases 1 and 2 completely. Combined with parallel model download from a hot cache layer, the platform collapses most of the cold-start budget into phase 4 alone.
Databricks implementation¶
- A predictive algorithm maintains the pool per cluster
- Databricks doesn't charge customers for warm-pool capacity — "it's direct value they get from Databricks"
- Complements provisioned concurrency (for endpoints that cannot tolerate any cold start) and zero-downtime updates (new pods ready before traffic moves)
Production observation¶
"You cannot optimize cold starts away. [...] Physics has a floor: bringing a pod up takes time that grows with model size, minutes for large GPU models. Past that floor the only answer is keeping a min capacity fully ready."
Seen in¶
- sources/2026-06-10-databricks-ai-serving-platform-that-adapts-to-your-model — First canonical disclosure of the predictive warm-pool mechanism.