PATTERN Cited by 1 source

Hot cluster for iterative ML

Description

Keep compute clusters warm and provisioned between ML workflow runs rather than cold-starting a new cluster for each execution. This eliminates the provisioning delay (often 30 seconds to several minutes) that interrupts rapid iterative experimentation.

The pattern is most valuable when:

- Teams are iterating rapidly (multiple runs per hour)
- Cluster provisioning time is a significant fraction of total run time
- The cost of idle resources is justified by developer productivity gains
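
The three conditions above can be checked with rough arithmetic. The sketch below is illustrative only: the `Workload` fields, the hourly prices, and the simple cost model (idle cluster time weighed against developer time spent waiting on cold starts) are assumptions made for this example and do not come from the cited source.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields are hypothetical inputs chosen for illustration.
    runs_per_hour: float            # how often the team launches a run
    provision_seconds: float        # cold-start delay per run
    run_seconds: float              # useful compute time per run
    cluster_cost_per_hour: float    # assumed hourly price of the cluster
    developer_cost_per_hour: float  # assumed cost of a developer who is waiting


def provisioning_fraction(w: Workload) -> float:
    """Share of each cold-started run that is spent waiting for provisioning."""
    return w.provision_seconds / (w.provision_seconds + w.run_seconds)


def hourly_idle_cost(w: Workload) -> float:
    """Cost of keeping the cluster up for the part of each hour it runs no jobs."""
    busy_fraction = min(1.0, w.runs_per_hour * w.run_seconds / 3600)
    return w.cluster_cost_per_hour * (1.0 - busy_fraction)


def hourly_wait_cost(w: Workload) -> float:
    """Cost of developer time spent waiting on cold starts each hour."""
    return w.developer_cost_per_hour * w.runs_per_hour * w.provision_seconds / 3600


def hot_cluster_pays_off(w: Workload) -> bool:
    """True when the waiting cost avoided exceeds the idle cost incurred."""
    return hourly_wait_cost(w) > hourly_idle_cost(w)


if __name__ == "__main__":
    team = Workload(
        runs_per_hour=6,
        provision_seconds=120,
        run_seconds=180,
        cluster_cost_per_hour=4.0,
        developer_cost_per_hour=90.0,
    )
    print(f"provisioning fraction: {provisioning_fraction(team):.0%}")
    print(f"idle cost per hour:    {hourly_idle_cost(team):.2f}")
    print(f"wait cost per hour:    {hourly_wait_cost(team):.2f}")
    print(f"hot cluster pays off:  {hot_cluster_pays_off(team)}")
```

The model ignores second-order effects such as the cost of context switching while a cluster provisions, so it likely understates the benefit for teams that iterate quickly.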

Trade-offs

| Pro | Con |
| --- | --- |
| Eliminates provisioning latency | Idle clusters consume resources |
| Enables rapid iteration cycles | Requires idle-timeout / cost-management policy |
| Reduces context-switching for developers | May create multi-tenant contention if shared |
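
One way to bound the idle-resource cost is an idle-timeout policy: reuse the cluster while runs keep arriving, and tear it down once it has sat unused for a set period. The following is a minimal sketch of that idea, not the implementation described in the cited source. `WarmCluster`, its `provision` and `teardown` callbacks, and the timeout value are all hypothetical names introduced for illustration.

```python
import time
from typing import Any, Callable, Optional


class WarmCluster:
    """Hypothetical wrapper that reuses one cluster until it has been idle too long."""

    def __init__(
        self,
        provision: Callable[[], Any],       # assumed callback that starts a cluster
        teardown: Callable[[Any], None],    # assumed callback that stops a cluster
        idle_timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._provision = provision
        self._teardown = teardown
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._handle: Optional[Any] = None
        self._last_used = 0.0
        self._active_runs = 0

    def acquire(self) -> Any:
        """Return a cluster handle, paying the provisioning delay only on a cold start."""
        self.reap_if_idle()
        if self._handle is None:
            self._handle = self._provision()
        self._active_runs += 1
        self._last_used = self._clock()
        return self._handle

    def release(self) -> None:
        """Mark a run as finished; the idle timer starts from this moment."""
        self._active_runs = max(0, self._active_runs - 1)
        self._last_used = self._clock()

    def reap_if_idle(self) -> bool:
        """Tear the cluster down if no run is active and the timeout has elapsed."""
        if self._handle is None or self._active_runs > 0:
            return False
        if self._clock() - self._last_used >= self._idle_timeout_s:
            self._teardown(self._handle)
            self._handle = None
            return True
        return False
```

In practice `reap_if_idle` would be called from a periodic job or scheduler hook. This sketch is single-threaded: a cluster shared across tenants would also need locking and some form of admission control to address the contention noted in the table.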

Seen In

(Source: sources/2026-06-10-atlassian-architecting-scalable-ml-platforms)
