
PATTERN

Heterogeneous cluster provisioning

Heterogeneous cluster provisioning is the EC2-capacity pattern of discovering a set of instance types that can satisfy a workload's resource shape, then provisioning the most readily available mix across availability zones at the moment compute is needed — instead of demanding one specific instance type.

The trade-off it makes: you don't know exactly what your cluster is going to look like; you do know its aggregate resources are enough, and it will actually come up on time.

Shape

  1. Workload declares a resource shape — e.g. "1 TiB RAM + 128 vCPUs".
  2. Pre-flight capability lookup: enumerate all EC2 instance types whose per-instance RAM/vCPU/network/disk characteristics are suitable building blocks for this shape (e.g. r6g.16xlarge, r6g.8xlarge, r5.4xlarge).
  3. Availability lookup across AZs for those candidates.
  4. Fast provisioning — provision whichever candidates are most readily available, across AZs, until the aggregate shape is met. Example: one r6g.16xlarge (64 vCPU / 512 GiB RAM) in AZ1 plus a mix of r6g.8xlarge and r5.4xlarge in AZ2 totalling the remaining 64 vCPU / 512 GiB.
  5. Start workload.
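The steps above can be sketched as a greedy fill over a candidate catalog. Everything here is illustrative: the catalog, the availability figures, and the `plan_cluster` helper are hypothetical stand-ins, not a real provisioning API.

```python
# Sketch of heterogeneous provisioning: greedily fill an aggregate
# resource shape from whichever candidate pools are most available.
# Catalog entries and availability counts are illustrative only.

CATALOG = {
    # instance type: (vCPU, RAM in GiB)
    "r6g.16xlarge": (64, 512),
    "r6g.8xlarge": (32, 256),
    "r5.4xlarge": (16, 128),
}

def plan_cluster(target_vcpu, target_ram, availability):
    """availability: list of (instance_type, az, count) tuples,
    sorted most-readily-available first (step 3's output)."""
    plan, vcpu, ram = [], 0, 0
    for itype, az, count in availability:
        per_vcpu, per_ram = CATALOG[itype]
        for _ in range(count):
            if vcpu >= target_vcpu and ram >= target_ram:
                return plan
            plan.append((itype, az))
            vcpu += per_vcpu
            ram += per_ram
    if vcpu >= target_vcpu and ram >= target_ram:
        return plan
    raise RuntimeError("aggregate shape cannot be met from available pools")

# The example shape from step 1: 1 TiB RAM + 128 vCPUs.
plan = plan_cluster(128, 1024, [
    ("r6g.16xlarge", "az1", 1),
    ("r6g.8xlarge", "az2", 2),
    ("r5.4xlarge", "az2", 4),
])
```

A real implementation would source the availability list from live capacity signals (e.g. EC2 Fleet with a capacity-optimized allocation strategy does a managed variant of this loop).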

Consequences the application has to accept

  • No assumption about CPU architecture — workload binaries must be built for every candidate arch (x86_64 + Graviton at least).
  • No assumption about disk type / ephemeral disk shape.
  • No assumption about network-speed tier — application must tolerate the slowest candidate's NIC.
  • No assumption about exact instance identity — cluster topology is dynamic across instances and AZs.
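The first consequence in practice means the deploy artifact must resolve to the right binary at runtime. A minimal sketch, with hypothetical bundle names:

```python
import platform

# Hypothetical mapping from kernel arch strings to prebuilt bundles;
# a real deployment would bake all of these into the machine image.
ARCH_BUNDLES = {
    "x86_64": "app-linux-amd64",
    "aarch64": "app-linux-arm64",  # Graviton instances report aarch64
}

def select_bundle(machine=None):
    """Pick the binary bundle matching this instance's CPU architecture."""
    machine = machine or platform.machine()
    try:
        return ARCH_BUNDLES[machine]
    except KeyError:
        raise RuntimeError(f"no build for architecture {machine!r}")
```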

In exchange: jobs actually start, even in capacity regimes where any single instance type would carry wait times measured in hours, or where a specific AZ has no capacity at all.

Amazon BDT's instantiation

From the Spark → Ray migration post:

"Their approach involves first finding a set of potential Amazon EC2 instance types that can meet expected hardware resource requirements, then provisioning the most readily available instance types from that set. As a side effect, they effectively trade knowing exactly what Amazon EC2 instance type a Ray cluster will get for provisioning pseudo-random instance types faster. This also means that BDT's Ray applications need to remove any assumptions about their underlying CPU architectures, disk types, or other hardware." (Source: sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2)

Additional load-bearing detail: BDT provisions preemptively — instances are ready just before the workload needs them, using load forecasts and historical EC2 resource-utilisation trends to pick the next cluster shape.
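The source does not describe BDT's forecasting model; as a trivial stand-in, a windowed average with headroom conveys the idea of sizing the next cluster ahead of demand:

```python
def forecast_next_vcpu(history, window=3, headroom=1.2):
    """Trivial stand-in for a load forecaster: average the last
    `window` vCPU-demand observations and add headroom so the
    cluster is provisioned before the workload arrives. BDT's
    actual model is not described in the source post."""
    recent = history[-window:]
    return int(sum(recent) / len(recent) * headroom)

# e.g. size the next cluster off recent demand samples
forecast_next_vcpu([100, 120, 110])  # averages to 110, plus 20% headroom
```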

Why it generalises

EC2 capacity is non-uniform across instance types and AZs, and shifts intra-day. Any workload that provisions large clusters frequently will hit capacity tails. Heterogeneous provisioning turns a capacity-constrained system into a capacity-averaging system — aggregate demand against aggregate supply, rather than one workload against one type pool.
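The averaging effect can be illustrated with a toy Monte Carlo. The pool ceilings and demand figure are arbitrary illustrative numbers, not EC2 data; the point is only that a demand which one pool often cannot cover alone is almost always covered by the pools in aggregate.

```python
import random

random.seed(0)

def trial(pool_ceilings, demand):
    """Draw a random free capacity for each type/AZ pool, then test
    two strategies: one pool must cover demand alone vs. a
    heterogeneous mix may combine free capacity across pools."""
    free = [random.randint(0, cap) for cap in pool_ceilings]
    single_ok = any(f >= demand for f in free)
    mix_ok = sum(free) >= demand
    return single_ok, mix_ok

pools = [60, 60, 60, 60]  # hypothetical free-capacity ceilings per pool
demand = 50
single = mix = 0
for _ in range(10_000):
    s, m = trial(pools, demand)
    single += s
    mix += m
# `mix` succeeds far more often than `single` under this toy model.
```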

The pattern extends beyond Ray: Kubernetes cluster autoscalers on EC2, Spot-aware schedulers, EMR instance fleets, Lambda's internal Firecracker placement, and Fargate's instance selection all do variants of this. It is the current-generation answer to the old pre-autoscaling question, "what instance type should I use?"

  • systems/aws-ec2 — the substrate whose capacity non-uniformity drives the pattern.
  • systems/ray — the BDT case uses it to keep Ray clusters available at exabyte scale.
  • concepts/memory-aware-scheduling — the intra-cluster counterpart: once you have heterogeneous workers, pack tasks by their real resource demand, not a worst-case.
