
CONCEPT Cited by 1 source

Availability Zone balance

Availability Zone (AZ) balance is the property of a workload whose replicas are evenly distributed across the cloud provider's AZs, so that the failure of any single AZ removes only 1/N of the replicas, where N is the number of AZs the workload spans (not all of them, and not a majority).

For a 3-AZ region, ideal balance is 33/33/33 (percent of replicas per AZ). "Poor AZ balance" means replicas are clumped, e.g. 60/20/20 or 80/10/10. Against an 80/10/10 distribution, a failure of the heavily loaded zone takes out 80% of the workload.
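The arithmetic above can be sketched as a small check. This is an illustrative sketch: the function names and the replica counts are hypothetical, and it assumes every replica is equally important. It computes the share of replicas lost when the most heavily loaded AZ fails, and whether a strict majority survives.

```python
def worst_case_az_loss(replicas_per_az: dict[str, int]) -> float:
    """Fraction of replicas lost if the most heavily loaded AZ fails."""
    total = sum(replicas_per_az.values())
    if total == 0:
        raise ValueError("workload has no replicas")
    return max(replicas_per_az.values()) / total


def majority_survives(replicas_per_az: dict[str, int]) -> bool:
    """True if a strict majority of replicas remains after the worst single-AZ failure."""
    total = sum(replicas_per_az.values())
    survivors = total - max(replicas_per_az.values())
    return survivors > total / 2


balanced = {"us-east-1a": 3, "us-east-1b": 3, "us-east-1c": 3}  # 33/33/33
skewed = {"us-east-1a": 8, "us-east-1b": 1, "us-east-1c": 1}    # 80/10/10

print(f"{worst_case_az_loss(balanced):.0%}", majority_survives(balanced))  # 33% True
print(f"{worst_case_az_loss(skewed):.0%}", majority_survives(skewed))      # 80% False
```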

Why it's non-trivial to achieve

At the cluster-autoscaler layer, AZ balance depends on:

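  • Capacity distribution of the instance types the autoscaler is allowed to provision: if m5.8xlarge is scarce in us-east-1a but plentiful in us-east-1b, an autoscaler limited to that single instance type will pile nodes into us-east-1b.
  • Node-group / ASG topology: each ASG distributes instances across its configured subnets, but ASGs don't bin-pack across instance sizes, so in a node-group-per-shape setup each node group's AZ distribution drifts independently of the others.
  • Pod-level spread constraints: Kubernetes topologySpreadConstraints can enforce spread at the pod level, but only across zones where nodes actually exist. A zone with no nodes is not counted as a topology domain at all, and if the under-represented zone's nodes are full, pods either stay Pending (whenUnsatisfiable: DoNotSchedule) or land in the crowded zones (whenUnsatisfiable: ScheduleAnyway).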
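To illustrate why pod-level spread depends on node spread, here is a simplified model of a single zone-keyed topologySpreadConstraints rule with whenUnsatisfiable: DoNotSchedule. It is a sketch, not the kube-scheduler implementation: the function name is hypothetical, and it ignores node capacity, minDomains, nodeAffinity and taints. A zone is eligible for the next matching pod if the resulting skew (pods in that zone minus the minimum across domains) stays within maxSkew.

```python
def allowed_zones(pod_zones: list[str], node_zones: set[str], max_skew: int = 1) -> list[str]:
    """Zones that could accept one more matching pod without exceeding max_skew.

    Only zones that contain nodes are topology domains; everything else that the
    real scheduler checks (capacity, minDomains, affinity, taints) is ignored.
    """
    if not node_zones:
        return []
    # Count matching pods per domain; zones without nodes are not domains at all.
    counts = {zone: pod_zones.count(zone) for zone in node_zones}
    global_min = min(counts.values())
    # A placement is allowed if the resulting skew stays within max_skew.
    return sorted(z for z, n in counts.items() if n + 1 - global_min <= max_skew)


# Nodes in all three zones: the next pod must go to one of the emptier zones.
spread = allowed_zones(["us-east-1a", "us-east-1a", "us-east-1b", "us-east-1c"],
                       {"us-east-1a", "us-east-1b", "us-east-1c"})
# Nodes only in 1a and 1b: the constraint is satisfied without ever using 1c.
clumped = allowed_zones(["us-east-1a", "us-east-1b"], {"us-east-1a", "us-east-1b"})
print(spread)   # ['us-east-1b', 'us-east-1c']
print(clumped)  # ['us-east-1a', 'us-east-1b']
```

The second call shows the failure mode: the constraint is fully satisfied, yet a loss of us-east-1a or us-east-1b still removes half the pods, because the pods can only spread as widely as the nodes do.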

Why ASG-driven autoscaling fails at it (at scale)

Salesforce explicitly named poor Availability Zone balance as one of the ASG-era limitations that motivated their Karpenter migration:

"These challenges were further exacerbated by structural limitations in the Auto Scaling group–based architecture, including poor Availability Zone balance and performance bottlenecks in large clusters, particularly for memory-intensive workloads." (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)

At scale, per-node-group ASG drift compounds: Salesforce had 1,180+ node pools pre-migration, and across that many ASGs the aggregate distribution becomes unpredictable. Individual ASGs can each be balanced while the cluster as a whole is not.
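A toy model shows how per-ASG balance does not add up to cluster balance. It assumes a hypothetical ASG rule (spread instances as evenly as possible, remainder to the first AZs in subnet order); real ASG AZ-balancing behavior is more involved, so treat this only as an illustration of how small per-group imbalances accumulate when every group shares the same tie-breaking.

```python
from collections import Counter

AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]


def asg_placement(desired: int, azs: list[str]) -> Counter:
    """Hypothetical ASG model: spread instances as evenly as possible across its
    subnets, with any remainder landing in the first AZs of the list."""
    base, extra = divmod(desired, len(azs))
    return Counter({az: base + (1 if i < extra else 0) for i, az in enumerate(azs)})


def skew(placement: Counter) -> int:
    """Difference between the most and least populated AZ."""
    counts = [placement[az] for az in AZS]
    return max(counts) - min(counts)


# Five node groups of different sizes, all using the same subnet order.
fleet = [asg_placement(n, AZS) for n in (1, 4, 7, 1, 4)]
cluster = Counter()
for placement in fleet:
    cluster.update(placement)  # Counter.update adds counts together

print([skew(p) for p in fleet])     # [1, 1, 1, 1, 1]
print([cluster[az] for az in AZS])  # [9, 4, 4]
print(skew(cluster))                # 5
```

Every group is off by at most one node, yet the cluster ends up with 9 of 17 nodes (about 53%) in us-east-1a.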

How Karpenter addresses it

Karpenter's scheduler sees the whole cluster at once: pending pods, existing node distribution per AZ, allowed instance types. It can pick the AZ + instance type combination that most improves overall balance as each node is provisioned, instead of growing a specific ASG that happens to match the shape.

Allowing heterogeneous instance types inside one NodePool (e.g. GPU, ARM, and x86 types, where a workload can run on any of them) widens the capacity pool in each AZ, so some valid instance type is usually available in the under-represented AZ.
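The two ideas above can be sketched as a greedy loop that launches each node into the least-populated AZ where some allowed instance type still has capacity. This is not Karpenter's actual algorithm (which also weighs pod requirements, topology spread constraints and price); the function name and the capacity numbers are hypothetical. m5.8xlarge is an x86 type and m6g.8xlarge an ARM (Graviton) type, chosen to mirror the NodePool example.

```python
from collections import Counter

AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]

# Hypothetical spare capacity: how many more instances of each type each AZ can supply.
CAPACITY = {
    ("m5.8xlarge", "us-east-1a"): 0,   # scarce in 1a
    ("m5.8xlarge", "us-east-1b"): 10,
    ("m5.8xlarge", "us-east-1c"): 3,
    ("m6g.8xlarge", "us-east-1a"): 5,
    ("m6g.8xlarge", "us-east-1b"): 5,
    ("m6g.8xlarge", "us-east-1c"): 5,
}


def provision(n_nodes: int, allowed_types: list[str], capacity: dict) -> list[int]:
    """Greedily launch nodes one at a time into the least-populated AZ that
    still has capacity for at least one allowed instance type."""
    remaining = dict(capacity)
    nodes = Counter()
    for _ in range(n_nodes):
        options = [
            (nodes[az], az, itype)
            for az in AZS
            for itype in allowed_types
            if remaining.get((itype, az), 0) > 0
        ]
        if not options:
            break  # nothing launchable in any AZ
        # Fewest nodes first; ties broken by AZ name, then instance type name.
        _, az, itype = min(options)
        remaining[(itype, az)] -= 1
        nodes[az] += 1
    return [nodes[az] for az in AZS]


print(provision(9, ["m5.8xlarge"], CAPACITY))                 # [0, 6, 3]
print(provision(9, ["m5.8xlarge", "m6g.8xlarge"], CAPACITY))  # [3, 3, 3]
```

With a single allowed type, the scarce AZ gets nothing and the plentiful one absorbs the surplus; adding a second type gives the balancing step a valid option in every AZ.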

Seen in

  • sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — AZ balance named explicitly as a structural ASG limitation Karpenter fixes.
  • Six-AZ regional topology as a resilience lever, and the 3-AZ-quorum assumption that partial partitions can break. The closing paragraph of PlanetScale's 2025-10-20 incident writeup states (verbatim): "Per AWS's Well-Architected Framework, the use of three availability zones allows us to tolerate the failure of one but only if network connectivity between the other two remains reliable. AWS us-east-1 happens to have six availability zones and we're looking into how PlanetScale can better use them all to become more resilient to both zonal outages and network partitions between them." Two wiki-canonical contributions: (1) the 6-AZ datum for us-east-1 (most AWS regions have 3 AZs) as a placement-space lever; (2) the 3-AZ-quorum assumption that a minimum-2-replicas-across-3-AZs topology tolerates one AZ down only if the other two stay connected, so a partial partition between two of the three AZs breaks the assumption. This is the canonical argument for using 4+ AZs when the region supports them.
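The quorum argument can be checked with a toy model. This is a hypothetical sketch, not PlanetScale's design or any specific consensus protocol's rule: one replica per AZ, and a leader in some healthy AZ needs direct links to AZs holding a strict majority of all replicas. It uses five of us-east-1's six AZs, with the AZ names us-east-1a through us-east-1e assumed for illustration.

```python
def can_form_quorum(replicas: dict[str, int], down: set[str],
                    broken_links: set[frozenset[str]]) -> bool:
    """Toy leader-based quorum check.

    Returns True if some healthy AZ can directly reach healthy AZs that together
    hold a strict majority of all replicas. Replicas in `down` AZs are lost, and
    a link listed in `broken_links` means those two AZs cannot talk.
    """
    total = sum(replicas.values())
    healthy = [az for az in replicas if az not in down]
    for leader in healthy:
        reachable = [az for az in healthy
                     if az == leader or frozenset((leader, az)) not in broken_links]
        if sum(replicas[az] for az in reachable) > total // 2:
            return True
    return False


three = {"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1}
five = {f"us-east-1{s}": 1 for s in "abcde"}
cut_bc = {frozenset(("us-east-1b", "us-east-1c"))}  # partial partition between 1b and 1c

print(can_form_quorum(three, {"us-east-1a"}, set()))   # True: 1b and 1c still talk
print(can_form_quorum(three, {"us-east-1a"}, cut_bc))  # False: no side has 2 of 3
print(can_form_quorum(five, {"us-east-1a"}, cut_bc))   # True: 1b reaches 1b, 1d, 1e
```

The same combined failure (one AZ down plus one broken link) that stalls the 3-AZ layout leaves the 5-AZ layout with a reachable majority.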