CONCEPT Cited by 1 source
Availability Zone balance¶
Availability Zone (AZ) balance is the property of a workload
whose replicas are evenly distributed across the cloud provider's
AZs, so that the failure of any single AZ removes only 1/N of the
replicas (not all of them, and not a majority).
For a 3-AZ region, ideal balance is 33/33/33 per AZ. "Poor AZ balance" means replicas are clumped: e.g. 60/20/20 or 80/10/10. A zone failure against an 80/10/10 distribution can take out 80% of the workload if the heavily loaded zone is the one that fails.
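The arithmetic above can be sketched in a few lines. This is a minimal illustration, not a measurement tool: the function name `worst_case_loss` and the AZ labels are hypothetical, and it assumes every replica carries equal weight.

```python
def worst_case_loss(replicas_per_az: dict[str, int]) -> float:
    """Fraction of replicas lost if the most heavily loaded AZ fails."""
    total = sum(replicas_per_az.values())
    if total == 0:
        raise ValueError("no replicas to lose")
    return max(replicas_per_az.values()) / total


# Hypothetical distributions of 100 replicas over three AZs.
balanced = {"az-a": 34, "az-b": 33, "az-c": 33}
clumped = {"az-a": 80, "az-b": 10, "az-c": 10}

print(worst_case_loss(balanced))  # 0.34
print(worst_case_loss(clumped))   # 0.8
```

The worst case is the useful number for capacity planning: a balanced layout needs roughly 1/N of headroom to absorb a zone loss, while a clumped one needs as much headroom as its largest zone.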
Why it's non-trivial to achieve¶
At the cluster-autoscaler layer, AZ balance depends on:
- Capacity distribution of the instance types the autoscaler is allowed to provision: if m5.8xlarge is scarce in us-east-1a but plentiful in us-east-1b, a single-instance-type autoscaler will pile into 1b.
- Node-group / ASG topology: ASGs distribute across their configured subnets but don't bin-pack across instance sizes; a node-group-per-shape setup ends up with each node group's distribution drifting independently.
- Pod-level anti-affinity: Kubernetes topologySpreadConstraints can enforce spread at the pod level, but only if the nodes are actually spread.
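The last point can be made concrete with a simplified model of the skew check behind topologySpreadConstraints. The sketch assumes the documented definition of skew (matching pods in a zone minus the minimum across known zones) and assumes that a zone only counts as a domain when nodes exist in it; the function names and the data are hypothetical, and the real scheduler considers much more (selectors, node affinity, minDomains).

```python
def skew_if_placed(pods_per_zone: dict[str, int], zone: str) -> int:
    """Skew the incoming pod would see in `zone`: its count plus one, minus the current minimum."""
    return pods_per_zone[zone] + 1 - min(pods_per_zone.values())


def allowed_zones(pods_per_zone: dict[str, int], max_skew: int = 1) -> list[str]:
    """Zones where one more pod keeps skew within max_skew."""
    return sorted(z for z in pods_per_zone if skew_if_placed(pods_per_zone, z) <= max_skew)


# Nodes exist in only two zones, so only two domains are visible to the check.
print(allowed_zones({"us-east-1a": 3, "us-east-1b": 3}))
# ['us-east-1a', 'us-east-1b']

# Once a node exists in the third zone, the same constraint steers pods there.
print(allowed_zones({"us-east-1a": 3, "us-east-1b": 3, "us-east-1c": 0}))
# ['us-east-1c']
```

In the first case the constraint is satisfied even though a single zone failure would remove half the pods, which is why pod-level spread cannot compensate for node-level imbalance.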
Why ASG-driven autoscaling fails at it (at scale)¶
Salesforce explicitly named poor Availability Zone balance as one of the ASG-era limitations that motivated their Karpenter migration:
"These challenges were further exacerbated by structural limitations in the Auto Scaling group–based architecture, including poor Availability Zone balance and performance bottlenecks in large clusters, particularly for memory-intensive workloads." (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
At scale, per-node-group ASG drift stacks across thousands of ASGs (Salesforce had 1,180+ node pools pre-migration) and the aggregate distribution becomes unpredictable. Individual ASGs can be balanced while the overall cluster is not.
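One way to see how balanced groups can add up to an unbalanced cluster is to look at remainders. The sketch below is an illustration under stated assumptions, not a model of ASG internals: it assumes each node group spreads nodes as evenly as it can, and that the leftover node always lands in the same AZ (for example because capacity for that shape is most plentiful there). The group count, group size, and AZ names are made up.

```python
from collections import Counter

AZS = ["az-a", "az-b", "az-c"]


def spread_evenly(node_count: int, leftover_az: str) -> Counter:
    """Split nodes across AZs as evenly as possible; leftovers go to leftover_az first."""
    base, extra = divmod(node_count, len(AZS))
    order = [leftover_az] + [az for az in AZS if az != leftover_az]
    counts = Counter({az: base for az in AZS})
    for az in order[:extra]:
        counts[az] += 1
    return counts


# 1,000 hypothetical node groups of 4 nodes each. Every group is within
# one node of perfect balance (2/1/1), but the extra node is always in az-a.
cluster = Counter()
for _ in range(1000):
    cluster += spread_evenly(4, "az-a")

print(dict(cluster))                               # {'az-a': 2000, 'az-b': 1000, 'az-c': 1000}
print(cluster["az-a"] / sum(cluster.values()))     # 0.5
```

No single group looks wrong in isolation, yet the cluster as a whole sits at 50/25/25.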
How Karpenter addresses it¶
Karpenter's scheduler sees the whole cluster at once: pending pods, existing node distribution per AZ, allowed instance types. It can pick the AZ + instance type combination that most improves overall balance as each node is provisioned, instead of growing a specific ASG that happens to match the shape.
Heterogeneous instance types inside one NodePool (GPU / ARM / x86
all valid for certain workloads) widen the capacity pool per AZ, so
some valid instance is usually available in the under-represented
AZ.
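The effect of a wider instance-type pool can be sketched with a greedy picker. This is not Karpenter's actual algorithm, which the source does not describe in detail; it is a minimal illustration that assumes a known node count per AZ and a known set of instance types with capacity per AZ. The function name, the availability data, and the second instance type are hypothetical.

```python
def pick_placement(nodes_per_az, available, allowed_types):
    """Return (az, instance_type) for the least-populated AZ that has an allowed type, else None."""
    for az in sorted(nodes_per_az, key=lambda z: (nodes_per_az[z], z)):
        candidates = [t for t in allowed_types if t in available.get(az, set())]
        if candidates:
            return az, candidates[0]
    return None


# Hypothetical cluster state: us-east-1a is under-represented.
nodes_per_az = {"us-east-1a": 2, "us-east-1b": 6, "us-east-1c": 5}
# Hypothetical capacity: m5.8xlarge is unavailable in us-east-1a.
available = {
    "us-east-1a": {"m6i.8xlarge"},
    "us-east-1b": {"m5.8xlarge", "m6i.8xlarge"},
    "us-east-1c": {"m5.8xlarge"},
}

# One allowed type: the picker has to skip the AZ that most needs a node.
print(pick_placement(nodes_per_az, available, ["m5.8xlarge"]))
# ('us-east-1c', 'm5.8xlarge')

# Two allowed types: the under-represented AZ becomes a valid target.
print(pick_placement(nodes_per_az, available, ["m5.8xlarge", "m6i.8xlarge"]))
# ('us-east-1a', 'm6i.8xlarge')
```

The only difference between the two calls is the allowed-type list, which is the lever a heterogeneous NodePool provides.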
Related¶
- concepts/blast-radius — AZ balance is what bounds blast radius to ~1/N for zone failures.
- concepts/bin-packing — the scheduler primitive that has to consider AZ as a dimension.
- systems/karpenter — the scheduler that Salesforce moved to for better AZ balance.
- systems/aws-auto-scaling-groups — the predecessor primitive that didn't balance well at thousands-of-ASGs scale.
Seen in¶
- sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — AZ balance named explicitly as a structural ASG limitation Karpenter fixes.
- — Six-AZ regional topology as a resilience lever, and the 3-AZ-quorum assumption that partial partitions can break. The 2025-10-20 incident's closing paragraph states (verbatim): "Per AWS's Well-Architected Framework, the use of three availability zones allows us to tolerate the failure of one but only if network connectivity between the other two remains reliable. AWS us-east-1 happens to have six availability zones and we're looking into how PlanetScale can better use them all to become more resilient to both zonal outages and network partitions between them." Two wiki-canonical contributions: (1) the 6-AZ datum for us-east-1 (most AWS regions have 3 AZs) as a placement-space lever; (2) the 3-AZ-quorum assumption that a minimum-2-replicas-across-3-AZs topology tolerates one AZ down only if the other two stay connected, so a partial partition between two of three AZs breaks the assumption. Canonical argument for using 4+ AZs when the region supports them.
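The quorum argument in the last entry can be checked with a small model. The sketch assumes one voting replica per AZ and a strict-majority quorum that a single replica must be able to reach directly; the source does not describe the replication protocol involved, so these are illustrative assumptions, and the function name and AZ labels are hypothetical.

```python
def has_quorum(azs, down, partitions):
    """True if some healthy AZ can directly reach a strict majority of all AZs (itself included)."""
    cut = {frozenset(pair) for pair in partitions}
    healthy = [az for az in azs if az not in down]
    majority = len(azs) // 2 + 1
    for leader in healthy:
        reachable = 1 + sum(
            1
            for other in healthy
            if other != leader and frozenset((leader, other)) not in cut
        )
        if reachable >= majority:
            return True
    return False


three = ["az-a", "az-b", "az-c"]
five = ["az-a", "az-b", "az-c", "az-d", "az-e"]

# Three AZs, one down: the remaining two still form a majority.
print(has_quorum(three, down={"az-a"}, partitions=[]))                  # True
# Same failure plus a partition between the survivors: no majority anywhere.
print(has_quorum(three, down={"az-a"}, partitions=[("az-b", "az-c")]))  # False
# Five AZs under the same combined failure: az-d can still reach a majority.
print(has_quorum(five, down={"az-a"}, partitions=[("az-b", "az-c")]))   # True
```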