AWS 2026-01-12

How Salesforce migrated from Cluster Autoscaler to Karpenter across their fleet of 1,000 EKS clusters

Summary

AWS Architecture Blog case study (2026-01-12) documenting Salesforce's mid-2025 → early-2026 migration of its Kubernetes platform — 1,000+ Amazon EKS clusters, 1,180+ node pools, thousands of internal tenants — from the Kubernetes Cluster Autoscaler (running over AWS Auto Scaling groups) to Karpenter, AWS's open-source pod-driven node autoscaler.

The motivation was structural: the Auto Scaling group / Cluster Autoscaler combination produced multi-minute scaling latency during demand spikes, thousands of rigid node groups that slowed innovation, poor bin-packing and conservative scale-down that stranded resources, and poor AZ balance plus large-cluster performance bottlenecks for memory-intensive workloads.

Salesforce built two in-house tools — a Karpenter transition tool (cordon + PDB-respecting drain + rollback) and a Karpenter patching check tool (AMI validation) — embedded them in the provisioning CI/CD pipeline, automated the Auto-Scaling-group-config → NodePool / EC2NodeClass mapping, and rolled out with soak times, starting from the least-critical environments.

Outcomes: an 80% reduction in manual operational overhead, scaling latency cut from minutes to seconds, 5% FY2026 cost savings with a further 5-10% projected for FY2027, elimination of thousands of node groups, plus developer self-service and heterogeneous instance types (GPU / ARM / x86) in a single node pool. Five named operational lessons — PDB hygiene (OPA-enforced), sequential not parallel cordoning, the 63-character label-length limit, singleton-pod protection under bin-packing consolidation, and 1:1 ephemeral-storage mapping — are the reusable substance of the post.

Key takeaways

  1. Auto Scaling groups become the scaling bottleneck at scale. Salesforce's pre-migration architecture of one node group per workload-shape × AZ had grown to thousands of ASGs. Cluster Autoscaler scales by asking an ASG to grow/shrink, which adds minutes of scaling latency during demand spikes; Karpenter by contrast looks at pending pods and provisions instances directly, collapsing latency to seconds. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)

  2. At 1,000+ clusters and 1,180+ node pools, manual migration is infeasible — the migration itself becomes a product. Salesforce built the Karpenter transition tool (orchestrates cordon-and-drain with PDB respect, rollback-to-ASG, CI/CD-integrated) and the Karpenter patching check tool (AMI validation). Embedding them in the core infrastructure provisioning pipeline is what made the rollout repeatable across thousands of clusters and made rollback a first-class operation — not a scramble. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)

  3. Automate the configuration mapping; don't hand-translate. Legacy ASG fields map cleanly onto Karpenter concepts: ASG instance types → EC2NodeClass instance types, root-volume sizes/IOPS/type/throughput → EC2NodeClass storage parameters, node labels → NodePool + EC2NodeClass labels. With 1,180 node pools of highly diverse config, automated mapping was essential to minimise human error. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
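
The field-level mapping can be sketched as a small translation function. This is an illustrative sketch, not Salesforce's actual tooling: the input field names are those quoted in the post's example config, and the output fragments follow the shape of Karpenter's EC2NodeClass block-device mapping and NodePool requirements.

```python
# Hypothetical sketch of the automated ASG -> Karpenter config mapping.
# Input field names come from the legacy config example quoted in the post;
# output shapes mirror EC2NodeClass/NodePool fragments. Not Salesforce's tool.

def translate_asg(asg: dict) -> dict:
    """Map legacy ASG launch-template fields onto Karpenter resource fragments."""
    return {
        # root-volume size/IOPS/type/throughput -> EC2NodeClass storage parameters
        "ec2nodeclass_storage": {
            "deviceName": "/dev/xvda",  # assumed root device name
            "ebs": {
                "volumeSize": f"{asg['k8s_root_volume_size']}Gi",
                "volumeType": asg["k8s_root_volume_type"],
                "iops": asg["k8s_root_volume_iops"],
                "throughput": asg["k8s_root_volume_throughput"],
            },
        },
        # ASG instance type -> NodePool instance-type requirement
        "nodepool_requirements": [
            {
                "key": "node.kubernetes.io/instance-type",
                "operator": "In",
                "values": [asg["k8s_instance_type"]],
            }
        ],
    }


legacy = {  # subset of the example config surfaced in the post
    "k8s_instance_type": "m6i.8xlarge",
    "k8s_root_volume_size": 100,
    "k8s_root_volume_iops": 3000,
    "k8s_root_volume_type": "gp3",
    "k8s_root_volume_throughput": 125,
}
fragments = translate_asg(legacy)
```

Running the translation once per node pool, inside the provisioning pipeline, is what removes the hand-translation step the takeaway warns against.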

  4. Pod Disruption Budgets are where Karpenter's consolidation meets application reality — treat them as a governance primitive, not a knob. Several Salesforce services had overly restrictive or misconfigured PDBs that blocked node replacements entirely. Remediation had three parts: audit the existing configurations, partner with application owners to fix them, and install OPA policies for proactive PDB validation at admission. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
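
A sketch of the kind of PDB that coexists with Karpenter consolidation — illustrative only, since the post shares neither Salesforce's PDB configurations nor the OPA rules that validate them. With two or more replicas, maxUnavailable: 1 lets a drain proceed one pod at a time instead of blocking node replacement outright.

```yaml
# Illustrative PDB (hypothetical names); assumes the workload runs >= 2 replicas.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-service-pdb
spec:
  maxUnavailable: 1          # permits one voluntary eviction at a time
  selector:
    matchLabels:
      app: example-service
```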

  5. Sequential, checkpointed node cordoning beats parallel cordoning. The team's initial approach — cordoning Karpenter nodes in parallel — caused unexpected cluster-health issues. They reworked it to sequential node cordoning with manual verification checkpoints (with rollback) and enhanced monitoring for early instability detection. Modern tooling doesn't remove the need for careful orchestration of node maintenance; it shifts what needs orchestrating. See patterns/sequential-node-cordoning. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
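
The control flow of the reworked approach can be sketched as a small loop with a per-node checkpoint and full rollback. This is an illustrative sketch, not the Karpenter transition tool: cordon, uncordon, and healthy are injected callables that in practice would wrap kubectl or the Kubernetes API and human verification.

```python
# Sequential, checkpointed cordoning with rollback (illustrative sketch).
# cordon/uncordon/healthy are injected so the orchestration logic stays
# testable; in production they would call the Kubernetes API.

def cordon_sequentially(nodes, cordon, uncordon, healthy):
    """Cordon nodes one at a time; on any failed health checkpoint,
    uncordon everything done so far and report failure."""
    done = []
    for node in nodes:
        cordon(node)
        done.append(node)
        if not healthy():              # checkpoint after every single node
            for n in reversed(done):   # rollback to the pre-cordon state
                uncordon(n)
            return False
    return True
```

The contrast with the failed first attempt is the one-node-at-a-time loop plus the checkpoint inside it: parallel cordoning removes both, which is what caused the cluster-health issues.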

  6. Kubernetes's 63-character label-length limit (see concepts/kubernetes-label-length-limit) is a migration blocker hiding in plain sight. Salesforce's human-friendly legacy naming (example quoted: analytics-bigdata-spark-executor-pool-m6a-32xlarge-az-a-b-c — 67 chars) produced metadata.labels: Invalid value: must be no more than 63 characters errors from Karpenter's label-dependent operations. The fix was refactoring naming conventions cluster-wide before the switch. "Seemingly minor technical constraints can become significant blockers in automated infrastructure management if not properly addressed early." (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
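
The pre-migration audit this lesson implies is a few lines of code; a sketch (hypothetical function name, not from the post):

```python
# Audit pool names against the Kubernetes label-value length limit before
# migrating; names over the limit will fail Karpenter's label-dependent
# operations. Illustrative sketch.

MAX_LABEL_LEN = 63  # Kubernetes limit for label values

def oversized_pool_names(pool_names):
    """Return the names that would need refactoring before the switch."""
    return [n for n in pool_names if len(n) > MAX_LABEL_LEN]
```

Running such a check over all 1,180+ node-pool names would surface every blocker of this class up front rather than mid-rollout.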

  7. Karpenter's bin-packing + consolidation can terminate single-replica pods without warning. A well-known gotcha of efficient bin-packing schedulers: consolidation (re-packing workloads onto fewer, larger nodes) can evict a singleton before a replacement has had time to start, causing service disruption. Salesforce's response was to roll out guaranteed pod lifetime features and workload-aware disruption policies to safeguard singletons. Reinforces the principle that "effective auto scaling solutions must balance infrastructure efficiency with application availability requirements, particularly for mission-critical services." (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
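
One mechanism Karpenter itself provides for this is a per-pod opt-out from voluntary disruption; whether this is exactly what Salesforce rolled out is not specified in the post. Illustrative sketch with hypothetical names:

```yaml
# Pod-level opt-out from Karpenter consolidation (names/image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: singleton-worker
  annotations:
    karpenter.sh/do-not-disrupt: "true"   # Karpenter will not voluntarily evict this pod
spec:
  containers:
    - name: worker
      image: example/worker:latest
```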

  8. Ephemeral-storage settings must be 1:1 translated, not defaulted. Some workloads failed to schedule after migration because ephemeral-storage configuration was incomplete on the Karpenter side. Fix: implement precise 1:1 mappings between ASG-defined volume settings and EC2NodeClass parameters. I/O-intensive applications are the canonical casualty. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
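
The failure mode is easy to reproduce: a pod that requests ephemeral storage (illustrative values below) stays Pending if the EC2NodeClass was left with a smaller default root volume than the old ASG nodes had, because the node's allocatable ephemeral-storage no longer covers the request.

```yaml
# Illustrative pod fragment: this request is only schedulable if the
# EC2NodeClass root-volume settings were translated 1:1 from the ASG.
apiVersion: v1
kind: Pod
metadata:
  name: io-intensive-app        # hypothetical
spec:
  containers:
    - name: app
      image: example/app:latest # placeholder image
      resources:
        requests:
          ephemeral-storage: 50Gi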

  9. Heterogeneous instance types inside one node pool is a true capability unlock, not a nice-to-have. Post-Karpenter, a single Salesforce node pool can host GPU / ARM / x86 instances together — the scheduler picks whichever type best fits the pending pods right now. For platform teams this collapses pool-count × node-shape-count from an O(N×M) grid into a much smaller O(N) set. It also improves IP efficiency by decoupling node provisioning from specific subnets. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
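
A sketch of what such a mixed-architecture NodePool looks like in Karpenter's v1 API — illustrative values only; GPU workloads additionally need accelerator requirements, taints, and device plugins, omitted here:

```yaml
# One NodePool spanning x86 and ARM (hypothetical names, illustrative values).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]   # scheduler picks per pending pods
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```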

  10. Phased, risk-graded rollout is how you migrate 1,000 production clusters without a regression. Salesforce's sequencing: mid-2025 through early 2026, soak times between stages, least-critical environments first to validate tooling and ops, high-stakes production last. This is the phased-migration-with-soak-times pattern at its canonical scale. (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)

Operational numbers

  • Fleet scale: 1,000+ EKS clusters ("one of the world's most complex Kubernetes platforms"); 1,180+ node pools.
  • Tenants: thousands of internal tenants across Salesforce.
  • Scaling latency: minutes (Cluster Autoscaler / ASG-driven) → seconds (Karpenter pending-pod-driven).
  • Operational overhead: 80% reduction in manual ops attributable to automation + self-service.
  • Cost savings: 5% FY2026 (rollout in progress); projected additional 5-10% FY2027.
  • Rollout window: mid-2025 → early 2026 (multi-stage with soak times).
  • Industry context: Datadog reports a 22% increase in Karpenter-provisioned node share over the last two years across surveyed Kubernetes fleets.
  • Label-length limit: Kubernetes metadata labels = max 63 characters (the breaking constant).
  • Example config surfaced in the post (legacy ASG mapping input): k8s_instance_type: m6i.8xlarge, k8s_root_volume_size: 100, k8s_root_volume_iops: 3000, k8s_root_volume_type: gp3, k8s_root_volume_throughput: 125, k8s_min_node_number: 300, k8s_max_node_number: 2500, multi_az_provisioned_workers: false, asg_launch_type: launch_template, gpu_enabled: false.

Caveats

  • Not all numbers are disclosed. The post names a 5% FY2026 cost savings and a 5-10% FY2027 projection but does not enumerate compute cost base, workload mix, or FinOps methodology.
  • "80% reduction in manual operational overhead" is headline framing from the blog — no baseline measurement methodology disclosed.
  • No breakdown per workload class. The narrative names GPU / ARM / x86 + memory-intensive workloads but does not break down outcomes by workload type.
  • OPA PDB-validation policies are named but not shared. The specific rules enforced (minAvailable >= 2, required maxUnavailable bounds, etc.) are left as implementation detail.
  • The Karpenter transition tool / patching-check tool are not open-sourced as of publication. They are described but not linked.
  • Publication context. AWS Architecture Blog — the author list was truncated in the raw capture but the post sits in AWS's customer-architecture-case-study genre. Treat the success-metrics framing as AWS's voice, the operational lessons as Salesforce's substantive content.
  • Timeline is in-progress at publication. "With the Karpenter rollout still in progress…" — the FY2027 projection is a projection, not a measurement.
