Karpenter¶
Karpenter is an open-source (CNCF) Kubernetes node auto-scaler originally built by AWS. It watches pending pods, solves a bin-packing problem over configured instance types / zones / architectures, and dynamically provisions and de-provisions cloud instances directly — replacing the older Cluster Autoscaler model (which scales pre-defined node groups via ASGs) with per-instance control driven by actual pending pods.
Core design primitives¶
- NodePool — declarative CRD that describes a pool of eligible nodes (limits, disruption policy, taints, labels, supported workloads).
- EC2NodeClass — AWS-specific companion CRD describing the EC2-level details: allowed instance types, root volume size / IOPS / type / throughput, subnets, security groups, AMI family.
- Bin-packing scheduler — picks the instance type + AZ + count that most efficiently absorbs the current set of pending pods.
- Consolidation — continuously re-packs workloads onto fewer, larger nodes and retires the vacated ones; the counterpart of CA's utilization-heuristic scale-down.
- Drift — detects when a running node's configuration has drifted from its NodePool/EC2NodeClass spec (e.g. AMI change) and replaces it.
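The two CRDs pair like this in practice — a minimal sketch (field names follow Karpenter's v1 API for AWS; the `default` names, IAM role, and discovery tags are placeholders, so verify against the version you actually run):

```yaml
# NodePool: which nodes Karpenter may create, and how it may disrupt them
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                # placeholder name
spec:
  template:
    spec:
      requirements:            # instance diversity lives here, not in per-type node groups
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"                # hard cap on total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
# EC2NodeClass: the AWS-level details the NodePool delegates to
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: KarpenterNodeRole-my-cluster     # placeholder IAM role
  subnetSelectorTerms:
    - tags: {karpenter.sh/discovery: my-cluster}
  securityGroupSelectorTerms:
    - tags: {karpenter.sh/discovery: my-cluster}
  blockDeviceMappings:                   # the explicit ephemeral-storage mapping
    - deviceName: /dev/xvda              # the hazards section below warns about
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
```

Note the shape of the split: everything cloud-agnostic (requirements, limits, disruption) sits on the NodePool; everything EC2-specific is referenced via nodeClassRef.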
Contrast with Cluster Autoscaler¶
The canonical wiki writeup is on the other page (systems/cluster-autoscaler). Summary:
| | Cluster Autoscaler | Karpenter |
|---|---|---|
| Capacity primitive | ASG (AWS) | Direct EC2 RunInstances |
| Scaling latency | Minutes | Seconds |
| Instance diversity | One template per ASG | Many types per NodePool |
| AZ balance | ASG-driven (poor) | Scheduler-driven |
| Consolidation | Utilization-heuristic | Continuous bin-packing |
What it solves¶
- Flat over-provisioning of node pools that had to handle both steady-state load and deploy surges — expensive on nights and weekends.
- Rigid node-group boundaries — Karpenter can span many instance types / zones / architectures to match pod resource shapes in one NodePool.
- Scale-down inefficiency of static fleets — continuous consolidation retires under-utilized nodes.
- Subnet-pinned node pools — decoupling provisioning from specific subnets improves IP efficiency.
- Poor AZ balance — the scheduler sees the whole cluster and picks under-represented AZs as it provisions.
- Multi-minute scaling latency — pending-pod-driven provisioning collapses to seconds.
Pairs naturally with systems/keda or HPA for pod-level scaling: pods scale first, Karpenter scales nodes under them.
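That layering can be sketched with a minimal KEDA ScaledObject (the deployment name and SQS trigger are illustrative): KEDA adds pod replicas as queue depth grows, the replicas that don't fit go pending, and Karpenter provisions nodes under them.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler            # illustrative name
spec:
  scaleTargetRef:
    name: worker                 # illustrative Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 200
  triggers:
    - type: aws-sqs-queue        # illustrative trigger; any KEDA scaler works
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs
        queueLength: "10"        # target messages per replica
        awsRegion: us-east-1
```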
Known hazards (from production)¶
From Salesforce's 1,000-cluster migration (2026-01-12):
- Bin-packing + consolidation can terminate singleton pods without warning. Mitigation: guaranteed-pod-lifetime features, workload-aware disruption policies, the karpenter.sh/do-not-disrupt annotation.
- PDB misconfigurations become migration-blocking. Overly restrictive or broken PDBs block node replacement. Fix upstream: audit + OPA-enforced PDB admission validation.
- [[concepts/kubernetes-label-length-limit|63-character label limit]] can be unexpectedly breaking at scale because Karpenter's NodePool/EC2NodeClass matching is label-dependent — legacy human-friendly naming conventions often exceed the limit.
- Ephemeral-storage defaults are not implicit. Moving from ASG to EC2NodeClass requires 1:1 volume-config translation; incomplete translation causes workloads to fail to schedule.
- Parallel cordoning destabilizes clusters. Use sequential cordoning with verification checkpoints instead.
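Two of these mitigations are plain manifest hygiene — a hedged sketch with placeholder names: the karpenter.sh/do-not-disrupt pod annotation blocks voluntary disruption of a singleton, and a correctly-sized PDB lets node replacement drain one pod at a time instead of deadlocking it.

```yaml
# Singleton protection: Karpenter will not voluntarily evict this pod
apiVersion: v1
kind: Pod
metadata:
  name: singleton-worker                 # placeholder
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: worker
      image: example.com/worker:latest   # placeholder image
---
# PDB that permits rolling node replacement instead of blocking it.
# maxUnavailable: 1 always allows progress; by contrast, minAvailable
# equal to the replica count is the misconfiguration that blocks
# cordoning entirely.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb                       # placeholder
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: worker
```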
Scale references¶
- Figma — scoped Karpenter into the ECS→EKS migration for cost savings; used with systems/keda for the pod layer (fast-follow).
- Salesforce — 1,000+ EKS clusters / 1,180+ node pools / thousands of internal tenants. Canonical largest-known Karpenter production reference. Reports:
- Scaling latency minutes → seconds.
- 80% reduction in manual operational overhead.
- 5% FY2026 cost savings, projected +5-10% FY2027.
- Heterogeneous GPU / ARM / x86 in single node pools.
- Eliminated thousands of node groups.
- Datadog's State of Containers report (referenced in the 2026-01-12 post): +22% Karpenter-provisioned node share in the last 2 years across surveyed Kubernetes fleets.
Seen in¶
- sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Figma scoped node-level auto-scaling (via Karpenter) into the ECS→EKS migration because cost savings justified the added scope for little extra work. Pod-level auto-scaling (Keda) was deferred to a fast-follow.
- sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's 1,000-cluster migration from Cluster Autoscaler + ASGs to Karpenter. Canonical wiki reference for Karpenter at extreme scale; documents the five operational lessons (PDB hygiene, sequential cordoning, 63-char labels, singleton protection, ephemeral-storage mapping), the three design principles of the in-house transition tool (zero-disruption + rollback + CI/CD-integrated), the automated ASG→Karpenter config mapping approach, and the rollout strategy (phased with soak times, risk-based sequencing).
Related¶
- systems/cluster-autoscaler — the autoscaler Karpenter is displacing.
- systems/aws-auto-scaling-groups — the legacy capacity primitive Karpenter bypasses.
- systems/aws-eks — the typical runtime.
- systems/kubernetes — the orchestrator it scales.
- systems/aws-ec2 — the cloud compute under it.
- systems/keda — the usual pod-layer companion.
- concepts/bin-packing — Karpenter's core algorithm.
- concepts/scaling-latency — the metric Karpenter wins on.
- concepts/pod-disruption-budget — Karpenter's safety contract during consolidation.
- concepts/singleton-workload — the workload class Karpenter's consolidation can harm.
- concepts/availability-zone-balance / concepts/ip-address-fragmentation — two platform-scale wins surfaced at Salesforce.
- concepts/kubernetes-label-length-limit — migration-blocker at Salesforce scale.
- patterns/disruption-budget-guarded-upgrades — the compound safety pattern Karpenter depends on customers to configure.
- patterns/sequential-node-cordoning — Salesforce's operational lesson for node-replacement campaigns.