
Karpenter

Karpenter is an open-source (CNCF) Kubernetes node auto-scaler originally built by AWS. It watches pending pods, solves a bin-packing problem over configured instance types / zones / architectures, and dynamically provisions and de-provisions cloud instances directly — replacing the older Cluster Autoscaler model (which scales pre-defined node groups via ASGs) with per-instance control driven by actual pending pods.

Core design primitives

  • NodePool — declarative CRD that describes a pool of eligible nodes (limits, disruption policy, taints, labels, supported workloads).
  • EC2NodeClass — AWS-specific companion CRD describing the EC2-level details: allowed instance types, root volume size / IOPS / type / throughput, subnets, security groups, AMI family.
  • Bin-packing scheduler — picks the instance type + AZ + count that most efficiently absorbs the current set of pending pods.
  • Consolidation — continuously re-packs workloads onto fewer, larger nodes and retires the vacated ones; the counterpart to Cluster Autoscaler's utilization-threshold scale-down heuristic.
  • Drift — detects when a running node's configuration has drifted from its NodePool / EC2NodeClass spec (e.g. AMI change) and replaces it.
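The NodePool / EC2NodeClass pairing above can be sketched as a pair of manifests. This is an illustrative example against the Karpenter v1 API, not a config from either migration; all names, tags, and values are placeholders.

```yaml
# Illustrative NodePool: many instance shapes, both architectures,
# spot + on-demand, with a CPU limit and continuous consolidation.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general            # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:        # points at the AWS-specific companion CRD
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: general
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  limits:
    cpu: "1000"            # cap total provisioned capacity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
---
# Illustrative EC2NodeClass: AMI family, subnet/SG discovery by tag,
# and explicit root-volume settings (size / type / IOPS / throughput).
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: general
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # placeholder cluster tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
```

Changing the AMI alias or volume spec here is exactly the kind of edit Drift detection picks up: running nodes no longer match the class and get replaced.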

Contrast with Cluster Autoscaler

The canonical wiki writeup is on the other page (systems/cluster-autoscaler). Summary:

|                    | Cluster Autoscaler     | Karpenter                |
|--------------------|------------------------|--------------------------|
| Capacity primitive | ASG (AWS)              | Direct EC2 `RunInstances`|
| Scaling latency    | Minutes                | Seconds                  |
| Instance diversity | One template per ASG   | Many types per NodePool  |
| AZ balance         | ASG-driven (poor)      | Scheduler-driven         |
| Consolidation      | Utilization heuristic  | Continuous bin-packing   |

What it solves

  • Flat over-provisioning of node pools that had to handle both steady-state load and deploy surges — expensive on nights and weekends.
  • Rigid node-group boundaries — Karpenter can span many instance types / zones / arch to match pod resource shapes in one NodePool.
  • Scale-down inefficiency of static fleets — continuous consolidation retires under-utilized nodes.
  • Subnet-pinned node pools — decoupling provisioning from specific subnets improves IP efficiency.
  • Poor AZ balance — the scheduler sees the whole cluster and picks under-represented AZs as it provisions.
  • Multi-minute scaling latency — pending-pod-driven provisioning collapses to seconds.

Pairs naturally with systems/keda or HPA for pod-level scaling: pods scale first, Karpenter scales nodes under them.
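A minimal sketch of the pod-layer half of that pairing, using a plain HPA (all names and thresholds are illustrative): the HPA adds replicas, the new pods go Pending, and Karpenter provisions nodes underneath them.

```yaml
# Hypothetical HPA for a Deployment named "web": scales replicas on
# CPU utilization; node capacity is Karpenter's problem, not the HPA's.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

KEDA works the same way at this layer, just with event-driven triggers (queue depth, lag, etc.) instead of resource metrics.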

Known hazards (from production)

From Salesforce's 1,000-cluster migration (2026-01-12):

  • Bin-packing + consolidation can terminate singleton pods without warning. Mitigation: guaranteed-pod-lifetime features, workload-aware disruption policies, karpenter.sh/do-not-disrupt annotation.
  • PDB misconfigurations become migration-blocking. Overly restrictive or broken PDBs block node replacement. Fix upstream: audit + OPA-enforced PDB admission validation.
  • [[concepts/kubernetes-label-length-limit|63-character label limit]] can be unexpectedly breaking at scale because Karpenter's NodePool / EC2NodeClass matching is label-dependent — legacy human-friendly naming conventions often exceed the limit.
  • Ephemeral-storage settings do not carry over implicitly. Moving from ASG launch templates to EC2NodeClass requires a 1:1 translation of volume config; incomplete translation leaves workloads unable to schedule.
  • Parallel cordoning destabilizes clusters. Use sequential cordoning with verification checkpoints instead.
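The first two mitigations above can be sketched in manifests. This is an illustrative example, not Salesforce's config; the workload names and selectors are placeholders, but `karpenter.sh/do-not-disrupt` is Karpenter's documented opt-out annotation.

```yaml
# Hypothetical singleton: the pod-template annotation tells Karpenter
# not to voluntarily disrupt (consolidate/drift-replace) nodes running it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: singleton-worker        # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: singleton-worker
  template:
    metadata:
      labels:
        app: singleton-worker
      annotations:
        karpenter.sh/do-not-disrupt: "true"   # blocks voluntary disruption
    spec:
      containers:
        - name: worker
          image: example.com/worker:latest    # placeholder image
---
# A PDB that permits progress: one pod at a time may be evicted during
# node replacement. maxUnavailable: 0 (or a selector matching nothing)
# is the misconfiguration that blocks drains and stalls migrations.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
```

An admission policy (e.g. OPA, as the source suggests) can reject PDBs that resolve to zero allowed disruptions before they ever block a drain.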

Scale references

  • Figma — scoped Karpenter into the ECS→EKS migration for cost savings; used with systems/keda for the pod layer (fast-follow).
  • Salesforce — 1,000+ EKS clusters / 1,180+ node pools / thousands of internal tenants. Canonical largest-known Karpenter production reference. Reports:
      • Scaling latency: minutes → seconds.
      • 80% reduction in manual operational overhead.
      • 5% FY2026 cost savings, projected +5-10% FY2027.
      • Heterogeneous GPU / ARM / x86 in single node pools.
      • Eliminated thousands of node groups.
  • Datadog's State of Containers report (referenced in the 2026-01-12 post): +22% Karpenter-provisioned node share in the last 2 years across surveyed Kubernetes fleets.

Seen in

  • sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Figma scoped node-level auto-scaling (via Karpenter) into the ECS→EKS migration because cost savings justified the added scope for little extra work. Pod-level auto-scaling (Keda) was deferred to a fast-follow.
  • sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's 1,000-cluster migration from Cluster Autoscaler + ASGs to Karpenter. Canonical wiki reference for Karpenter at extreme scale; documents the five operational lessons (PDB hygiene, sequential cordoning, 63-char labels, singleton protection, ephemeral-storage mapping), the three design principles of the in-house transition tool (zero-disruption + rollback + CI/CD-integrated), the automated ASG→Karpenter config mapping approach, and the rollout strategy (phased with soak times, risk-based sequencing).