CONCEPT

Pod Disruption Budget

A Pod Disruption Budget (PDB) is a Kubernetes primitive that bounds how many pods of a given workload (by count or percentage) may be unavailable simultaneously during a voluntary disruption: typically a node drain (cordon + evict) driven by cluster autoscaling, rolling upgrades, or manual maintenance.

A PDB is declarative:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2      # OR maxUnavailable: 1
  selector:
    matchLabels:
      app: my-service

At drain time the Kubernetes eviction API consults matching PDBs: if evicting the next pod would violate a budget, the API rejects the eviction (HTTP 429) and the drain retries until the budget allows it. Pods are therefore evicted serially, respecting per-service budgets, instead of all at once.
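Under the hood, kubectl drain submits one Eviction object per pod to that pod's eviction subresource; the API server admits or rejects each one against the PDBs that select it. A minimal sketch (the pod name here is hypothetical):

apiVersion: policy/v1
kind: Eviction
metadata:
  name: my-service-7d4f8-abcde   # hypothetical pod created by the my-service workload
  namespace: default

This is why PDBs only govern API-mediated eviction: a process that deletes pods directly (or a node that dies) never goes through this admission check.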

Voluntary vs involuntary disruption

Important scope distinction:

  • Voluntary (what PDBs govern) — node drain, cluster upgrade, eviction API call by a controller.
  • Involuntary — hardware failure, kernel panic, zone outage, OOM kill. PDBs do not protect against these.

PDBs are a budget for planned churn, not for fault tolerance.

Node Disruption Budget (a sibling concept)

Node Disruption Budget (sometimes surfaced by cluster autoscalers / managed K8s services) is the analogous primitive at the node level: bound how many nodes in the cluster can be replaced concurrently, regardless of which pods are on them. In EKS Auto Mode, NDBs complement PDBs — PDBs say "don't lose all my replicas", NDBs say "don't churn half the cluster at once."
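In EKS Auto Mode the node-level budget is expressed on the Karpenter NodePool via spec.disruption.budgets. A sketch, assuming the karpenter.sh/v1 API and a NodePool named "default" (names illustrative):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      # At most 10% of this pool's nodes may be voluntarily
      # disrupted (consolidated, drifted, expired) at once.
      - nodes: "10%"

Like PDBs, this only caps voluntary churn; it does not stop the platform from eventually replacing every node, it just paces the replacement.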

Why it becomes load-bearing under managed upgrades

When AWS or another managed-K8s platform owns the node-replacement schedule (e.g. EKS Auto Mode's weekly AMI cadence), PDBs + NDBs are the sole customer-side safety control that converts "the platform will actively terminate nodes" into "my service stays up during node replacement."

Generali explicitly called this out as a workflow adjustment driven by EKS Auto Mode:

"The team had to create disruption control configurations to prevent those disruptions from impacting workloads. For example, they specified a maintenance window during off-peak hours for those upgrades. They also specified Pod Disruption Budgets and Node Disruptions Budgets to make sure critical applications would not see all the pods of a micro-service being terminated at the same time." (Source: sources/2026-03-23-aws-generali-malaysia-eks-auto-mode)

This pairing (PDB + NDB + maintenance window) is the canonical customer contract under a managed-data-plane K8s service; see patterns/disruption-budget-guarded-upgrades.
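The off-peak maintenance window Generali describes can be encoded in the same NodePool budget mechanism, assuming Karpenter's schedule/duration budget fields (cron schedule in UTC; values illustrative):

spec:
  disruption:
    budgets:
      # During business hours (08:00 + 10h, Mon-Fri), allow zero
      # voluntary node disruptions.
      - nodes: "0"
        schedule: "0 8 * * mon-fri"
        duration: 10h
      # Outside that window, replace at most one node at a time.
      - nodes: "1"

The most restrictive active budget wins, so the "0" window overrides the baseline during business hours.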

Caveats and pitfalls

  • PDBs can deadlock drain. Setting minAvailable equal to the total replicas prevents eviction entirely; drains never progress. Budget sizing requires at least one pod of slack.
  • Overlapping PDBs block eviction outright. Kubernetes expects each pod to be matched by at most one PDB; if multiple PDBs select the same pod, the eviction API returns an error and the pod cannot be evicted at all. Keep selectors disjoint.
  • Doesn't protect single-replica workloads. A one-replica workload under minAvailable: 1 can never be drained; such workloads must either accept disruption (e.g. maxUnavailable: 1) or be scaled to >1 replicas.
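The sizing pitfalls above reduce to one rule: the budget must leave at least one pod of eviction slack. A sketch for a 3-replica Deployment:

# Deadlocks drain: with 3 replicas, evicting any pod
# would drop availability below the budget.
spec:
  minAvailable: 3

# Leaves one pod of slack; drains progress one pod at a time.
spec:
  minAvailable: 2

# Equivalent slack expressed the other way around.
spec:
  maxUnavailable: 1

Percentage forms (e.g. minAvailable: "80%") carry the same hazard when the replica count is small, since Kubernetes rounds in the workload's favor.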

Seen in

  • sources/2026-03-23-aws-generali-malaysia-eks-auto-mode — Generali configures PDBs + Node Disruption Budgets + off-peak maintenance window as the customer-side safety contract under EKS Auto Mode's weekly node-replacement cadence.
  • sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's 1,000-cluster Karpenter migration surfaced PDBs as both a critical enabler and a common source of failure: several services had "overly restrictive or misconfigured PDBs that blocked node replacements". Remediation had three parts: auditing bad PDB configs, partnering with app owners on fixes, and installing OPA policies for proactive PDB validation at admission — treating PDBs as a governance primitive, not an app-team knob. Salesforce also flagged Karpenter consolidation vs singleton workloads as a distinct PDB-adjacent hazard — covered by guaranteed pod lifetime features and workload-aware disruption policies rather than PDBs alone.