CONCEPT Cited by 2 sources
Pod Disruption Budget¶
A Pod Disruption Budget (PDB) is a Kubernetes primitive that bounds the number or percentage of pods of a given workload that may be simultaneously unavailable during a voluntary disruption, typically a node drain (cordon + evict) driven by cluster autoscaling, rolling upgrades, or manual maintenance. Deleting a pod directly bypasses the budget; only requests that go through the Eviction API are checked against it.
A PDB is declarative:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2  # OR maxUnavailable: 1
  selector:
    matchLabels:
      app: my-service
At drain time the Kubernetes Eviction API consults PDBs: if evicting the next pod would violate the PDB, the API rejects the request (HTTP 429 Too Many Requests) and the drain tool retries until the eviction is safe. Pods therefore leave a node only as fast as each workload's budget allows, instead of all at once.
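The eviction check can be pictured with two small fragments. The first is the Eviction object a drain tool submits for each pod; the second is the kind of status the PDB controller reports for the budget. This is an illustrative sketch, not output captured from a real cluster: the pod name, the namespace, and the replica counts are assumptions.

```yaml
# Request body a drain tool POSTs to the pod's "eviction" subresource.
# Pod name and namespace are hypothetical.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: my-service-abc123
  namespace: default
---
# Assumed example of the status reported on my-service-pdb when the workload
# has 3 healthy replicas and the spec sets minAvailable: 2.
status:
  expectedPods: 3
  currentHealthy: 3
  desiredHealthy: 2      # derived from minAvailable
  disruptionsAllowed: 1  # currentHealthy - desiredHealthy
```

With disruptionsAllowed at 1, one eviction is admitted. Further eviction requests are rejected until a replacement pod becomes healthy and the count rises above zero again.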
Voluntary vs involuntary disruption¶
Important scope distinction:
- Voluntary (what PDBs govern) — node drain, cluster upgrade, eviction API call by a controller.
- Involuntary — hardware failure, kernel panic, zone outage, OOM kill. PDBs do not protect against these.
PDBs are a budget for planned churn, not for fault tolerance.
Node Disruption Budget (a sibling concept)¶
Node Disruption Budget (NDB, sometimes surfaced by cluster autoscalers and managed K8s services) is the analogous primitive at the node level: it bounds how many nodes in the cluster can be replaced concurrently, regardless of which pods are on them. In EKS Auto Mode, NDBs complement PDBs: PDBs say "don't lose all my replicas", NDBs say "don't churn half the cluster at once."
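As a rough illustration of a node-level budget, the fragment below uses the disruption budgets that Karpenter exposes on a NodePool. Treating this as the shape of an NDB is an assumption; the sources do not show the configuration Generali used. The pool name, the percentage, and the schedule are hypothetical, and the required node template is omitted.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-nodepool   # hypothetical name
spec:
  # template: ...     # required node template omitted from this fragment
  disruption:
    budgets:
      # At most 10% of this pool's nodes may be voluntarily disrupted at once.
      - nodes: "10%"
      # Allow no voluntary node disruption for 8 hours starting 09:00 on
      # weekdays (assumed business hours).
      - nodes: "0"
        schedule: "0 9 * * mon-fri"
        duration: 8h
```

When several budgets are active at the same time, the most restrictive one applies, so the scheduled zero budget overrides the 10% budget during its window.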
Why it becomes load-bearing under managed upgrades¶
When AWS or another managed-K8s platform owns the node-replacement schedule (e.g. EKS Auto Mode's weekly AMI cadence), PDBs and NDBs are the main customer-side safety controls that convert "the platform will actively terminate nodes" into "my service stays up during node replacement."
Generali explicitly called this out as a workflow adjustment driven by EKS Auto Mode:
"The team had to create disruption control configurations to prevent those disruptions from impacting workloads. For example, they specified a maintenance window during off-peak hours for those upgrades. They also specified Pod Disruption Budgets and Node Disruptions Budgets to make sure critical applications would not see all the pods of a micro-service being terminated at the same time." (Source: sources/2026-03-23-aws-generali-malaysia-eks-auto-mode)
This trio (PDB + NDB + maintenance window) is the canonical customer contract under a managed-data-plane K8s service; see patterns/disruption-budget-guarded-upgrades.
Caveats and pitfalls¶
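- PDBs can deadlock drain. Setting minAvailable equal to the total replicas (or maxUnavailable to 0) prevents eviction entirely; drains never progress. Budget sizing requires at least one pod of slack.
- Overlapping PDBs are a misconfiguration, not a stacking mechanism. If more than one PDB selects the same pod, the Eviction API rejects the request with an error instead of applying the most restrictive budget, so the pod cannot be evicted until the overlap is removed.
- Doesn't protect single-replica workloads. With one replica, a PDB that requires the pod to stay available (minAvailable: 1 or maxUnavailable: 0) blocks drain indefinitely, while a looser budget lets the only pod be evicted. Such workloads must accept disruption or be scaled to >1.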
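One way to keep a pod of slack as replica counts change is to express the budget as maxUnavailable instead of minAvailable. The manifest below is a minimal sketch that reuses the example names from earlier on this page and assumes the workload is managed by a controller such as a Deployment or StatefulSet.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  # One pod may be unavailable at a time, whatever the replica count is,
  # so scaling the workload down cannot turn the budget into a deadlock.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-service
```

With a single replica this budget still admits the eviction of the only pod, so it avoids a stuck drain but gives no availability protection.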
Seen in¶
- sources/2026-03-23-aws-generali-malaysia-eks-auto-mode — Generali configures PDBs + Node Disruption Budgets + off-peak maintenance window as the customer-side safety contract under EKS Auto Mode's weekly node-replacement cadence.
- sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's 1,000-cluster Karpenter migration surfaced PDBs as both a critical enabler and a common source of failure: several services had "overly restrictive or misconfigured PDBs that blocked node replacements". Remediation had three parts: auditing bad PDB configs, partnering with app owners on fixes, and installing OPA policies for proactive PDB validation at admission — treating PDBs as a governance primitive, not an app-team knob. Salesforce also flagged Karpenter consolidation vs singleton workloads as a distinct PDB-adjacent hazard — covered by guaranteed pod lifetime features and workload-aware disruption policies rather than PDBs alone.
Related¶
- systems/kubernetes
- systems/eks-auto-mode — the service that makes PDBs load-bearing rather than optional hygiene.
- systems/karpenter — the autoscaler whose bin-packing + consolidation campaigns use PDBs as the safety contract.
- systems/open-policy-agent — admission-layer enforcement point for PDB correctness (Salesforce).
- patterns/disruption-budget-guarded-upgrades — the compound pattern PDBs are a piece of.
- concepts/shared-responsibility-model — PDBs are what the customer retains when AWS takes over node churn.
- concepts/singleton-workload — PDBs can't protect these; separate mechanism required.