PATTERN Cited by 1 source
Fixed CPU pinning for latency-sensitive pool¶
Statement¶
For latency-sensitive connection pooler workloads (and,
by extension, any low-latency network middleware), pin the
pod to an exclusive physical CPU core. Don't let the scheduler
float it across cores or land it on the sibling hyperthread of
a busy core. On Kubernetes, this means running the pod in the
Guaranteed QoS class (requests equal to limits for CPU and
memory) with an integer CPU count, and enabling the CPU Manager
static policy (with the full-pcpus-only option for HT-aware
allocation).
When to use it¶
- Connection poolers (PgBouncer, Envoy in pooling roles, Odyssey, Pgpool-II).
- Low-latency service proxies and API gateways.
- Media / VoIP gateways and realtime protocol bridges.
- Any single-threaded, event-loop process that dominates one core's worth of CPU and for which p99 latency matters.
Why it works¶
Pinning eliminates three latency sources:
- Sibling-HT contention: softirq NET_RX/NET_TX handlers running alongside another network-heavy process on the sibling hyperthread compete for shared microarchitectural resources. See concepts/hyperthread-softirq-contention.
- CPU migration: the kernel's load balancer moves the pod's threads between cores, so they resume on a core whose caches and TLB are cold.
- Noisy-neighbour preemption: other pods on the same core take CPU time, inflating run-queue latency.
The combined effect in Zalando's PgBouncer benchmark: ~2× latency improvement (one isolated physical core vs. two HT-siblings sharing a core).
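To see which logical CPUs are hyperthread siblings on a given Linux node, the kernel's sysfs topology files can be read directly. The following is a minimal sketch, assuming a Linux host that exposes `thread_siblings_list` under `/sys/devices/system/cpu`; the function names `parse_cpu_list` and `physical_cores` are illustrative and do not come from the source.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def physical_cores(sysfs_root: str = "/sys/devices/system/cpu") -> list[frozenset[int]]:
    """Group logical CPUs into physical cores via thread_siblings_list."""
    cores: set[frozenset[int]] = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        siblings = frozenset(parse_cpu_list(path.read_text()))
        if siblings:  # skip empty files defensively
            cores.add(siblings)
    # Every sibling reports the same list, so the set de-duplicates cores.
    return sorted(cores, key=min)


if __name__ == "__main__":
    for core in physical_cores():
        print(sorted(core))
```

Each printed group is one physical core. Two busy processes whose CPUs fall in the same group are in the sibling-HT situation described above.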
Implementation¶
Kubernetes path:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: pgbouncer
          resources:
            requests:            # requests == limits → Guaranteed QoS
              cpu: "2"           # whole number of CPUs, needed for exclusive allocation
              memory: "128Mi"
            limits:
              cpu: "2"
              memory: "128Mi"
Plus kubelet flags on the node:

    --cpu-manager-policy=static
    --cpu-manager-policy-options=full-pcpus-only=true
    --reserved-cpus=0-1    # reserve for kubelet/system
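Whether a container's `resources` block qualifies for exclusive CPUs can be checked before deploying. The sketch below is a simplified, hypothetical pre-flight check, not kubelet's actual admission logic: it looks at a single container, compares memory quantities as plain strings, and assumes 2-way SMT by default (with full-pcpus-only, the CPU count has to be a multiple of the threads per core). The function names are illustrative.

```python
def cpu_to_millicores(quantity) -> int:
    """Convert a Kubernetes CPU quantity ('2', '500m', '1.5') to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def exclusive_cpu_problems(resources: dict, threads_per_core: int = 2) -> list[str]:
    """Return reasons why this container would not get exclusive full cores."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems: list[str] = []

    # Guaranteed QoS: CPU and memory limits are set, and any explicit
    # request equals its limit (omitted requests default to the limit).
    for name in ("cpu", "memory"):
        if name not in limits:
            problems.append(f"{name} limit is missing")
            continue
        if name not in requests:
            continue
        if name == "cpu":
            same = cpu_to_millicores(requests[name]) == cpu_to_millicores(limits[name])
        else:
            # Simplification: string comparison, so '1Gi' != '1024Mi' here.
            same = str(requests[name]) == str(limits[name])
        if not same:
            problems.append(f"{name} request differs from limit")

    if "cpu" in limits:
        milli = cpu_to_millicores(limits["cpu"])
        if milli < 1000 or milli % 1000:
            problems.append("cpu is not a whole number of CPUs")
        elif (milli // 1000) % threads_per_core:
            problems.append("cpu count is not a multiple of threads per core")
    return problems


if __name__ == "__main__":
    pgbouncer = {
        "requests": {"cpu": "2", "memory": "128Mi"},
        "limits": {"cpu": "2", "memory": "128Mi"},
    }
    print(exclusive_cpu_problems(pgbouncer))
```

For the pod spec shown above, the check returns an empty list. A value such as `cpu: "500m"` or `cpu: "3"` (on 2-way SMT hardware) would be reported.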
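Non-Kubernetes path: taskset -c 4,5 pgbouncer ... or the
cgroup cpuset.cpus setting. Either one only restricts where
PgBouncer may run; other processes have to be kept off those
CPUs separately for the core to be truly exclusive.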
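The same affinity change that `taskset` makes can be applied programmatically through the Linux `sched_setaffinity` call, which Python exposes in the `os` module. This is a minimal sketch, assuming a Linux host; `pin_process` is an illustrative helper, and its check against the current affinity mask is a deliberately conservative choice (it refuses to widen an existing mask).

```python
import os


def pin_process(cpus: set[int], pid: int = 0) -> set[int]:
    """Restrict `pid` (0 means the calling process) to `cpus`.

    Returns the affinity mask in effect afterwards.
    """
    current = os.sched_getaffinity(pid)
    unavailable = set(cpus) - current
    if unavailable:
        raise ValueError(f"CPUs not in the current affinity mask: {sorted(unavailable)}")
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pin this process to the lowest-numbered CPU it is allowed to use.
    first = min(os.sched_getaffinity(0))
    print(pin_process({first}))
```

The affinity mask is inherited by child processes and preserved across `exec`, so a small launcher can pin itself and then start the pooler. As with `taskset`, this confines the process but does not stop other processes from running on the same CPUs.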
Trade-offs¶
- Reduces node packing density — exclusive CPUs mean fewer pods per node.
- Requires capacity planning — system daemons, kubelet, container runtime need reserved CPUs.
- Not free in cloud cost: dedicated cores cost more per request than shared.
- Poor fit for bursty workloads: a pinned pod is confined to its allocated CPUs and can't borrow spare capacity from idle cores.
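The packing-density cost can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model under stated assumptions: it counts only CPUs (ignoring memory and other pods), assumes the reserved CPUs cover whole physical cores, and assumes full-pcpus-only, so each pod's CPU count must be a multiple of the threads per core. The function name and the 16-CPU node are hypothetical.

```python
def pinned_pods_per_node(
    logical_cpus: int,
    reserved_cpus: int,
    cpus_per_pod: int,
    threads_per_core: int = 2,
) -> int:
    """Upper bound on pinned pods that fit on one node, counting CPUs only."""
    if cpus_per_pod <= 0 or cpus_per_pod % threads_per_core:
        raise ValueError("cpus_per_pod must be a positive multiple of threads_per_core")
    available = max(logical_cpus - reserved_cpus, 0)
    return available // cpus_per_pod


if __name__ == "__main__":
    # Hypothetical node: 16 logical CPUs, 2 reserved for kubelet/system,
    # 2 CPUs (one full physical core) per pooler pod.
    print(pinned_pods_per_node(16, 2, 2))  # prints 7
```

Every CPU reserved for the system and every pod rounded up to a full core lowers this bound, which is the density trade-off listed above.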
Seen in¶
- sources/2020-06-23-zalando-pgbouncer-on-kubernetes-minimal-latency — Zalando's prescription for operators that can't tolerate the latency variability of the AZ-spread pooler deployment. Kukushkin: "In view of these results it could be beneficial to configure CPU manager in the cluster, so that this would not be an issue."