
PATTERN

Fixed CPU pinning for latency-sensitive pool

Statement

For latency-sensitive connection pooler workloads (and, by extension, any low-latency network middleware), pin the pod to an exclusive physical CPU — don't let the scheduler float it across cores or land it on a sibling hyperthread of a busy core. On Kubernetes, this means running the pod at Guaranteed QoS (requests equal to limits) with integer CPU requests, and enabling the kubelet CPU Manager static policy (with full-pcpus-only so allocations are HT-aware, i.e. whole physical cores).

When to use it

  • Connection poolers (PgBouncer, Envoy in pooling roles, Odyssey, Pgpool-II).
  • Low-latency service proxies and API gateways.
  • Media / VoIP gateways and realtime protocol bridges.
  • Any single-threaded, event-loop process that dominates one core's worth of CPU and for which p99 latency matters.

Why it works

Pinning eliminates three latency sources:

  1. Sibling-HT contention — softirq NET_RX/NET_TX handlers running alongside another network-heavy process on the sibling hyperthread suffer contention for shared microarchitectural resources. See concepts/hyperthread-softirq-contention.
  2. CPU migration — the kernel's load balancer moves pods between cores, flushing cache and TLB.
  3. Noisy-neighbour preemption — other pods on the same core take CPU time, inflating run-queue latency.

In Zalando's PgBouncer benchmark, the combined effect was roughly a 2× latency improvement (one isolated physical core vs. two HT siblings sharing a core).
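To see which logical CPUs are hyperthread siblings on a node — and therefore which pairs full-pcpus-only must allocate together — the sysfs topology files can be read directly. A minimal sketch (output depends on the machine's kernel and core layout):

```shell
# Print each logical CPU with its hyperthread sibling set.
# CPUs that share a physical core report identical sibling lists.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "${c##*/}: $(cat "$c/topology/thread_siblings_list")"
done
```

For example, cpu2: 2,6 and cpu6: 2,6 would indicate that logical CPUs 2 and 6 are siblings on one physical core — the contention pair the pattern is designed to avoid.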

Implementation

Kubernetes path:

apiVersion: v1
kind: Pod
metadata:
  name: pgbouncer
spec:
  containers:
  - name: pgbouncer
    image: example/pgbouncer   # placeholder image
    resources:
      requests:
        cpu: "2"        # integer CPU count → eligible for exclusive cores
        memory: "128Mi"
      limits:           # limits == requests → Guaranteed QoS
        cpu: "2"
        memory: "128Mi"

Plus kubelet flags on the node:

--cpu-manager-policy=static
--cpu-manager-policy-options=full-pcpus-only=true
--reserved-cpus=0-1   # reserve for kubelet/system
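Once the pod is running, the pinning can be verified end to end. A hedged sketch, assuming the pod is named pgbouncer and the node uses cgroup v2 (requires a live cluster):

```shell
# 1. Confirm the pod was classified as Guaranteed QoS.
kubectl get pod pgbouncer -o jsonpath='{.status.qosClass}'   # expect: Guaranteed

# 2. Confirm the container sees only its exclusive CPUs (cgroup v2).
kubectl exec pgbouncer -- cat /sys/fs/cgroup/cpuset.cpus.effective

# 3. Cross-check the affinity mask of the container's main process.
kubectl exec pgbouncer -- grep Cpus_allowed_list /proc/1/status
```

With full-pcpus-only in effect, the CPU list in steps 2 and 3 should be a sibling-complete set of physical cores outside the reserved-cpus range.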

Non-Kubernetes path: taskset -c 4,5 pgbouncer ... or cgroup cpuset.cpus.
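On the non-Kubernetes path, a quick way to confirm that an affinity mask actually took effect is to ask taskset for the launched process's own affinity. A minimal sketch (pinned to CPU 0 rather than the 4,5 pair above, so it runs on any machine):

```shell
# Launch a process pinned to CPU 0, then print its effective affinity.
taskset -c 0 bash -c 'taskset -cp $$'
# → pid <pid>'s current affinity list: 0
```

The same check works for a cgroup cpuset: compare the printed list against the cpuset.cpus value written into the cgroup.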

Trade-offs

  • Reduces node packing density — exclusive CPUs mean fewer pods per node.
  • Requires capacity planning — system daemons, kubelet, container runtime need reserved CPUs.
  • Higher cloud cost — exclusively held cores cost more per request than shared capacity.
  • Breaks for bursty workloads — a pinned pod can't steal spare capacity from idle cores.
