
PATTERN

Fixed CPU pinning for latency-sensitive pool

Statement

For latency-sensitive connection pooler workloads (and, by extension, any low-latency network middleware), pin the pod to an exclusive physical CPU — don't let the scheduler float it across cores or land it on a sibling hyperthread of a busy core. On Kubernetes, this means running the pod at Guaranteed QoS (requests equal to limits) with integer CPU requests, and enabling the kubelet CPU Manager static policy (with full-pcpus-only so allocations are HT-aware, i.e. whole physical cores).

When to use it

  • Connection poolers (PgBouncer, Envoy in pooling roles, Odyssey, Pgpool-II).
  • Low-latency service proxies and API gateways.
  • Media / VoIP gateways and realtime protocol bridges.
  • Any single-threaded, event-loop process that dominates one core's worth of CPU and for which p99 latency matters.

Why it works

Pinning eliminates three latency sources:

  1. Sibling-HT contention — softirq NET_RX/NET_TX handlers running alongside another network-heavy process on the sibling hyperthread suffer contention for shared microarchitectural resources. See concepts/hyperthread-softirq-contention.
  2. CPU migration — the kernel's load balancer moves pods between cores, flushing cache and TLB.
  3. Noisy-neighbour preemption — other pods on the same core take CPU time, inflating run-queue latency.

In Zalando's PgBouncer benchmark, the combined effect was roughly a 2× latency improvement (one isolated physical core vs. two HT siblings sharing a core).
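To see which logical CPUs are hyperthread siblings on a node — and therefore which pairs full-pcpus-only must allocate together — the sysfs topology files can be read directly. A minimal sketch (output depends on the machine's kernel and core layout):

```shell
# Print each logical CPU with its hyperthread sibling set.
# CPUs that share a physical core report identical sibling lists.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "${c##*/}: $(cat "$c/topology/thread_siblings_list")"
done
```

For example, cpu2: 2,6 and cpu6: 2,6 would indicate that logical CPUs 2 and 6 are siblings on one physical core — the contention pair the pattern is designed to avoid.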

Implementation

Kubernetes path:

apiVersion: v1
kind: Pod
metadata:
  name: pgbouncer
spec:
  containers:
  - name: pgbouncer
    image: example/pgbouncer   # placeholder image
    resources:
      requests:
        cpu: "2"        # integer CPU count → eligible for exclusive cores
        memory: "128Mi"
      limits:           # limits == requests → Guaranteed QoS
        cpu: "2"
        memory: "128Mi"

Plus kubelet flags on the node:

--cpu-manager-policy=static
--cpu-manager-policy-options=full-pcpus-only=true
--reserved-cpus=0-1   # reserve for kubelet/system
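Once the pod is running, the pinning can be verified end to end. A hedged sketch, assuming the pod is named pgbouncer and the node uses cgroup v2 (requires a live cluster):

```shell
# 1. Confirm the pod was classified as Guaranteed QoS.
kubectl get pod pgbouncer -o jsonpath='{.status.qosClass}'   # expect: Guaranteed

# 2. Confirm the container sees only its exclusive CPUs (cgroup v2).
kubectl exec pgbouncer -- cat /sys/fs/cgroup/cpuset.cpus.effective

# 3. Cross-check the affinity mask of the container's main process.
kubectl exec pgbouncer -- grep Cpus_allowed_list /proc/1/status
```

With full-pcpus-only in effect, the CPU list in steps 2 and 3 should be a sibling-complete set of physical cores outside the reserved-cpus range.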

Non-Kubernetes path: taskset -c 4,5 pgbouncer ... or cgroup cpuset.cpus.
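On the non-Kubernetes path, a quick way to confirm that an affinity mask actually took effect is to ask taskset for the launched process's own affinity. A minimal sketch (pinned to CPU 0 rather than the 4,5 pair above, so it runs on any machine):

```shell
# Launch a process pinned to CPU 0, then print its effective affinity.
taskset -c 0 bash -c 'taskset -cp $$'
# → pid <pid>'s current affinity list: 0
```

The same check works for a cgroup cpuset: compare the printed list against the cpuset.cpus value written into the cgroup.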

Trade-offs

  • Reduces node packing density — exclusive CPUs mean fewer pods per node.
  • Requires capacity planning — system daemons, kubelet, container runtime need reserved CPUs.
  • Higher cloud cost — exclusively held cores cost more per request than shared capacity.
  • Breaks for bursty workloads — a pinned pod can't steal spare capacity from idle cores.
