
Hyperthread softirq contention

Definition

When two latency-sensitive processes are scheduled on sibling hyperthreads of the same physical CPU core, the kernel's softirq handlers — particularly NET_RX (vector 3) and NET_TX (vector 2) — run with measurably higher per-invocation latency than when the two processes sit on different physical cores. This translates directly into higher application-level p99 latency for network-heavy workloads.
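The kernel exposes per-CPU softirq counts in /proc/softirqs, which is a quick way to see which logical CPUs are absorbing NET_RX/NET_TX work. A minimal parsing sketch, using an illustrative sample snippet in place of the real file (on a host you would read `open("/proc/softirqs")` instead):

```python
# Parse /proc/softirqs-style output into per-CPU counts per softirq vector.
# SAMPLE is an illustrative two-CPU snippet, not real measurements.
SAMPLE = """\
                    CPU0       CPU1
          HI:          3          1
       TIMER:     331485     326133
      NET_TX:         17         28
      NET_RX:    1446798      12037
"""

def parse_softirqs(text):
    lines = text.strip().splitlines()
    cpus = lines[0].split()                      # ['CPU0', 'CPU1']
    counts = {}
    for line in lines[1:]:
        name, *vals = line.replace(":", "").split()
        counts[name] = dict(zip(cpus, map(int, vals)))
    return counts

print(parse_softirqs(SAMPLE)["NET_RX"])   # {'CPU0': 1446798, 'CPU1': 12037}
```

A heavily skewed NET_RX column like CPU0's above usually means that CPU is handling the NIC's receive queue, which is exactly the CPU whose hyperthread sibling you do not want a latency-sensitive process sharing.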

The Linux kernel scaling docs make the recommendation explicit: "For interrupt handling, HT has shown no benefit in initial tests, so limit the number of queues to the number of CPU cores in the system."

Mechanism

  • Hyperthreads on a single physical core share execution units, L1/L2 caches, branch predictors, and TLB.
  • Softirq handlers are short, spiky, cache-sensitive — they touch the ring buffer, traverse socket bookkeeping, run protocol-stack code paths.
  • When a user-space process on the sibling thread is actively running (especially if it also touches network state), softirq handlers contend for the shared microarchitectural resources.
  • Net effect: the softirq handler takes longer to complete, packet delivery to user space is delayed, and application p99 latency grows.
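Reasoning about this contention starts with knowing which logical CPUs are siblings. Linux publishes that in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, in either "0,2" or "0-1" form. A sketch of parsing those strings and grouping CPUs into physical cores, with a hypothetical `sysfs` dict standing in for the real sysfs reads:

```python
def parse_cpu_list(s):
    """Parse a '0,32' or '0-1' style CPU list string into a sorted tuple."""
    cpus = set()
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))

# Illustrative 2-core/4-thread topology: CPU n and n+2 are siblings.
# On a real host, read each cpu's topology/thread_siblings_list file instead.
sysfs = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
physical_cores = {parse_cpu_list(v) for v in sysfs.values()}
print(sorted(physical_cores))   # [(0, 2), (1, 3)]
```

Two processes pinned to CPUs drawn from the same tuple share all of the microarchitectural resources listed above; two processes drawn from different tuples do not.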

Evidence in the wild

Zalando's PgBouncer experiment:

  • One PgBouncer alone on an isolated physical core — lowest latency (baseline)
  • Two PgBouncers on sibling hyperthreads of one physical core — roughly 2× the baseline
  • Two PgBouncers on two separate physical cores — in between, with modest noise from whatever else ran on each core's sibling hyperthread

Per-softirq latency measurement via irq:softirq_entry / irq:softirq_exit tracepoints confirmed higher 99th-percentile softirq latency in the shared-physical-core case — the root cause behind the application-level latency degradation.
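Once the paired entry/exit timestamps are captured from those tracepoints, the post-processing is a straightforward tail-percentile computation. A sketch with synthetic microsecond timestamps (the real data would come from a tracer such as bpftrace or perf):

```python
import math

def p99(samples):
    """Nearest-rank 99th percentile: the ceil(0.99 * n)-th smallest value."""
    s = sorted(samples)
    return s[math.ceil(0.99 * len(s)) - 1]

# Synthetic data: 100 softirq invocations, mostly ~2 µs, with two slow
# 40 µs outliers of the kind sibling-hyperthread contention produces.
entries = [10 * i for i in range(100)]           # softirq_entry timestamps
exits = [t + 2 for t in entries]                 # softirq_exit timestamps
exits[-2:] = [entries[-2] + 40, entries[-1] + 40]

latencies = [x - e for e, x in zip(entries, exits)]
print(p99(latencies))   # 40
```

The average of these latencies barely moves (about 2.8 µs here), which is why the problem only shows up in the tail, as the section below notes.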

Mitigations

  • Pin latency-sensitive processes so each one owns a full physical core rather than an arbitrary hyperthread — use taskset or a cgroup cpuset; on Kubernetes, the static CPU Manager policy with the full-pcpus-only option enforces whole-core allocation.
  • Disable hyperthreading on the host — the brute-force option, trading throughput for latency consistency.
  • Align NIC queue count with physical-core count per the kernel doc recommendation; RSS hardware queues (and RPS CPU masks) should target physical cores rather than all logical CPUs if interrupt latency matters.
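The first mitigation can be approximated by keeping one logical CPU per physical core in the affinity mask, so no two pinned processes land on sibling hyperthreads (leaving each core's second thread idle, e.g. via isolcpus or an exclusive cpuset, completes the picture). A sketch, where the sibling tuples are assumed inputs like those read from sysfs:

```python
import os

def one_thread_per_core(sibling_lists):
    """Pick the lowest-numbered CPU from each sibling set, so the affinity
    mask never includes two hyperthreads of the same physical core."""
    return {min(siblings) for siblings in sibling_lists}

# Illustrative topology: two physical cores with siblings (0,2) and (1,3).
cpus = one_thread_per_core([(0, 2), (1, 3)])
print(sorted(cpus))   # [0, 1]

# On Linux, apply to the current process (equivalent to `taskset -cp 0,1 $$`):
# os.sched_setaffinity(0, cpus)
```

The `sched_setaffinity` call is left commented out because it requires a Linux host whose CPU numbering matches the mask; the mask computation is the portable part.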

Why it matters

For database connection poolers, VoIP / media gateways, high-frequency-trading gateways, and low-latency service meshes, the 2× p99 bump from landing on the wrong hyperthread can blow an SLO. The effect is invisible in average metrics — average throughput looks fine; only the tail of the latency distribution reveals the problem.
