CONCEPT Cited by 1 source
Hyperthread softirq contention
Definition
When two latency-sensitive processes are scheduled on
sibling hyperthreads of the same physical CPU core, the
kernel's softirq handlers — particularly NET_RX
(vector 3) and NET_TX (vector 2) — run with measurably
higher per-invocation latency than when the two processes
sit on different physical cores. This translates directly into
higher application-level p99 latency for network-heavy
workloads.
The Linux kernel's networking scaling documentation (Documentation/networking/scaling.rst) makes the recommendation explicit: "For interrupt handling, HT has shown no benefit in initial tests, so limit the number of queues to the number of CPU cores in the system."
Mechanism
- Hyperthreads on a single physical core share execution units, L1/L2 caches, branch predictors, and TLB.
- Softirq handlers are short, spiky, cache-sensitive — they touch the ring buffer, traverse socket bookkeeping, run protocol-stack code paths.
- When a user-space process on the sibling thread is actively running (especially if it also touches network state), softirq handlers contend for the shared microarchitectural resources.
- Net effect: the softirq handler takes longer to complete, packet delivery to user space is delayed, and application p99 latency grows.
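Which logical CPUs are siblings can be read from sysfs: each CPU's /sys/devices/system/cpu/cpuN/topology/thread_siblings_list names the logical CPUs sharing its physical core. A minimal sketch of grouping logical CPUs by physical core from those lists — the sysfs paths are real, but the sample topology below is made up for illustration:

```python
def parse_cpu_list(s):
    """Parse a sysfs CPU list like '0,2' or '0-1' into a sorted tuple."""
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return tuple(sorted(cpus))

def sibling_groups(siblings_by_cpu):
    """Group logical CPUs that share a physical core.

    siblings_by_cpu maps a logical CPU id to the contents of its
    topology/thread_siblings_list file.
    """
    return sorted({parse_cpu_list(v) for v in siblings_by_cpu.values()})

# Hypothetical 2-core machine with HT: CPUs 0/2 and 1/3 are sibling pairs.
topo = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
print(sibling_groups(topo))  # [(0, 2), (1, 3)]
```

Two processes pinned to CPUs from the same tuple land on sibling hyperthreads and hit exactly the contention described above.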
Evidence in the wild
Zalando's PgBouncer experiment:
| CPU placement | Observed latency |
|---|---|
| One PgBouncer on isolated physical core | Lowest |
| Two PgBouncers on sibling HTs of one physical core | ~2× higher than baseline |
| Two PgBouncers on two separate physical cores | Middle (with modest noise from other HT) |
Per-softirq latency measurement via
irq:softirq_entry / irq:softirq_exit tracepoints
confirmed higher 99th-percentile softirq latency in the
shared-physical-core case — the root cause behind the
application-level latency degradation.
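The tracepoint measurement pairs each irq:softirq_entry with the matching irq:softirq_exit on the same CPU and softirq vector, then examines the tail of the deltas. A simplified sketch of that pairing logic over pre-parsed events — the tuple layout and sample data are illustrative, not perf's actual output format:

```python
import math

def softirq_latencies(events):
    """Match entry/exit events per (cpu, vec); return latencies in microseconds.

    Each event is (timestamp_us, cpu, vec, kind) with kind "entry" or "exit".
    """
    open_entries = {}  # (cpu, vec) -> entry timestamp
    latencies = []
    for ts, cpu, vec, kind in sorted(events):
        key = (cpu, vec)
        if kind == "entry":
            open_entries[key] = ts
        elif key in open_entries:
            latencies.append(ts - open_entries.pop(key))
    return latencies

def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]

# Two NET_RX (vector 3) invocations on CPU 0: 50 us and 30 us long.
events = [(0, 0, 3, "entry"), (50, 0, 3, "exit"),
          (100, 0, 3, "entry"), (130, 0, 3, "exit")]
print(p99(softirq_latencies(events)))  # 50
```

Run across a few minutes of traffic, comparing p99 between the shared-core and separate-core placements is what isolates the softirq cost from application noise.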
Mitigations
- Pin latency-sensitive processes to one logical CPU per physical core, never to sibling hyperthreads — use taskset or a cgroup cpuset; on Kubernetes, the CPU Manager static policy handles this automatically.
- Disable hyperthreading on the host — the brute-force option, trading throughput for latency consistency.
- Align NIC queue count with physical-core count per the kernel doc recommendation; RSS/RPS should not create more queues than physical cores if interrupt latency matters.
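The first and third mitigations both reduce to the same computation: pick one logical CPU per physical core. A sketch of deriving the pin set (and, as a byproduct, the recommended queue count) from sibling groups like those sysfs reports — the function name and sample topology are hypothetical:

```python
def physical_core_cpus(sibling_groups):
    """Pick one logical CPU per physical core (the lowest-numbered sibling).

    The result can feed `taskset -c` or a cgroup cpuset, and its length is
    the queue count the kernel scaling doc recommends for the NIC.
    """
    return sorted(min(group) for group in sibling_groups)

# Hypothetical 4-core/8-thread host where CPU n and n+4 are siblings.
groups = [(0, 4), (1, 5), (2, 6), (3, 7)]
cpus = physical_core_cpus(groups)
print(",".join(map(str, cpus)))  # 0,1,2,3 — e.g. taskset -c 0,1,2,3 pgbouncer ...
print(len(cpus))                 # 4 — NIC queue count per the kernel doc
```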
Why it matters
For database connection poolers, VoIP / media gateways, high-frequency-trading gateways, and low-latency service meshes, the 2× p99 bump from landing on the wrong hyperthread can blow an SLO. The effect is invisible in average metrics — average throughput looks fine; only the tail of the latency distribution reveals the problem.
Seen in
- sources/2020-06-23-zalando-pgbouncer-on-kubernetes-minimal-latency
— first-person reproduction with
perf record -e irq:softirq_entry,irq:softirq_exit and Brendan Gregg's latency extraction script. The article contains both the application-level (pgbench) and kernel-level (perf) evidence.
Related
- concepts/cpu-manager-static-policy — the Kubernetes-level mitigation.
- concepts/so-reuseport-pgbouncer-scaling — the mechanism by which two PgBouncer processes end up on the same host, making the HT placement question relevant.