
Netflix global-lock-bench¶

global-lock-bench (github.com/Netflix/global-lock-bench) is an open-source microbenchmark Netflix built while diagnosing the Mount Mayhem incident. It spins up a configurable number of threads that contend on a single global lock, and reports latency vs thread count.
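The measurement loop described above can be sketched in a few lines. This is a minimal illustration only, not the code in the repository: the function names (`run_contention`, `sweep`), the acquisition count, and the choice of `threading.Lock` are all assumptions made for the example. Because CPython's interpreter lock serialises bytecode execution, the absolute numbers reflect interpreter overhead rather than cache-line physics; only the shape of the experiment (N threads, one lock, latency per thread count) carries over.

```python
import statistics
import threading
import time


def run_contention(num_threads, acquisitions_per_thread=2000):
    """Return the mean wait (microseconds) to acquire one lock shared by all threads."""
    global_lock = threading.Lock()                  # the single contended lock
    start_barrier = threading.Barrier(num_threads)  # release all workers together
    per_thread_means = [0.0] * num_threads

    def worker(index):
        start_barrier.wait()
        total_ns = 0
        for _ in range(acquisitions_per_thread):
            t0 = time.perf_counter_ns()
            with global_lock:
                # Time from requesting the lock to holding it.
                total_ns += time.perf_counter_ns() - t0
        per_thread_means[index] = total_ns / acquisitions_per_thread / 1000.0

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return statistics.mean(per_thread_means)


def sweep(thread_counts):
    """Map each thread count to its mean acquire latency in microseconds."""
    return {n: run_contention(n) for n in thread_counts}


if __name__ == "__main__":
    for n, latency_us in sweep([1, 2, 4, 8]).items():
        print(f"{n:>3} threads: {latency_us:8.2f} us per acquire")
```

A native implementation would replace the lock with a spin on an atomic and would typically report tail percentiles as well as the mean; the sketch keeps only the mean to stay short.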

Purpose¶

Isolate the hardware-level signal from the application-level noise. When a workload is slow because it contends pathologically on a kernel lock, using the workload itself as the benchmark mixes together three things:

Kernel-level code path specifics (VFS, namespaces, page fault handlers, etc.).
Userspace orchestration (containerd, kubelet, systemd — each with its own tall stack).
The actual lock-contention physics at the CPU / cache / interconnect layer.

global-lock-bench strips the first two away and leaves just the third. It runs on any Linux host that provides user-space thread primitives.

Internal identifier: `pause_bench`¶

The post calls it "a small microbenchmark (pause_bench)", a name taken from the pause instruction the CPU executes while spinning on a contested atomic. The published repository is named global-lock-bench.

Seen in¶

Netflix Mount Mayhem — hardware-platform decision input (2026-02-28)¶

Netflix's use:

On r5.metal (2-socket 5th-gen Intel, multi-NUMA) — confirmed that eliminating NUMA (pinning to one socket) "significantly drops latency at high thread counts."
On m7i.metal-24xl (1-socket 7th-gen Intel) — disabling hyperthreading "further improves scaling" by 20–30%.
On m7a.24xlarge (1-socket 7th-gen AMD, chiplet) — "performance scales the best, demonstrating that a distributed cache architecture handles cache-line contention in this case of global locks more gracefully."
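The first two experiments above amount to restricting which CPUs the benchmark threads may run on. A rough way to approximate them from user space on Linux is sketched below. The helper names are hypothetical, and the sketch assumes the standard sysfs topology files are present on the host. Note that skipping sibling hardware threads is not the same as disabling hyperthreading in firmware or via the kernel; it only keeps the benchmark off the siblings.

```python
import os


def parse_cpulist(text):
    """Parse the kernel's cpulist format, e.g. "0-3,8,10-11", into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpulist(path):
    """Read a sysfs file that holds a cpulist and return it as a set of ints."""
    with open(path) as handle:
        return parse_cpulist(handle.read())


def cpus_on_node(node=0):
    """CPUs that belong to one NUMA node (assumes a NUMA-enabled kernel)."""
    return read_cpulist(f"/sys/devices/system/node/node{node}/cpulist")


def one_thread_per_core(cpus):
    """Keep only the lowest-numbered hardware thread of each physical core."""
    keep = set()
    for cpu in cpus:
        siblings = read_cpulist(
            f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
        keep.add(min(siblings))
    return keep & set(cpus)


def pin_current_process(cpus):
    """Restrict the caller to the given CPUs; threads started later inherit the mask."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

Calling `pin_current_process(cpus_on_node(0))` before starting the worker threads approximates the single-socket run, and wrapping the set in `one_thread_per_core(...)` approximates the run without hyperthreading.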

These three datapoints fed directly into the workload-to-architecture routing mitigation — and into Netflix's internal argument for which instance families to buy for container-heavy workloads going forward.

Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.

Why it generalises¶

Any Linux workload that ever touches a global-ish kernel lock (mount lock, rtnl_lock, dcache lock, mmap_sem / mmap_lock, etc.) can benefit from running global-lock-bench on the candidate hardware before committing. The benchmark doesn't need to model your workload — if the lock-contention physics are broken on the hardware, they're broken for your workload too.

Canonical wiki instance of patterns/contended-lock-microbenchmark.

patterns/contended-lock-microbenchmark — the generalised pattern
systems/intel-tma — used alongside to attribute the stall to false sharing
systems/netflix-titus — the platform whose production workload motivated the benchmark
concepts/false-sharing · concepts/cpu-mesh-interconnect · concepts/cpu-chiplet-architecture · concepts/hyperthreading-contention · concepts/numa-awareness — the five microarchitectural axes the benchmark explores