Netflix global-lock-bench¶
global-lock-bench (github.com/Netflix/global-lock-bench) is an open-source microbenchmark Netflix built while diagnosing the Mount Mayhem incident. It spins up a configurable number of threads that contend on a single global lock, and reports latency vs thread count.
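The measurement described above (N threads, one global lock, latency reported per thread count) can be sketched roughly as follows. This is an illustrative Python sketch, not the code in the Netflix repository: the names `run_contended` and `sweep`, the iteration count, and the thread counts are all assumptions made for the example. CPython's global interpreter lock also serialises the threads, so the sketch shows the shape of the measurement, not the cache-line behaviour a native benchmark would expose.

```python
import threading
import time


def run_contended(num_threads: int, acquisitions: int = 2000) -> float:
    """Mean seconds per acquire/release, as seen by each contending thread.

    Hypothetical helper for illustration; not part of global-lock-bench.
    """
    lock = threading.Lock()                     # the single global lock
    barrier = threading.Barrier(num_threads)    # start all threads together
    counter = [0]                               # shared state guarded by the lock
    per_thread = [0.0] * num_threads            # one result slot per thread

    def worker(idx: int) -> None:
        barrier.wait()
        t0 = time.perf_counter()
        for _ in range(acquisitions):
            with lock:
                counter[0] += 1                 # deliberately tiny critical section
        per_thread[idx] = (time.perf_counter() - t0) / acquisitions

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Sanity check: every increment happened under the lock.
    assert counter[0] == num_threads * acquisitions
    return sum(per_thread) / num_threads


def sweep(thread_counts=(1, 2, 4, 8), acquisitions: int = 2000) -> dict:
    """Latency versus thread count: the curve the benchmark reports."""
    return {n: run_contended(n, acquisitions) for n in thread_counts}


if __name__ == "__main__":
    for n, latency in sweep().items():
        print(f"{n:>3} threads: {latency * 1e9:10.1f} ns per acquisition")
```

The interesting output is the shape of the curve as the thread count grows, not any single number.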
Purpose¶
Isolate the hardware-level signal from the application-level noise. When a workload is slow because it contends pathologically on a kernel lock, benchmarking the workload itself mixes together three things:
- Kernel-level code path specifics (VFS, namespaces, page fault handlers, etc.).
- Userspace orchestration (containerd, kubelet, systemd — each with its own tall stack).
- The actual lock-contention physics at the CPU / cache / interconnect layer.
global-lock-bench strips the first two away and leaves just the third. It runs on any Linux host with user-space thread primitives.
Internal identifier: pause_bench¶
The post calls it "a small microbenchmark (pause_bench)", a name taken from the pause instruction the CPU executes while spinning on a contended atomic. The published repo name is global-lock-bench.
Seen in¶
Netflix Mount Mayhem — hardware-platform decision input (2026-02-28)¶
Netflix's use:
- On r5.metal (2-socket 5th-gen Intel, multi-NUMA) — confirmed that eliminating NUMA (pinning to one socket) "significantly drops latency at high thread counts."
- On m7i.metal-24xl (1-socket 7th-gen Intel) — disabling hyperthreading "further improves scaling" by 20–30%.
- On m7a.24xlarge (1-socket 7th-gen AMD, chiplet) — "performance scales the best, demonstrating that a distributed cache architecture handles cache-line contention in this case of global locks more gracefully."
These three datapoints fed directly into the workload-to-architecture routing mitigation — and into Netflix's internal argument for which instance families to buy for container-heavy workloads going forward.
Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.
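The first two experiments above (confining the run to one socket, and taking hyperthreads out of the picture) can be approximated from user space on Linux. The sketch below is one way to do it, assuming the standard sysfs topology files are present; it is not taken from the Netflix post or repository, and the helper names `parse_cpulist`, `node_cpus`, `one_thread_per_core`, and `pin_to_node` are hypothetical. Restricting a run to one sibling per core is only a stand-in for disabling hyperthreading in firmware.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a sysfs cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int = 0) -> set:
    """CPUs that belong to one NUMA node, read from sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def _read_siblings(cpu: int) -> str:
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return f.read()


def one_thread_per_core(cpus, read_siblings=_read_siblings) -> set:
    """Keep only the lowest-numbered hardware thread of each physical core."""
    return {min(parse_cpulist(read_siblings(cpu))) for cpu in cpus}


def pin_to_node(node: int = 0, skip_siblings: bool = False) -> set:
    """Restrict the caller (and threads it starts later) to one NUMA node."""
    cpus = node_cpus(node)
    if skip_siblings:
        cpus = one_thread_per_core(cpus)
    os.sched_setaffinity(0, cpus)       # 0 means the calling thread or process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now restricted to CPUs:", sorted(pin_to_node(0)))
```

CPU affinity alone does not change where memory is allocated; `numactl` with `--cpunodebind` and `--membind` covers both when that matters.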
Why it generalises¶
Any Linux workload that ever touches a global-ish kernel lock (mount lock, rtnl_lock, dcache lock, mmap_sem / mmap_lock, etc.) can benefit from running global-lock-bench on the candidate hardware before committing. The benchmark doesn't need to model your workload — if the lock-contention physics are broken on the hardware, they're broken for your workload too.
Canonical wiki instance of patterns/contended-lock-microbenchmark.
Related¶
- patterns/contended-lock-microbenchmark — the generalised pattern
- systems/intel-tma — used alongside to attribute the stall to false sharing
- systems/netflix-titus — the platform whose production workload motivated the benchmark
- concepts/false-sharing · concepts/cpu-mesh-interconnect · concepts/cpu-chiplet-architecture · concepts/hyperthreading-contention · concepts/numa-awareness — the five microarchitectural axes the benchmark explores