Netflix global-lock-bench

global-lock-bench (github.com/Netflix/global-lock-bench) is an open-source microbenchmark that Netflix built while diagnosing the Mount Mayhem incident. It spins up a configurable number of threads that all contend on a single global lock, and reports lock latency as a function of thread count.
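The idea can be sketched in a few lines. What follows is a minimal illustration, not the code in the Netflix repository: the function names `run_contention` and `sweep`, the iteration count, and the output format are all assumptions made for this example. It is written in Python for readability, so the interpreter's GIL dominates the timings and the numbers say little about cache-line behaviour; a real measurement needs a compiled benchmark spinning on an atomic.

```python
import threading
import time
from statistics import mean


def run_contention(num_threads, iters_per_thread=2000):
    """Return mean seconds per acquire+release with num_threads contending.

    Hypothetical helper for illustration; not the real benchmark's API.
    """
    lock = threading.Lock()                  # the single global lock
    start_gate = threading.Barrier(num_threads)
    per_thread = [0.0] * num_threads
    counter = [0]

    def worker(idx):
        start_gate.wait()                    # release all threads together
        t0 = time.perf_counter()
        for _ in range(iters_per_thread):
            with lock:
                counter[0] += 1              # deliberately tiny critical section
        per_thread[idx] = (time.perf_counter() - t0) / iters_per_thread

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Sanity check: the lock really serialised every increment.
    assert counter[0] == num_threads * iters_per_thread
    return mean(per_thread)


def sweep(thread_counts=(1, 2, 4, 8)):
    """Map each thread count to its mean per-acquire latency in seconds."""
    return {n: run_contention(n) for n in thread_counts}


if __name__ == "__main__":
    for n, latency in sweep().items():
        print(f"{n:>3} threads: {latency * 1e9:10.0f} ns per acquire")
```

The barrier matters: without it, early threads finish part of their work before later ones start, and the measured contention is lower than the configured thread count suggests.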

Purpose

Isolate the hardware-level signal from the application-level noise. When a workload is slow because it contends on a kernel lock in a pathological way, using the workload itself as the benchmark mixes together three things:

  • Kernel-level code path specifics (VFS, namespaces, page fault handlers, etc.).
  • Userspace orchestration (containerd, kubelet, systemd — each with its own tall stack).
  • The actual lock-contention physics at the CPU / cache / interconnect layer.

global-lock-bench strips the first two away and leaves just the third. It runs on any Linux host, using only user-space thread primitives.

Internal identifier: pause_bench

The post calls it "a small microbenchmark (pause_bench)", a name taken from the pause instruction that the CPU executes while spinning on a contended atomic. The published repository is named global-lock-bench.

Seen in

Netflix Mount Mayhem — hardware-platform decision input (2026-02-28)

Netflix's use:

  • On r5.metal (2-socket, 5th-generation Intel instance, multi-NUMA): confirmed that eliminating cross-socket NUMA effects by pinning to one socket "significantly drops latency at high thread counts."
  • On m7i.metal-24xl (1-socket, 7th-generation Intel instance): disabling hyperthreading "further improves scaling" by 20–30%.
  • On m7a.24xlarge (1-socket, 7th-generation AMD instance, chiplet design): "performance scales the best, demonstrating that a distributed cache architecture handles cache-line contention in this case of global locks more gracefully."
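Two of the experiments above, pinning to one socket and avoiding hyperthread siblings, can be approximated from user space with CPU affinity. The sketch below is an assumption about how one might set that up, not a description of what Netflix ran: the helper names are hypothetical, it treats NUMA node 0 as "one socket" (which is not true on every machine), and restricting affinity to one hardware thread per core only approximates disabling hyperthreading in firmware. The sysfs paths and `os.sched_setaffinity` are standard Linux interfaces.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_numa_node(node=0):
    """CPUs that belong to one NUMA node, as reported by sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def read_siblings(cpu):
    """Hardware threads sharing a physical core with `cpu` (including itself)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpulist(f.read())


def one_cpu_per_core(cpus, siblings_of=read_siblings):
    """Keep the lowest-numbered allowed hardware thread of each core."""
    return {min(siblings_of(cpu) & cpus) for cpu in cpus}


def pin_to(cpus):
    """Restrict the calling process (pid 0 means self) to `cpus`. Linux only."""
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    node0 = cpus_of_numa_node(0)
    pin_to(one_cpu_per_core(node0))
    print("pinned to", sorted(os.sched_getaffinity(0)))
```

Threads started after `pin_to` inherit the affinity mask, so the pinning has to happen before the benchmark spawns its workers.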

These three datapoints fed directly into the workload-to-architecture routing mitigation — and into Netflix's internal argument for which instance families to buy for container-heavy workloads going forward.

Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.

Why it generalises

Any Linux workload that ever touches a global-ish kernel lock (mount lock, rtnl_lock, dcache lock, mmap_sem / mmap_lock, etc.) can benefit from running global-lock-bench on the candidate hardware before committing to it. The benchmark doesn't need to model your workload: if the lock-contention physics are broken on the hardware, they're broken for your workload too.
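When comparing candidate machines, absolute latencies are harder to read than how quickly latency degrades as threads are added. One way to normalise the results, offered here as an assumed convention rather than anything the benchmark itself reports, is to divide each latency by the latency at the lowest thread count. The function name `slowdown` and the input shape (a mapping from thread count to mean latency) are hypothetical.

```python
def slowdown(latencies):
    """Normalise a {thread_count: latency} mapping to its lowest thread count.

    A value of 4.0 at 8 threads means each acquire took four times as long
    as it did at the baseline thread count on the same machine.
    """
    if not latencies:
        raise ValueError("need at least one measurement")
    base = latencies[min(latencies)]
    return {n: lat / base for n, lat in sorted(latencies.items())}
```

Feeding it one latency mapping per candidate instance gives one curve per machine on a common scale; the machine whose curve stays flattest is the one that handles the contention most gracefully.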

Canonical wiki instance of patterns/contended-lock-microbenchmark.
