
PATTERN Cited by 1 source

Shared lock for read-only metadata

Intent

When a critical section in a hot path only reads metadata under a mutex, use a shared (read) lock — allowing multiple readers concurrently — rather than an exclusive (write) lock that serialises every read.

Sounds obvious. The pattern exists because it is routinely violated in real production code: a critical section originally written when the access pattern was mixed read-write evolves to be read-only in practice, but the lock kind is never re-evaluated. At low concurrency the bug is invisible; at high concurrency it becomes the dominant bottleneck.

Canonical instance: ClickHouse MergeTreeData parts mutex

Cloudflare's Ready-Analytics hit a query-planner slowdown after extending its partitioning key. Investigation showed "more than half of leaf query duration is spent waiting for a mutex that protects the table's list of active parts." (Source: sources/2026-05-14-cloudflare-clickhouse-query-plan-contention)

The mutex was used as an exclusive lock by every query planner. Cloudflare's diagnosis:

"The query planner doesn't modify the parts list; it just reads it. It had no business using an exclusive lock."

The fix:

"We modified the code to acquire a shared lock (std::shared_lock) instead. This allowed all query planners to enter the critical section concurrently."

Result: "A massive, immediate drop in query duration. The lock contention vanished."

Optimization 1 of the three-patch fix stack; merged upstream as part of ClickHouse PR #85535 in ClickHouse 25.11.
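A minimal C++17 sketch of the fix's shape follows. It is not ClickHouse code: ActivePartsRegistry, countPartsWithPrefix, and addPart are hypothetical names. The read path stands in for a query planner and takes std::shared_lock on a std::shared_mutex, so any number of readers can hold it at once. The write path stands in for inserts and merges and takes std::unique_lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table's list of active parts.
// Illustrative only; this is NOT ClickHouse's MergeTreeData.
class ActivePartsRegistry {
public:
    // Read path (e.g. a query planner): only inspects the list, so a
    // shared lock suffices and many readers may hold it concurrently.
    std::size_t countPartsWithPrefix(const std::string& prefix) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        std::size_t n = 0;
        for (const auto& name : parts_) {
            if (name.compare(0, prefix.size(), prefix) == 0) ++n;
        }
        return n;
    }

    // Write path (e.g. insert or merge): mutates the list, so it takes
    // the exclusive lock, which waits until no reader holds the mutex.
    void addPart(std::string name) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        parts_.push_back(std::move(name));
    }

private:
    mutable std::shared_mutex mutex_;  // mutable: locked from const readers
    std::vector<std::string> parts_;
};
```

The mutex is declared mutable so const read methods can lock it. The before-state this pattern targets would have taken std::unique_lock (or std::lock_guard on a plain std::mutex) in the read method as well.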

Why it gets missed

Three structural reasons this pattern is widely violated:

  1. Lock kind is set at code-write time; access pattern is set at use time. The original MergeTreeData mutex was probably written when the access pattern was mixed (planner reads plus insert / merge writers all using one mutex), and a single exclusive lock was simpler than auditing each path to separate readers from writers. As the codebase evolved and the planner became read-only, no one revisited the lock kind.
  2. CPU profiles don't show the contention. The standard diagnostic for "this code is slow" is a CPU flame graph, which shows where active threads are spending CPU. The threads waiting on the mutex aren't running, so they don't appear. See concepts/cpu-vs-real-flame-graph — the diagnostic flip is what made Cloudflare's lock-contention bottleneck visible.
  3. Per-query metrics look fine. Throughput, rows scanned, bytes read — all unchanged. The query is doing the same work; it's just waiting longer to start. Dashboards built around per-query resource counters miss the contention entirely.
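Because blocked threads consume no CPU, one way to surface this kind of contention is to measure lock acquisition on the wall clock. The sketch below assumes a hypothetical wrapper, WaitTimedSharedMutex, that forwards to std::shared_mutex and accumulates how long callers waited to acquire it. It provides the member functions std::unique_lock and std::shared_lock call, so it can be dropped into either. In practice an off-CPU or wall-clock profiler is the more likely tool; this only illustrates what such a profiler measures.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical instrumentation wrapper: records how long callers spend
// waiting to acquire the lock. A CPU flame graph misses this time because
// waiting threads are not running; a wall-clock counter captures it.
class WaitTimedSharedMutex {
public:
    void lock() { timed([this] { m_.lock(); }); }
    void unlock() { m_.unlock(); }
    void lock_shared() { timed([this] { m_.lock_shared(); }); }
    void unlock_shared() { m_.unlock_shared(); }

    std::int64_t totalWaitNanos() const { return waitNanos_.load(std::memory_order_relaxed); }
    std::uint64_t acquisitions() const { return acquisitions_.load(std::memory_order_relaxed); }

private:
    // Times one acquisition (exclusive or shared) and adds it to the totals.
    template <class Acquire>
    void timed(Acquire acquire) {
        const auto start = std::chrono::steady_clock::now();
        acquire();
        const auto waited = std::chrono::steady_clock::now() - start;
        waitNanos_.fetch_add(
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count(),
            std::memory_order_relaxed);
        acquisitions_.fetch_add(1, std::memory_order_relaxed);
    }

    std::shared_mutex m_;
    std::atomic<std::int64_t> waitNanos_{0};
    std::atomic<std::uint64_t> acquisitions_{0};
};
```

Dividing totalWaitNanos() by acquisitions() gives a mean wait per acquisition, a number that rises under contention even when per-query CPU and I/O counters stay flat.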

When the pattern applies

The fix is correct iff the critical section is genuinely read-only with respect to the protected state. Verifying this requires:

  • Auditing every code path that takes the lock to confirm none mutate the protected data structure.
  • Checking that the protected data structure's destructors, iterators, and any lazy-initialisation paths are safe under concurrent reads.
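  • Ensuring writer paths (the small minority that do mutate) acquire an exclusive lock. Readers and writers exclude each other; only readers run in parallel. In C++, std::shared_mutex cannot upgrade a held shared lock in place, so a writer must take the exclusive lock directly rather than promote a shared one.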
  • Confirming the language / runtime's shared-lock implementation has acceptable read-side overhead vs. the contention it relieves. (In C++ std::shared_mutex is somewhat heavier than std::mutex per acquisition; on uncontended workloads the cost is real but small. On contended workloads the parallelism benefit dominates.)
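The writer side needs care in C++. std::shared_mutex has no shared-to-exclusive upgrade, and calling lock() from a thread that already holds the mutex in shared mode is undefined behaviour (in practice usually a deadlock). The sketch below uses a hypothetical ReadMostlyIndex to show the usual workaround for a read-mostly lookup-or-insert: try the shared path first, release it, then take the exclusive lock and re-check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly index. The common case (key present) runs
// under a shared lock; only the rare insert takes the exclusive lock.
class ReadMostlyIndex {
public:
    // Returns the value for key, inserting fallback if the key is absent.
    int getOrInsert(const std::string& key, int fallback) {
        {
            std::shared_lock<std::shared_mutex> read(mutex_);
            auto it = index_.find(key);
            if (it != index_.end()) return it->second;  // read-only fast path
        }  // shared lock released here; no upgrade is attempted

        std::unique_lock<std::shared_mutex> write(mutex_);
        // Re-check under the exclusive lock: another writer may have inserted
        // the key while no lock was held. try_emplace leaves an existing
        // entry untouched and returns it.
        return index_.try_emplace(key, fallback).first->second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> read(mutex_);
        return index_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> index_;
};
```

try_emplace doubles as the re-check: if another thread inserted the key in the gap between the two locks, the existing value is kept and returned instead of being overwritten.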

Substrate variants

| Substrate | Read lock | Write lock |
| --- | --- | --- |
| C++17 | std::shared_lock<std::shared_mutex> | std::unique_lock<std::shared_mutex> |
| Rust | RwLock::read() | RwLock::write() |
| Java | ReentrantReadWriteLock.readLock() | ReentrantReadWriteLock.writeLock() |
| Python | No stdlib RW lock; libraries (e.g., readerwriterlock) | Same library's write lock |
| Go | sync.RWMutex.RLock() | sync.RWMutex.Lock() |
| InnoDB (row level) | SELECT ... FOR SHARE | SELECT ... FOR UPDATE, or implicit on UPDATE/DELETE |
| Postgres (advisory) | pg_advisory_lock_shared | pg_advisory_lock |

The pattern transfers cleanly across substrates because the underlying primitive, the readers-writer lock, is widely available; Python's standard library is the notable gap in the table above.

Trade-offs

  • Read-side cost: shared-lock acquisitions are slightly heavier than plain mutex acquisitions on most platforms. At low contention, replacing a std::mutex with a std::shared_mutex can be a marginal regression. The switch pays off only when the contention reduction outweighs the per-acquisition overhead, which is typically the case in a high-contention, read-heavy critical section; measure before and after.
  • Writer starvation risk: with continuous read traffic, writers can be starved indefinitely on naive RW lock implementations. Policies differ by implementation: the C++ standard leaves std::shared_mutex's scheduling unspecified, so it depends on the standard library in use; Java's ReentrantReadWriteLock has an optional fairness flag that grants the lock in approximately arrival order; Go's sync.RWMutex blocks new readers once a writer is waiting. If writers are occasionally important, check the policy your implementation actually provides and configure fairness explicitly where it is offered.
  • Hidden mutating callees: a critical section that appears read-only may invoke functions that mutate shared state under the hood (lazy caching, statistics updates, reference-counted resource tracking). Mutation inside a shared-lock critical section is a correctness bug, not just a performance one. Audit carefully.
  • Often paired with patterns/deferred-copy-cached-collection: switching to a shared lock removes the serialising bottleneck, but each reader's copy of the collection can still dominate. The deferred-copy pattern eliminates the copy. Cloudflare's Optimizations 1 + 2 ship together precisely because they compose.
  • Sometimes paired with patterns/binary-search-on-sorted-partition-prefix — once concurrent reads are unblocked and copies eliminated, residual per-read scan cost can be reduced further by exploiting structural ordering of the data. Cloudflare's Optimization 3.
  • concepts/shared-lock-vs-exclusive-lock — the pedagogical primitive. The wiki's concept page frames shared vs. exclusive locks at the InnoDB row-locking altitude; this pattern transfers the same framing to in-memory metadata locks in a database server's runtime.
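The hidden-mutating-callee hazard and one way out can be sketched with a hypothetical PartsMetadata class. A tempting version of totalRows() would compute the sum lazily and store it in a mutable cache field on first use; under a shared lock, two readers could then write that field at the same time, which is a data race. The version below keeps the read path free of unsynchronised writes: the derived total is maintained by the writer under the exclusive lock, and the one read-side side effect, a statistics counter, is a std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical metadata holder; names are illustrative only.
class PartsMetadata {
public:
    explicit PartsMetadata(std::vector<std::uint64_t> rowsPerPart)
        : rowsPerPart_(std::move(rowsPerPart)) {
        for (auto rows : rowsPerPart_) totalRows_ += rows;
    }

    // Reader: under the shared lock it only reads plain members. The lookup
    // counter is atomic, so concurrent readers incrementing it do not race.
    std::uint64_t totalRows() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        lookups_.fetch_add(1, std::memory_order_relaxed);
        return totalRows_;
    }

    // Writer: derived state (the cached total) is updated here, under the
    // exclusive lock, rather than lazily on the read path.
    void addPart(std::uint64_t rows) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        rowsPerPart_.push_back(rows);
        totalRows_ += rows;
    }

    std::uint64_t lookups() const { return lookups_.load(std::memory_order_relaxed); }

private:
    mutable std::shared_mutex mutex_;
    mutable std::atomic<std::uint64_t> lookups_{0};
    std::vector<std::uint64_t> rowsPerPart_;
    std::uint64_t totalRows_ = 0;
};
```

The design choice is to move every mutation onto the exclusive path; where a read-side side effect is unavoidable, it needs its own synchronisation (an atomic, as here, or a separate lock).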

Adjacent in the literature

  • Read-Copy-Update (RCU): a Linux kernel primitive that goes further than RW locks on the read side. Readers take no lock and pay near-zero cost, and they can run concurrently with a writer, which publishes a new version and defers reclaiming the old one until pre-existing readers finish. RCU is what you reach for when even shared-lock acquisition cost is unacceptable.
  • Deferred-reclamation schemes in concurrent data structures, such as epoch-based reclamation (Crossbeam in Rust) and hazard pointers (folly::Hazptr in C++), belong to the same family.
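For a feel of the direction RCU takes, here is a userspace copy-on-write snapshot in C++17. It approximates the idea and is not RCU: PartsSnapshotHolder is a hypothetical name, and readers still take a brief mutex to copy a std::shared_ptr. After that they read an immutable version with no lock held, while a writer copies, modifies, and publishes a new version. The old version is freed when its last reader drops its shared_ptr, which stands in for RCU's deferred reclamation.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical copy-on-write holder (an approximation of RCU, not RCU).
class PartsSnapshotHolder {
public:
    using Parts = std::vector<std::string>;

    // Readers: copy the current pointer, then read the immutable version
    // without holding any lock.
    std::shared_ptr<const Parts> snapshot() const {
        std::lock_guard<std::mutex> guard(publishMutex_);  // held only for the pointer copy
        return current_;
    }

    // Writers: copy the current version, modify the copy, publish it.
    void addPart(const std::string& name) {
        std::lock_guard<std::mutex> writers(writerMutex_);   // serialise writers
        auto next = std::make_shared<Parts>(*snapshot());    // copy current version
        next->push_back(name);
        std::lock_guard<std::mutex> guard(publishMutex_);
        current_ = std::move(next);  // publish; old version lives on while readers hold it
    }

private:
    mutable std::mutex publishMutex_;
    std::mutex writerMutex_;
    std::shared_ptr<const Parts> current_ = std::make_shared<Parts>();
};
```

A reader that took a snapshot before a write keeps seeing the old, unchanged version. C++20's std::atomic<std::shared_ptr<T>> can replace publishMutex_ where the standard library implements it.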

Seen in

  • sources/2026-05-14-cloudflare-clickhouse-query-plan-contention: canonical wiki instance. The ClickHouse MergeTreeData parts mutex was taken exclusively by query planners that only read the parts list; switching to std::shared_lock delivered the immediate drop in query duration that resolved Cloudflare's billing-pipeline slowdown. Optimization 1 in the three-patch stack; upstream as PR #85535. The post's explicit diagnosis, "It had no business using an exclusive lock", makes this the wiki's reference case for the exclusive-where-shared-suffices mistake at the OLAP-database planner altitude.