PATTERN Cited by 1 source
Shared lock for read-only metadata¶
Intent¶
When a critical section in a hot path only reads metadata under a mutex, use a shared (read) lock — allowing multiple readers concurrently — rather than an exclusive (write) lock that serialises every read.
Sounds obvious. The pattern exists because it is routinely violated in real production code: a critical section originally written when the access pattern was mixed read-write evolves to be read-only in practice, but the lock kind is never re-evaluated. At low concurrency the bug is invisible; at high concurrency it becomes the dominant bottleneck.
Canonical instance: ClickHouse MergeTreeData parts mutex¶
Cloudflare's Ready-Analytics hit a query-planner slowdown after extending its partitioning key. Investigation showed "more than half of leaf query duration is spent waiting for a mutex that protects the table's list of active parts." (Source: sources/2026-05-14-cloudflare-clickhouse-query-plan-contention)
The mutex was used as an exclusive lock by every query planner. Cloudflare's diagnosis:
"The query planner doesn't modify the parts list; it just reads it. It had no business using an exclusive lock."
The fix:
"We modified the code to acquire a shared lock (`std::shared_lock`) instead. This allowed all query planners to enter the critical section concurrently."
Result: "A massive, immediate drop in query duration. The lock contention vanished."
Optimization 1 of the three-patch fix stack; merged upstream as part of ClickHouse PR #85535 in ClickHouse 25.11.
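The shape of that change can be sketched in a few lines of C++17. This is an illustration under stated assumptions, not ClickHouse's actual code: `PartsRegistry`, `add_part`, `count_with_prefix`, and `plan_concurrently` are hypothetical names, and the parts list is reduced to a vector of strings.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table's list of active parts.
class PartsRegistry {
public:
    // Writer path: mutates the list, so it takes the lock exclusively.
    void add_part(std::string name) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        parts_.push_back(std::move(name));
    }

    // Reader path: only inspects the list, so a shared lock suffices and
    // any number of planner-like readers can be inside at once.
    std::size_t count_with_prefix(const std::string& prefix) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        std::size_t n = 0;
        for (const auto& p : parts_) {
            if (p.compare(0, prefix.size(), prefix) == 0) ++n;
        }
        return n;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::vector<std::string> parts_;
};

// Runs `readers` concurrent reader threads and returns the sum of their counts.
std::size_t plan_concurrently(const PartsRegistry& registry, int readers) {
    std::vector<std::size_t> results(readers);
    std::vector<std::thread> threads;
    for (int i = 0; i < readers; ++i) {
        threads.emplace_back(
            [&registry, &results, i] { results[i] = registry.count_with_prefix("2026"); });
    }
    for (auto& t : threads) t.join();
    std::size_t total = 0;
    for (auto r : results) total += r;
    return total;
}
```

Had `count_with_prefix` taken a `std::unique_lock` instead, the result would be identical but the reader threads would pass through the critical section one at a time.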
Why it gets missed¶
Three structural reasons this pattern is widely violated:
- Lock kind is set at code-write time, access pattern is set at use time. The original `MergeTreeData` mutex was probably written when the access pattern was mixed (planner reads + insert / merge writers all using one mutex) and a single exclusive lock was simpler than audit-driven separation. As the codebase evolved and the planner became read-only, no one revisited the lock kind.
- CPU profiles don't show the contention. The standard diagnostic for "this code is slow" is a CPU flame graph, which shows where active threads are spending CPU. The threads waiting on the mutex aren't running, so they don't appear. See concepts/cpu-vs-real-flame-graph: the diagnostic flip is what made Cloudflare's lock-contention bottleneck visible.
- Per-query metrics look fine. Throughput, rows scanned, bytes read — all unchanged. The query is doing the same work; it's just waiting longer to start. Dashboards built around per-query resource counters miss the contention entirely.
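One way to surface this kind of waiting without a wall-clock profiler is to time the lock acquisition itself. The sketch below is an assumption for illustration, not the method Cloudflare used: `observed_wait` is a hypothetical demo in which one thread holds a `std::mutex` while a worker blocks on it, and the worker's blocked time is measured with a steady clock.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical demo: the calling thread holds a mutex for `hold` while a
// worker tries to lock it. Returns the wall-clock time the worker was blocked.
std::chrono::milliseconds observed_wait(std::chrono::milliseconds hold) {
    std::mutex m;
    std::atomic<bool> worker_ready{false};
    Clock::duration waited = Clock::duration::zero();

    m.lock();  // the caller plays the long-running lock holder
    std::thread worker([&] {
        const auto start = Clock::now();  // taken before signalling readiness
        worker_ready.store(true);
        m.lock();  // blocks here; the thread is parked and uses little or no CPU
        waited = Clock::now() - start;
        m.unlock();
    });

    // Only start the hold period once the worker has recorded its start time.
    while (!worker_ready.load()) std::this_thread::yield();
    std::this_thread::sleep_for(hold);
    m.unlock();
    worker.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(waited);
}
```

The worker spends essentially the whole `hold` period inside `m.lock()`, yet it is descheduled for that time, which is why a CPU-sampling profile attributes almost nothing to that call site.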
When the pattern applies¶
The fix is correct iff the critical section is genuinely read-only with respect to the protected state. Verifying this requires:
- Auditing every code path that takes the lock to confirm none mutate the protected data structure.
- Checking that the protected data structure's destructors, iterators, and any lazy-initialisation paths are safe under concurrent reads.
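- Ensuring writer paths (the small minority that do mutate) take an exclusive lock: readers and writers are mutually exclusive; only readers parallelise. Note that `std::shared_mutex` offers no atomic upgrade from shared to exclusive, so a path that may write must acquire the exclusive lock from the start.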
- Confirming the language / runtime's shared-lock implementation has acceptable read-side overhead vs. the contention it relieves. (In C++, `std::shared_mutex` is somewhat heavier than `std::mutex` per acquisition; on uncontended workloads the cost is real but small. On contended workloads the parallelism benefit dominates.)
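Part of the audit above can be delegated to the compiler. The following is a minimal sketch, assuming a hypothetical wrapper named `SharedGuarded`: readers only ever receive a `const` reference under a shared lock, so a reader that calls a non-const member of the protected type fails to compile. It does not catch `mutable` members, `const_cast`, or pointers that escape the callback, so it complements the manual audit and does not replace it.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper that ties the lock kind to the kind of access granted.
template <typename T>
class SharedGuarded {
public:
    explicit SharedGuarded(T initial) : value_(std::move(initial)) {}

    // Shared lock + const reference: many readers may run `fn` concurrently.
    template <typename Fn>
    auto read(Fn&& fn) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)(std::as_const(value_));
    }

    // Exclusive lock + mutable reference: one writer, no concurrent readers.
    template <typename Fn>
    auto write(Fn&& fn) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)(value_);
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const read() can lock it
    T value_;
};
```

A call such as `parts.read([](const auto& v) { return v.size(); })` compiles, whereas calling `v.push_back(...)` inside `read` is rejected because `v` is const there.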
Substrate variants¶
| Substrate | Read lock | Write lock |
|---|---|---|
| C++17 | `std::shared_lock<std::shared_mutex>` | `std::unique_lock<std::shared_mutex>` |
| Rust | `RwLock::read()` | `RwLock::write()` |
| Java | `ReentrantReadWriteLock.readLock()` | `.writeLock()` |
| Python | No stdlib RW lock; libraries (e.g., `readerwriterlock`) | same |
| Go | `sync.RWMutex.RLock()` | `.Lock()` |
| InnoDB row level | `SELECT ... FOR SHARE` | `SELECT ... FOR UPDATE` / implicit on `UPDATE`/`DELETE` |
| Postgres advisory | `pg_advisory_lock_shared` | `pg_advisory_lock` |
The pattern transfers cleanly across substrates because the underlying primitive (the readers-writer lock) is widely available, either natively or through a library.
Trade-offs¶
- Read-side cost: shared-lock acquisitions are slightly heavier than plain mutex acquisitions on most platforms. At low contention, replacing a `std::mutex` with a `std::shared_mutex` can be a marginal regression. The pattern pays off only when the contention reduction outweighs the per-acquisition overhead, which is typically the case in a high-contention, read-heavy critical section.
- Writer starvation risk: with continuous read traffic, writers can be starved indefinitely on naive RW lock implementations. Policies differ by substrate: Java's `ReentrantReadWriteLock` accepts a fairness flag, and Go's `sync.RWMutex` blocks new readers once a writer is waiting, while the C++ standard leaves the scheduling policy of `std::shared_mutex` unspecified. If writers are occasionally important, check the implementation's policy and configure fairness explicitly where the API allows it.
- Hidden mutating callees: a critical section that appears read-only may invoke functions that mutate shared state under the hood (lazy caching, statistics updates, reference-counted resource tracking). Mutation inside a shared-lock critical section is a correctness bug, not just a performance one. Audit carefully.
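The hidden-mutation hazard is easiest to see with a lazily cached value. The sketch below assumes a hypothetical `PartSizes` class whose `total_bytes()` looks read-only to callers but fills a cache on first use. Writing a plain `long` cache field from inside the shared-lock section would be a data race; here the cache is a `std::atomic<long>`, so concurrent readers may redundantly compute the same total but never race, and writers invalidate it while holding the exclusive lock. The `-1` sentinel assumes totals are never negative.

```cpp
#include <atomic>
#include <mutex>
#include <numeric>
#include <shared_mutex>
#include <vector>

// Hypothetical example of a "read-only" accessor with a hidden cache write.
class PartSizes {
public:
    // Writer path: exclusive lock, and the cache is invalidated under it.
    void add(long bytes) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        sizes_.push_back(bytes);
        cached_total_.store(kUnset, std::memory_order_relaxed);
    }

    // Reader path: shared lock. The only write is an atomic store of a value
    // that every concurrent reader would compute identically.
    long total_bytes() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        const long cached = cached_total_.load(std::memory_order_relaxed);
        if (cached != kUnset) return cached;
        const long total = std::accumulate(sizes_.begin(), sizes_.end(), 0L);
        cached_total_.store(total, std::memory_order_relaxed);
        return total;
    }

private:
    static constexpr long kUnset = -1;  // sentinel: assumes totals are >= 0
    mutable std::shared_mutex mutex_;
    std::vector<long> sizes_;
    mutable std::atomic<long> cached_total_{kUnset};
};
```

Relaxed ordering is enough in this sketch because the cache only changes value across an exclusive-lock acquisition, and the lock itself provides the necessary synchronisation between writers and readers.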
Related patterns and concepts¶
- Often paired with patterns/deferred-copy-cached-collection: switching to a shared lock removes the serialising bottleneck, but the per-reader copy of the collection can still dominate. The deferred-copy pattern eliminates the copy. Cloudflare's Optimizations 1 + 2 ship together precisely because they compose.
- Sometimes paired with patterns/binary-search-on-sorted-partition-prefix — once concurrent reads are unblocked and copies eliminated, residual per-read scan cost can be reduced further by exploiting structural ordering of the data. Cloudflare's Optimization 3.
- concepts/shared-lock-vs-exclusive-lock — the pedagogical primitive. The wiki's concept page frames shared vs. exclusive locks at the InnoDB row-locking altitude; this pattern transfers the same framing to in-memory metadata locks in a database server's runtime.
Adjacent in the literature¶
- Read-Copy-Update (RCU): Linux kernel primitive that takes the read-side fast path further than RW locks (near-zero-cost reads, deferred reclamation). Where a shared lock still makes readers wait for a writer, RCU lets readers proceed while writers publish a new version. RCU is what you reach for when even shared-lock acquisition cost is unacceptable.
- Epoch-based reclamation and hazard pointers in concurrent data structures (Crossbeam in Rust, `folly::Hazptr` in C++): same family.
Seen in¶
- sources/2026-05-14-cloudflare-clickhouse-query-plan-contention: canonical wiki instance. ClickHouse `MergeTreeData` parts mutex used exclusively by query planners that only read the parts list; switching to `std::shared_lock` delivered the immediate drop in query duration that resolved Cloudflare's billing-pipeline slowdown. Optimization 1 in the three-patch stack; upstream as PR #85535. The post explicitly names "It had no business using an exclusive lock" as the diagnosis, which makes this the canonical wiki instance of the exclusive-where-shared-suffices mistake at the OLAP-database planner altitude.
Related¶
- concepts/shared-lock-vs-exclusive-lock
- concepts/lock-contention-in-query-planning
- concepts/cpu-vs-real-flame-graph
- concepts/clickhouse-data-part
- systems/clickhouse
- systems/cloudflare-ready-analytics
- patterns/deferred-copy-cached-collection
- patterns/binary-search-on-sorted-partition-prefix
- patterns/upstream-the-fix