WiredTiger cache¶
Definition¶
The WiredTiger cache is the in-memory buffer pool that MongoDB's WiredTiger storage engine uses to hold uncompressed data and index pages. It is the primary determinant of whether a given query is served from memory or from disk on a MongoDB server — effectively the working-set memory budget for the database.
Default sizing¶
MongoDB's documentation gives the default WiredTiger cache size as the larger of 50% of (RAM − 1 GB) or 256 MB. Concrete examples:
- 4 GB RAM → 1.5 GB cache (this is the hardware envelope in the MongoDB Cost of Not Knowing Part 3 load test).
- 8 GB RAM → 3.5 GB cache.
- 16 GB RAM → 7.5 GB cache.
- 64 GB RAM → 31.5 GB cache.
Only roughly half of RAM is allocated because the other half is left for OS filesystem cache, aggregation pipeline sorts / group temporary data, connections, replication buffers, and the rest of the MongoDB server's working memory.
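The sizing rule can be sketched as a one-line function (a sketch of the documented formula; the real default is computed inside mongod):

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    # The larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB).
    return max(0.5 * (ram_gb - 1), 0.25)

# Reproduces the examples above:
print(default_wiredtiger_cache_gb(4))   # 1.5
print(default_wiredtiger_cache_gb(64))  # 31.5
```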
Tunable via storage.wiredTiger.engineConfig.cacheSizeGB in mongod.conf or the --wiredTigerCacheSizeGB command-line option.
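For example, the following mongod.conf fragment caps the cache at 0.5 GB (the value is illustrative; cacheSizeGB accepts fractional gigabytes):

```yaml
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.5
```

The equivalent command-line invocation is mongod --wiredTigerCacheSizeGB 0.5.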
Why it's load-bearing for schema design¶
WiredTiger stores each index in its own B-tree; pages are loaded into the cache on access and evicted on an LRU basis. When a query needs an index page that isn't cache-resident, it incurs a disk read (SSD: microseconds; HDD: milliseconds). At sustained load, queries whose index footprint exceeds the cache degrade sharply — the cache is thrashed, eviction rates rise, and every query tends toward the disk-latency baseline.
This makes index size vs cache size the single most important capacity-planning number for MongoDB workloads: an index that fits in cache and one that doesn't differ by orders of magnitude in steady-state latency, even if everything else (data layout, query pattern, write volume) is identical.
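As a back-of-envelope planning check (plain arithmetic, not a MongoDB API), using the case study's numbers:

```python
GB = 1024 ** 3

def fits_in_cache(index_bytes: float, cache_bytes: float) -> bool:
    # Planning rule from the text: the hot index footprint must stay
    # cache-resident to avoid per-query disk reads under sustained load.
    return index_bytes <= cache_bytes

# appV6R0's 3.13 GB _id index overflows the 1.5 GB cache...
print(fits_in_cache(3.13 * GB, 1.5 * GB))  # False
# ...while appV6R1's 1.22 GB index fits.
print(fits_in_cache(1.22 * GB, 1.5 * GB))  # True
```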
Canonical instance of the trap¶
The MongoDB Cost of Not Knowing Part 3 case study surfaces this as the new bottleneck after documents were shrunk:
- appV6R0 (monthly-bucketed, dynamic schema) achieved 125 B average documents vs. appV5R3's 385 B, a 67.5% shrink. But the approach also tripled the document count (95.3 M vs 33.4 M), and the _id index grew to 3.13 GB, exceeding the 4 GB-RAM machine's 1.5 GB WiredTiger cache allocation. The load test showed the expected disk-throughput win from smaller documents didn't materialize because index pages couldn't stay resident.
- appV6R1 pivoted to quarterly bucketing with the same dynamic-schema trick, dropping the _id index to 1.22 GB, back under the cache ceiling. Total per-event size fell from appV5R3's 28.1 B to 20.2 B; load-test throughput improved.
The general lesson: shrinking documents doesn't automatically help if the new bottleneck is whether indexes fit in cache. The dimension you optimize on disk has to match the dimension that is actually saturated at runtime.
Cache-friendly schema heuristics¶
Drawn from the MongoDB case study and general practice:
- Right-size bucket width to push index size under cache. Wider buckets mean fewer documents and a smaller _id index; narrower buckets mean smaller documents and a better disk-throughput profile but a larger index. Which wins depends on what's saturated.
- Keep secondary-index count and cardinality in check. Each secondary index occupies its own cache budget. appV4+ in the case study uses a single _id index for exactly this reason: packing all query predicates into the _id keeps total index pressure tied strictly to document count.
- Benchmark on representative data volumes. A 1 GB dev collection fits entirely in cache on any modern machine and gives no signal about production behaviour at 100 GB. The case study's 500 M-event / 4 GB-RAM rig is deliberately sized so cache pressure is visible at reasonable test durations.
- Monitor the wiredTiger.cache counters "bytes currently in the cache" vs "maximum bytes configured", plus eviction statistics, in serverStatus output. A rising eviction rate under steady workload is the canonical symptom of exceeding cache.
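Those counters can be summarized from the serverStatus document (with PyMongo it comes from db.command("serverStatus")["wiredTiger"]["cache"]); a minimal sketch, with hypothetical sample numbers:

```python
def cache_pressure(wt_cache: dict) -> dict:
    # wt_cache is serverStatus()["wiredTiger"]["cache"]; the key names
    # below match real serverStatus fields.
    used = wt_cache["bytes currently in the cache"]
    cap = wt_cache["maximum bytes configured"]
    evicted = (wt_cache.get("unmodified pages evicted", 0)
               + wt_cache.get("modified pages evicted", 0))
    return {"fill_ratio": used / cap, "pages_evicted": evicted}

# Hypothetical snapshot (the numbers are made up for illustration):
sample = {
    "bytes currently in the cache": 1_288_490_189,  # ~1.2 GB resident
    "maximum bytes configured": 1_610_612_736,      # 1.5 GB cache
    "unmodified pages evicted": 41_000,
    "modified pages evicted": 9_000,
}
stats = cache_pressure(sample)
```

Sampling this periodically and watching the fill ratio plus the eviction-count delta between snapshots turns the "rising eviction rate" symptom into a concrete alert threshold.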
Seen in¶
- sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4 — the 1.5 GB cache ceiling on 4 GB RAM is the specific resource that appV6R0's 3.13 GB _id index overflowed, driving the pivot to appV6R1's quarter-bucket schema. Named in-article: "this is near the 4GB of available memory on the machine running the database and exceeds the 1.5GB allocated by WiredTiger for cache … the limiting factor in this case is memory/cache rather than document size."