
WiredTiger

Overview

WiredTiger is the B-tree-based storage engine that MongoDB has used by default since MongoDB 3.2 (2015). Originally developed by the authors of Berkeley DB and acquired by MongoDB in 2014, it provides the MongoDB server's on-disk format, in-memory cache, MVCC (multi-version concurrency control) with document-level concurrency, per-collection compression, and journal- and checkpoint-based durability.

WiredTiger is also a subject of formal modelling in MongoDB's VLDB 2025 paper "Design and Modular Verification of Distributed Transactions in MongoDB" (Schultz and Demirbas): the storage interface is specified in TLA+ and validated alongside the cross-shard transaction protocol.

Key components for schema-design trade-offs

WiredTiger cache: the in-memory buffer pool for pages; collection data is held uncompressed. Sized by default at "the larger of 50 % of (RAM − 1 GB) or 256 MB"; configurable via storage.wiredTiger.engineConfig.cacheSizeGB. This is the practical working-set memory budget for a MongoDB deployment. When the working set exceeds it, pages are evicted and re-read continuously, and throughput degrades toward disk latency.
Block compressor (storage.wiredTiger.collectionConfig.blockCompressor): per-collection compression algorithm for data pages. Options: snappy (default; fast, moderate ratio), zstd (higher ratio than snappy for more CPU), zlib (higher ratio than snappy, typically the most CPU-intensive), none. See concepts/document-storage-compression.
Prefix compression on indexes: on by default. Adjacent index keys within a B-tree page often share a prefix; WiredTiger stores each key as its suffix plus the length of the prefix it shares with the preceding key. Unlike block compression, which applies only on disk, MongoDB's documentation states that indexes in the WiredTiger cache still take advantage of prefix compression, so it reduces RAM usage as well as disk footprint.
Journal: write-ahead log with group commit, flushed to disk every 100 ms by default. Provides the durability side of MongoDB's w: majority, j: true write concerns.
Checkpoints: every 60 seconds by default, WiredTiger writes the modified (dirty) pages in cache to the data files as a consistent snapshot. Crash recovery starts from the last checkpoint and replays the journal entries written after it.
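
To make the prefix-compression entry above concrete, here is a minimal Python sketch of the general technique: each key in a sorted run is stored as the number of leading bytes it shares with the previous key, plus the remaining suffix. It illustrates the idea only. The function names and the sample keys are hypothetical, and WiredTiger's real on-page encoding differs in its details.

```python
def prefix_compress(sorted_keys):
    """Encode sorted byte-string keys as (shared_prefix_len, suffix) pairs."""
    entries = []
    prev = b""
    for key in sorted_keys:
        # Count the leading bytes this key shares with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical index keys, already in sorted order.
keys = [b"user:1001", b"user:1002", b"user:1010", b"user:2000"]
entries = prefix_compress(keys)
raw_bytes = sum(len(k) for k in keys)
suffix_bytes = sum(len(suffix) for _, suffix in entries)
print(entries)
print(f"key bytes: {raw_bytes} raw, {suffix_bytes} as suffixes")
```

The saving grows with the length of the shared prefix, which is why compound indexes whose leading fields repeat heavily tend to compress well.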

Operational numbers from MongoDB case study

The MongoDB Cost of Not Knowing Part 3 load-test rig makes WiredTiger's sizing assumptions concrete:

4 GB RAM on the test machine.
1.5 GB WiredTiger cache allocation (per the default formula: max(0.5 * (4-1), 0.25) = 1.5 GB).
appV6R0's 3.13 GB _id index exceeded this cache budget and became the load-test-dominant bottleneck — "the limiting factor in this case is memory/cache rather than document size."
appV6R1's quarter-bucketing dropped the index to 1.22 GB, comfortably under the cache ceiling; throughput recovered.
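
The sizing arithmetic above can be reproduced with a few lines of Python. This is a sketch of the documented default formula only. The helper names are hypothetical, and the fit check compares a single index against the whole cache, which is optimistic because the cache also has to hold collection data and other indexes.

```python
def default_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


def fits_in_cache(size_gb: float, ram_gb: float) -> bool:
    """True if an object of size_gb is no larger than the default cache."""
    return size_gb <= default_cache_gb(ram_gb)


RAM_GB = 4.0  # the load-test machine from the case study
cache_gb = default_cache_gb(RAM_GB)
print(f"default cache on {RAM_GB} GB RAM: {cache_gb} GB")

# _id index sizes reported in the article for the two revisions.
for revision, index_gb in [("appV6R0", 3.13), ("appV6R1", 1.22)]:
    verdict = "fits" if fits_in_cache(index_gb, RAM_GB) else "exceeds cache"
    print(f"{revision}: {index_gb} GB _id index, {verdict}")
```

On very small machines the 256 MB floor takes over: with 1 GB of RAM the first term is zero, so the formula returns 0.25 GB.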

Compressed-vs-uncompressed examples from the same article:

Revision	Data (uncompressed)	Storage (compressed)	Ratio
appV5R3	11.96 GB	3.24 GB	3.7×
appV6R1	8.19 GB	2.34 GB	3.5×
appV6R0	11.10 GB	3.33 GB	3.3×

All presumed snappy (WiredTiger's default); the article does not explicitly name the compressor.
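
The ratio column is simply uncompressed data size divided by compressed storage size. The short Python check below recomputes it from the table's figures; the variable names are illustrative.

```python
# (revision, uncompressed data in GB, compressed storage in GB), from the table above.
sizes = [
    ("appV5R3", 11.96, 3.24),
    ("appV6R1", 8.19, 2.34),
    ("appV6R0", 11.10, 3.33),
]

# Compression ratio = uncompressed size / compressed size, rounded to one decimal.
ratios = {rev: round(data_gb / storage_gb, 1) for rev, data_gb, storage_gb in sizes}

for rev, ratio in ratios.items():
    print(f"{rev}: {ratio}x")
```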

Adjacent surfaces

Cache eviction metrics in db.serverStatus().wiredTiger.cache: bytes currently in the cache, maximum bytes configured, eviction rates. Rising evictions under steady workload ⇒ working set exceeds cache.
Checkpoint metrics in db.serverStatus().wiredTiger.checkpoint: checkpoint duration and frequency. Slow checkpoints typically signal write saturation.
Block-manager metrics: bytes read / written by the underlying storage-layer block manager; maps directly to disk-throughput consumption.
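
As a rough illustration of how the cache metrics above might be read, the Python sketch below computes a fill ratio from a dictionary shaped like the wiredTiger.cache section of serverStatus. It uses only the two field names mentioned above. The sample numbers are invented, the function name is hypothetical, and the 80 % / 95 % thresholds are an assumption modelled on WiredTiger's default eviction target and trigger settings, so treat the verdict strings as a heuristic rather than server behaviour.

```python
def cache_pressure(cache_stats, target=0.80, trigger=0.95):
    """Return (fill_ratio, verdict) for a wiredTiger.cache-style stats dict.

    target and trigger are assumed thresholds, not values read from the server.
    """
    used = cache_stats["bytes currently in the cache"]
    limit = cache_stats["maximum bytes configured"]
    fill = used / limit
    if fill >= trigger:
        verdict = "at or above eviction trigger"
    elif fill >= target:
        verdict = "above eviction target"
    else:
        verdict = "below eviction target"
    return fill, verdict


# Invented sample resembling a 1.5 GiB cache that is nearly full.
sample = {
    "bytes currently in the cache": 1_500_000_000,
    "maximum bytes configured": 1_610_612_736,
}
fill, verdict = cache_pressure(sample)
print(f"cache fill: {fill:.1%} ({verdict})")
```

A single reading says little on its own; sampling the ratio over time alongside the eviction counters gives a better picture of whether the working set still fits.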

Seen in

sources/2026-02-27-mongodb-towards-model-based-verification-of-a-key-value-storage-engine — canonical wiki reference for WiredTiger as the target of model-based conformance checking. MongoDB's VLDB 2025 work extracts the interface boundary between the cross-shard transaction protocol and WiredTiger as a standalone Storage.tla spec (concepts/compositional-specification), enumerates its complete reachable state graph via a modified TLC, computes path coverings, and emits one test case per path as a sequence of WiredTiger API calls (patterns/test-case-generation-from-spec). Concrete result: 87,143 tests for a 2-key × 2-transaction finite model, generated + executed against WiredTiger in ~40 minutes. Specs + generator open-sourced at mongodb-labs/vldb25-dist-txns. Caveats: the current spec covers a subset of WiredTiger's API semantics; the finite model is intentionally tiny; full technical detail in the VLDB paper.
sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4 — WiredTiger's 1.5 GB cache on a 4 GB-RAM test machine is the specific budget the appV6R0 _id index overflowed, driving the pivot to quarter-bucketing in appV6R1. WiredTiger's default snappy compression (implicit throughout the article) is what produces the ~3.3-3.7× compression ratios observed across the appV5RX / appV6RX family's data vs storage sizes.
— WiredTiger as sibling B+tree storage engine to MySQL's InnoDB. PlanetScale's Ben Dicken names WiredTiger (with MongoDB, DynamoDB, Postgres) as a database that "relies on B-trees to perform efficient data lookups via indexes." Both WiredTiger and InnoDB are clustered B+tree engines — but at different database layers: WiredTiger is MongoDB's pluggable storage engine, InnoDB is MySQL's. Both store rows/documents inside the primary-key B+tree (concepts/clustered-index) with similar implications for primary-key choice — though MongoDB's default _id is a time-ordered ObjectId, which gives sequential PK locality by default (unlike UUIDv4).
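
The spec-to-tests pipeline summarised in the first entry above (enumerate the reachable state graph, cover it with paths, emit one test per path) can be sketched on a toy model. Everything in the Python below is hypothetical: the one-key, one-transaction state machine is invented for illustration and is not the paper's Storage.tla; the breadth-first search stands in for TLC; and the covering is the simplest possible one (a shortest path to each state plus one outgoing edge), not the path covering computed in the VLDB work.

```python
from collections import deque

# State: (pending, committed). pending is None when no transaction is open,
# otherwise it is the uncommitted value written by the open transaction.
INITIAL = (None, 0)


def transitions(state):
    """Return the (action, next_state) pairs enabled in this toy model."""
    pending, committed = state
    if pending is None:
        return [("begin", (committed, committed))]
    return [
        ("write(0)", (0, committed)),
        ("write(1)", (1, committed)),
        ("commit", (None, pending)),
        ("rollback", (None, committed)),
    ]


def explore(initial):
    """Breadth-first search: map each reachable state to a shortest action path."""
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for action, nxt in transitions(state):
            if nxt not in paths:
                paths[nxt] = paths[state] + [action]
                queue.append(nxt)
    return paths


def generate_tests(initial):
    """One test per edge: the shortest path to its source state, then the edge."""
    tests = []
    for state, prefix in explore(initial).items():
        for action, _ in transitions(state):
            tests.append(prefix + [action])
    return tests


def replay(actions, initial=INITIAL):
    """Run an action sequence against the model and return the edges it visits."""
    state, edges = initial, []
    for action in actions:
        nxt = dict(transitions(state))[action]  # KeyError if the action is not enabled
        edges.append((state, action, nxt))
        state = nxt
    return edges


tests = generate_tests(INITIAL)
print(f"{len(explore(INITIAL))} reachable states, {len(tests)} generated tests")
print("example test:", tests[-1])
```

In a real harness each action name would be mapped to a storage-engine API call and the model's state would serve as the expected result; here replay only walks the model itself.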