
SYSTEM Cited by 2 sources

WiredTiger

Overview

WiredTiger is the B-tree-based storage engine MongoDB has used by default since MongoDB 3.2 (2015). Originally developed by the authors of Berkeley DB and acquired by MongoDB in 2014, it provides the on-disk format, the in-memory cache, MVCC (multi-version concurrency control) for document-level locking, per-collection compression, and journal / checkpoint durability for the MongoDB server.

WiredTiger is also a subject of MongoDB's VLDB 2025 paper "Design and Modular Verification of Distributed Transactions in MongoDB" (Schultz + Demirbas): its storage interface is modelled in TLA+ alongside the cross-shard transaction protocol, and the WiredTiger implementation is checked for conformance against that model.

Key components for schema-design trade-offs

  • WiredTiger cache — the in-memory buffer pool holding pages in their in-memory (block-decompressed) form. Sized by default at the larger of 50% of (RAM − 1 GB) or 256 MB; configurable via storage.wiredTiger.engineConfig.cacheSizeGB. This is the practical working-set memory budget for a MongoDB deployment: once the working set exceeds it, page eviction thrashes and throughput collapses to disk latency.
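  • Block compressor (storage.wiredTiger.collectionConfig.blockCompressor) — the compression algorithm for collection data pages on disk; the setting is the default for new collections and can be overridden per collection at creation time. Options: snappy (default; fast, moderate ratio), zlib (higher ratio than snappy at more CPU), zstd (higher ratio than both, with less CPU than zlib), none. See concepts/document-storage-compression.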
  • Prefix compression on indexes — on by default. Index keys within a B-tree page share common prefixes; WiredTiger stores each key's suffix plus a reference to the shared-prefix length. Unlike block compression, which applies only on disk, prefix compression is retained in the in-memory index representation, so it shrinks both the disk footprint and the cache footprint of indexes.
  • Journal — group-commit WAL with a default 100 ms flush cadence. Provides the durability side of MongoDB's w: majority, j: true write concerns.
  • Checkpoints — every 60 seconds by default, WiredTiger writes a consistent point-in-time snapshot of the data from cache to disk. Crash recovery starts from the last checkpoint and replays the journal written after it.
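
The prefix-compression idea above can be illustrated on a sorted run of index keys. This is a minimal sketch, assuming a simple (shared-prefix length, suffix) pair per key relative to the previous key; the function names are hypothetical, and WiredTiger's actual on-page key format is more involved than this.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_compress(sorted_keys: list[bytes]) -> list[tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded, prev = [], b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decompress(encoded: list[tuple[int, bytes]]) -> list[bytes]:
    """Rebuild full keys by reusing the previous key's first `shared` bytes."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


keys = [b"user:1001", b"user:1002", b"user:1100"]
encoded = prefix_compress(keys)
print(encoded)  # [(0, b'user:1001'), (8, b'2'), (6, b'100')]
assert prefix_decompress(encoded) == keys
```

Here 27 bytes of keys become 13 bytes of suffixes plus the small length fields; the saving grows with longer shared prefixes, such as compound-index keys that repeat a leading field.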

Operational numbers from a MongoDB case study

The MongoDB Cost of Not Knowing Part 3 load-test rig makes WiredTiger's sizing assumptions concrete:

  • 4 GB RAM on the test machine.
  • 1.5 GB WiredTiger cache allocation (per the default formula: max(0.5 * (4-1), 0.25) = 1.5 GB).
  • appV6R0's 3.13 GB _id index exceeded this cache budget and became the dominant bottleneck in the load test: "the limiting factor in this case is memory/cache rather than document size."
  • appV6R1's quarter-bucketing dropped the index to 1.22 GB, comfortably under the cache ceiling; throughput recovered.
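
The default cache formula and the cache-fit comparison above can be checked with a few lines of Python. A minimal sketch: default_wt_cache_gb and fits_in_cache are hypothetical helpers, and comparing an index's reported size directly against the cache budget is a simplification, since the real working set also includes documents and other indexes, and in-cache sizes differ from on-disk sizes.

```python
def default_wt_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache: the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB)."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


def fits_in_cache(sizes_gb: dict[str, float], cache_gb: float) -> dict[str, bool]:
    """Naive check: does each structure, on its own, fit in the cache budget?"""
    return {name: size <= cache_gb for name, size in sizes_gb.items()}


cache_gb = default_wt_cache_gb(4)  # the case study's 4 GB machine
print(cache_gb)  # 1.5
print(fits_in_cache({"appV6R0 _id index": 3.13, "appV6R1 _id index": 1.22}, cache_gb))
# {'appV6R0 _id index': False, 'appV6R1 _id index': True}
```

On a 1 GB machine the 256 MB floor applies: default_wt_cache_gb(1) returns 0.25.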

Compressed-vs-uncompressed examples from the same article:

Revision   Data size (uncompressed)   Storage size (compressed)   Ratio
appV5R3    11.96 GB                   3.24 GB                     3.7×
appV6R1     8.19 GB                   2.34 GB                     3.5×
appV6R0    11.10 GB                   3.33 GB                     3.3×

All three are presumed to use snappy (WiredTiger's default); the article does not explicitly name the compressor.
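
The Ratio column is the uncompressed data size divided by the compressed storage size. A minimal sketch that recomputes it from the table; the dictionary layout is illustrative, and reading the two columns as collStats' size and storageSize fields is an assumption about how the article measured them.

```python
# (data size GB, storage size GB) per revision, copied from the table above.
SIZES_GB = {
    "appV5R3": (11.96, 3.24),
    "appV6R1": (8.19, 2.34),
    "appV6R0": (11.10, 3.33),
}


def compression_ratio(data_gb: float, storage_gb: float) -> float:
    """Uncompressed bytes per compressed byte on disk."""
    return data_gb / storage_gb


for revision, (data_gb, storage_gb) in SIZES_GB.items():
    print(f"{revision}: {compression_ratio(data_gb, storage_gb):.1f}x")
```

This prints 3.7x, 3.5x and 3.3x, matching the table.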

Adjacent surfaces

  • Cache eviction metrics in db.serverStatus().wiredTiger.cache: bytes currently in the cache, maximum bytes configured, eviction rates. Rising evictions under steady workload ⇒ working set exceeds cache.
  • Checkpoint metrics in db.serverStatus().wiredTiger.checkpoint: checkpoint duration + frequency. Slow checkpoints typically signal write saturation.
  • Block-manager metrics in db.serverStatus().wiredTiger["block-manager"]: bytes read / written by the underlying block manager; these map directly to disk-throughput consumption.
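
The cache metrics above lend themselves to two derived signals: how full the cache is, and how fast a cumulative eviction counter grows between samples. A minimal sketch: the helper names are hypothetical, "bytes currently in the cache" and "maximum bytes configured" are the fields named above, the eviction counter name is an assumption to verify against your server's output, and the sample numbers are illustrative, not measurements.

```python
def cache_fill_ratio(cache: dict) -> float:
    """Fraction of the configured WiredTiger cache currently in use."""
    return cache["bytes currently in the cache"] / cache["maximum bytes configured"]


def counter_rate(prev: dict, curr: dict, counter: str, interval_s: float) -> float:
    """Per-second rate of a cumulative serverStatus counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr[counter] - prev[counter]) / interval_s


# Against a live server (not executed here), e.g. with PyMongo:
#   from pymongo import MongoClient
#   cache = MongoClient().admin.command("serverStatus")["wiredTiger"]["cache"]
EVICTION_COUNTER = "modified pages evicted"  # assumed field name; verify on your release

# Illustrative samples taken 60 s apart.
prev = {"bytes currently in the cache": 1_200_000_000,
        "maximum bytes configured": 1_500_000_000,
        EVICTION_COUNTER: 10_000}
curr = {"bytes currently in the cache": 1_420_000_000,
        "maximum bytes configured": 1_500_000_000,
        EVICTION_COUNTER: 16_000}
print(f"cache fill: {cache_fill_ratio(curr):.0%}")
print(f"evictions/s: {counter_rate(prev, curr, EVICTION_COUNTER, 60.0):.0f}")
```

A fill ratio pinned near 100% together with an eviction rate that keeps climbing under a steady workload is the "working set exceeds cache" pattern described above.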

Seen in

  • sources/2026-02-27-mongodb-towards-model-based-verification-of-a-key-value-storage-engine — canonical wiki reference for WiredTiger as the target of model-based conformance checking. MongoDB's VLDB 2025 work extracts the interface boundary between the cross-shard transaction protocol and WiredTiger as a standalone Storage.tla spec (concepts/compositional-specification), enumerates its complete reachable state graph via a modified TLC, computes path coverings, and emits one test case per path as a sequence of WiredTiger API calls (patterns/test-case-generation-from-spec). Concrete result: 87,143 tests for a 2-key × 2-transaction finite model, generated + executed against WiredTiger in ~40 minutes. Specs + generator open-sourced at mongodb-labs/vldb25-dist-txns. Caveats: the current spec covers a subset of WiredTiger's API semantics; the finite model is intentionally tiny; full technical detail in the VLDB paper.

  • sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4 — WiredTiger's 1.5 GB cache on a 4 GB-RAM test machine is the specific budget the appV6R0 _id index overflowed, driving the pivot to quarter-bucketing in appV6R1. WiredTiger's default snappy compression (implicit throughout the article) is what produces the ~3.3-3.7× compression ratios observed across the appV5RX / appV6RX family's data vs storage sizes.

  • — WiredTiger as sibling B+tree storage engine to MySQL's InnoDB. PlanetScale's Ben Dicken names WiredTiger (with MongoDB, DynamoDB, Postgres) as a database that "relies on B-trees to perform efficient data lookups via indexes." Both are B+tree engines, but at different database layers: WiredTiger is MongoDB's pluggable storage engine, InnoDB is MySQL's. They differ on clustering: InnoDB stores rows inside the primary-key B+tree (concepts/clustered-index), whereas MongoDB by default stores documents in a WiredTiger table keyed by an internal RecordId and keeps _id in a separate index (clustered collections keyed on _id are opt-in, since MongoDB 5.3). Primary-key choice still drives insert locality in both: MongoDB's default _id is a time-ordered ObjectId, which gives sequential key locality by default (unlike UUIDv4).
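
The spec-driven test generation in the VLDB 2025 entry above (enumerate the reachable state graph, cover it with paths, emit one test per path) can be sketched on a toy model. This is a minimal sketch, assuming a hypothetical single-key transactional store with begin / write / commit / abort actions, not the actual Storage.tla spec: breadth-first search stands in for the modified TLC's state enumeration, and the covering is a simple one-test-per-edge strategy with prefix pruning, which is not necessarily the covering algorithm the paper uses.

```python
from collections import deque

# Toy state: (transaction open?, uncommitted write pending?, committed value 0/1).
State = tuple[bool, bool, int]
INIT: State = (False, False, 0)


def enabled_actions(s: State):
    """Yield (action, next_state) for every action enabled in state s."""
    txn_open, pending, value = s
    if not txn_open:
        yield "begin", (True, False, value)
        return
    if not pending:
        yield "write", (True, True, value)
    yield "commit", (False, False, 1 if pending else value)
    yield "abort", (False, False, value)


def explore(init: State):
    """BFS over reachable states: returns a shortest action prefix per state and all edges."""
    prefix = {init: []}
    edges = []
    queue = deque([init])
    while queue:
        s = queue.popleft()
        for action, t in enabled_actions(s):
            edges.append((s, action, t))
            if t not in prefix:
                prefix[t] = prefix[s] + [action]
                queue.append(t)
    return prefix, edges


def edge_covering_tests(init: State) -> list[list[str]]:
    """One test per edge (shortest path to its source + the edge), minus redundant prefixes."""
    prefix, edges = explore(init)
    candidates = sorted({tuple(prefix[s] + [action]) for s, action, _ in edges})
    return [
        list(t) for t in candidates
        # A test that is a strict prefix of another is already exercised by the longer one.
        if not any(len(o) > len(t) and o[: len(t)] == t for o in candidates)
    ]


for test in edge_covering_tests(INIT):
    print(test)
```

For this toy model the search finds 6 states and 12 transitions, and pruning leaves 7 tests. A real generator would translate each action name into the corresponding storage-engine API call and check the observed results against the model.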
