

MongoDB Server

Overview

The MongoDB server (the mongod process) is MongoDB's C++ document-database implementation. Production deployments are typically replica sets — three or more cooperating servers achieving consensus via a Raft-like protocol. Clients send writes to the elected leader; followers replicate asynchronously; the commit point (logical timestamp of the newest majority-replicated write) is tracked and propagated.
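The commit-point calculation described above can be sketched in miniature: given each node's newest applied logical timestamp, the commit point is the newest write that a majority of nodes has applied. A minimal Python sketch, with an invented function name and flat integer timestamps (real MongoDB optimes also carry a term, and this logic lives in the C++ server):

```python
def commit_point(last_applied):
    """Newest logical timestamp replicated by a majority of nodes.

    last_applied: one entry per replica-set member, the timestamp of
    the newest write that member has applied. Illustrative only; not
    MongoDB's actual API or optime representation.
    """
    majority = len(last_applied) // 2 + 1
    # Sort newest-first; the majority-th newest timestamp is the newest
    # write that at least `majority` members have applied.
    return sorted(last_applied, reverse=True)[majority - 1]

# A 3-node set: leader at 7, one follower at 5, one lagging at 2.
# Two nodes (a majority) have applied everything up to timestamp 5.
print(commit_point([7, 5, 2]))  # → 5
```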

The relevant aspect for the conformance-checking source (sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs) is the commit-point propagation piece, specified in RaftMongo.tla.

Why it's hard to conformance-check

From the 2020 experiment (Davis + Schvimer):

  • Fine-grained locking, no global lock — the server was built to avoid a single-lock bottleneck. Capturing all protocol-relevant state variables atomically requires custom lock ordering or invasive instrumentation. The team spent ~1 month of a 10-week project figuring out how to do this consistently.
  • C++ codebase predates TLA+ specs — RaftMongo.tla was written years after the C++ implementation, at a higher abstraction. The abstraction mismatch (e.g. atomic-leader-handoff in the spec vs. step-down-then-step-up in the impl) was known going in and never reconciled.
  • Handwritten + randomized test coverage — ~300 handwritten JavaScript integration tests + the "rollback fuzzer" (random CRUD + random partition create/heal) were the conformance-checking inputs. Instrumented 3-node replica sets running on one machine over localhost (no clock-skew concern) produced combined logs that the Python translator tried to turn into TLA+ traces.
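The lock-ordering problem from the first bullet can be illustrated in miniature: with fine-grained locks and no global lock, an atomic snapshot of protocol-relevant state must take every relevant lock in one fixed order, everywhere. A toy Python sketch (class, field, and lock names are invented; MongoDB's actual state and lock hierarchy are far larger):

```python
import threading

class ReplState:
    """Toy replication state guarded by fine-grained locks.

    Illustrates why capturing all protocol-relevant variables
    atomically requires a fixed lock-acquisition order. Names are
    hypothetical, not MongoDB's.
    """
    def __init__(self):
        self._term_lock = threading.Lock()
        self._log_lock = threading.Lock()
        self.term = 0
        self.commit_point = 0
        self.log = []

    def snapshot(self):
        # Every code path must acquire _term_lock before _log_lock;
        # any path that takes them in the opposite order can deadlock
        # against a concurrent snapshot.
        with self._term_lock:
            with self._log_lock:
                return {"term": self.term,
                        "commitPoint": self.commit_point,
                        "log": list(self.log)}
```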
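The last step of that pipeline, turning combined logs into TLA+ traces, can also be sketched. The sketch below assumes a hypothetical JSON log format (not MongoDB's real log schema or the actual translator) and merges events by logical timestamp, which is sound here only because all nodes ran on one machine over localhost:

```python
import json

def logs_to_trace(log_lines):
    """Merge per-node log lines into a TLA+-style state trace.

    Assumes each line is a JSON snapshot such as
      {"t": 12, "node": "n1", "state": {"commitPoint": 5, "term": 2}}
    -- a hypothetical format for illustration. Events are ordered by
    the logical timestamp "t", then rendered as TLA+ record
    assignments, one conjunct per observed state.
    """
    events = sorted((json.loads(line) for line in log_lines),
                    key=lambda e: e["t"])
    conjuncts = []
    for e in events:
        fields = ", ".join(f"{k} |-> {v}"
                           for k, v in sorted(e["state"].items()))
        conjuncts.append(f"/\\ {e['node']} = [{fields}]")
    return "\n".join(conjuncts)
```

A real translator must additionally reconcile abstraction mismatches between logged implementation state and spec variables, which is where the 2020 effort struggled.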

2020 test instrumentation (failed)

The trace-checking branch of the MongoDB server was never merged to mainline — Davis notes the instrumented version may not have faithfully represented the production binary's behaviour:

We burned most of our budget for the experiment, and we worried we'd changed MongoDB too much (on a branch) to test it realistically.

This is a general warning about high-instrumentation trace-checking: the instrumentation can perturb the very behaviour it's trying to observe.

What MongoDB uses in place of trace-checking

As of 2025 (Davis's retrospective), the MongoDB server's conformance story is:

  • Unit + integration + randomized tests continue as normal.
  • Multiple TLA+ specs maintained for consensus, replication, transactions, storage. Specs are model-checked but not conformance-checked against the impl.
  • VLDB 2025 "Design and Modular Verification of Distributed Transactions in MongoDB" — Will Schultz + Murat Demirbas; a new TLA+ spec of the WiredTiger storage layer with test-case generation. This is a scoped conformance effort on a narrow component — the pattern Davis's 2025 retrospective recommends.
  • MongoDB-sponsored research — Finn Hackett (UBC, advised by Davis) builds a TLA+-to-Go compiler + trace-checker; 2025 Antithesis collaboration explores conformance at a different scale.

The trajectory since 2020 is narrower scope, new co-designed specs, and better mechanical tooling — not renewed attempts to trace-check RaftMongo.tla against the whole server.
