title: "Towards Model-based Verification of a Key-Value Storage Engine" type: source created: 2026-04-22 updated: 2026-04-22 source: MongoDB Blog source_slug: mongodb author: MongoDB Server Team (authors: William Schultz + Murat Demirbas, VLDB 2025) url: https://www.mongodb.com/company/blog/engineering/towards-model-based-verification-key-value-storage-engine published: 2026-02-27 tier: 2 tags: [mongodb, wiredtiger, tla-plus, formal-methods, conformance-checking, test-case-generation, model-checking, distributed-transactions, storage-engine, modular-verification, vldb-2025] systems: [mongodb-server, wiredtiger, tla-plus, tlc-model-checker, raft-mongo-tla] concepts: [conformance-checking, temporal-logic-specification, specification-driven-development, lightweight-formal-verification, bisimulation, compositional-specification, multi-document-acid-transactions, snapshot-isolation, test-case-minimization] patterns: [test-case-generation-from-spec, formal-methods-before-shipping, extreme-modelling, trace-checking] related: [sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs, sources/2025-09-25-mongodb-carrying-complexity-delivering-agility, systems/tla-plus, systems/tlc-model-checker, systems/wiredtiger, systems/mongodb-server, concepts/conformance-checking, concepts/compositional-specification, patterns/test-case-generation-from-spec, patterns/formal-methods-before-shipping]
Towards Model-based Verification of a Key-Value Storage Engine¶
Summary¶
MongoDB's Part 2 follow-up to the 2026-02 distributed-transactions
formal-methods-before-shipping
series describes how the modular structure of the team's
TLA+ spec of MongoDB's cross-shard transaction
protocol enabled a second verification step beyond protocol-level
model checking: automatically checking that the underlying
WiredTiger storage engine implementation
conforms to the abstract storage interface the distributed-transactions
layer relies on. The team formalised the contract between the
transaction protocol and WiredTiger in a standalone Storage.tla
spec, used a modified TLC to enumerate
its complete reachable state graph for a finite configuration,
computed path coverings over that graph, and emitted one test
case per path as a sequence of WiredTiger API calls. For a small
finite model (2 keys × 2 transactions) this generated
87,143 tests checking implementation conformance to the abstract
spec, runnable against WiredTiger in ~40 minutes — a concrete
second production instance of
test-case generation from spec
at MongoDB after the 2020 Mobile-SDK OT experiment, this time on
the core server's storage layer rather than a narrow algorithmic
component. Published in full as the VLDB 2025 paper
Design and Modular Verification of Distributed Transactions in
MongoDB (Schultz + Demirbas).
Key takeaways¶
-
The storage-interface contract is a TLA+ spec of its own (concepts/compositional-specification). The team's cross-shard transaction TLA+ spec was written compositionally with a clean interface boundary between the distributed protocol and the single-node WiredTiger storage engine;
Storage.tlaformalises that boundary as its own module with timestamp-based operations matching the WiredTiger storage engine API. "We developed a TLA+ specification of MongoDB's distributed transactions protocol in a compositional manner, describing the high level protocol behavior while also formalizing the boundary between the distributed aspect of the transactions protocol and the underlying single-node WiredTiger storage engine component." (Source: sources/2026-02-27-mongodb-towards-model-based-verification-of-a-key-value-storage-engine) -
The abstract storage spec doubles as both documentation and test oracle. "Our storage layer specification itself serves as a useful and precise definition of a subset of the WiredTiger storage engine semantics, but we can also use it for automatically generating test cases to check conformance of WiredTiger semantics to our spec." The same artifact is read by humans reasoning about the protocol and consumed mechanically by the conformance-checker.
-
The mechanical pipeline is: complete state graph → path covering → one test per path → run against WiredTiger. A modified TLC enumerates every reachable state of
Storage.tlafor a finite parameter set; the team then computes path coverings of the state graph; each path becomes a sequence of storage-API calls, emitted as one executable test case. Canonical instance of patterns/test-case-generation-from-spec applied to a storage engine rather than an algorithm. -
Concrete scale: 2 keys × 2 transactions → 87,143 tests in ~40 minutes. "For a small, complete finite model (2 keys and 2 transactions), we are able to check conformance with WiredTiger with a generated suite of 87,143 tests, which are generated and executed against the storage engine in around ~40 minutes." Reference point for what complete path-covering conformance costs at a useful finite configuration — an order of magnitude larger than the 2020 OT experiment's 4,913 behaviours.
-
Specs + code open-sourced. The current storage-layer specification lives in
Storage.tlain the mongodb-labs/vldb25-dist-txns repo, and the repo exposes an interactive, explorable version of the full compositional spec. The VLDB paper is p5045-schultz.pdf. -
Roadmap: broader API coverage + alternative exploration strategies. "In the future, we hope to explore modeling of a more extensive subset of the WiredTiger API and explore alternate state space exploration strategies for generating tests, e.g., randomized path sampling, or other strategies that only require approximate coverage of the state space model." Complete enumeration scales combinatorially; approximate-coverage sampling is the expected scaling lever.
-
LLM angle, self-consciously speculative. "In the future, we also remain optimistic about the role that LLMs can play in this type of verification workflow. For example, once such a testing harness for checking conformance is developed, having LLMs autonomously develop a model of the underlying system behavior allows for precise characterization of behavior, along with automated test suite generation capabilities." Points at the same LLM-autoformalization-plus-automated-reasoning thesis articulated elsewhere (Byron Cook 2026-02, AWS Bedrock Guardrails), applied to writing more TLA+ specs of storage internals rather than running the checker.
Systems¶
- TLA+ — specification language;
Storage.tlais the storage-interface module; cross-shard transactions spec is the compositional parent. - TLC — modified version used to
generate the complete reachable state graph of
Storage.tlafor finite parameters. The state-graph-export capability called for by issue #1073 in the 2020 retrospective is the enabling upstream mechanism. - WiredTiger — the implementation under test; MongoDB's default B-tree-based storage engine with timestamp-aware operations. Its API is the alphabet of the generated test sequences.
- MongoDB Server — the C++ server whose
sharded transaction protocol runs atop WiredTiger; the full
Storage.tla-matching contract is what the protocol relies on for correctness. - RaftMongo.tla — sibling MongoDB TLA+ spec; contrast point where 2020 trace-checking failed vs. 2025 storage-test-case-generation succeeding.
Concepts¶
- concepts/conformance-checking — the umbrella problem: does the implementation behave as the spec says? This source is a second successful MongoDB instance (after Mobile SDK OT 2020), now at the storage-engine layer of the production server.
- Compositional / modular
specification — the load-bearing precondition: the
distributed-transactions spec was written compositionally with
a clean interface to storage, so
Storage.tlacould be lifted out and checked independently. - Temporal-logic specification
— TLA+'s underlying formalism;
Storage.tlais a temporal-logic spec just like the transactions spec above it. - concepts/specification-driven-development — spec + impl co-designed with explicit interface boundaries.
- concepts/bisimulation — test-case generation provides the
Bspec ⊆ Bimpldirection of checking; combined with protocol-level model checking on the parent spec, it gives bisimulation-class coverage of the storage subsystem. - concepts/lightweight-formal-verification — third-point comparative: ShardStore (executable spec, in-production-language) vs. Nucleus (no spec, deterministic simulation on impl) vs. MongoDB storage engine (TLA+ spec + generated tests). All three target the same "spec-aware testing" zone from different angles.
- concepts/test-case-minimization — the inverse discipline called out as complementary: once a generated test fails, minimisation is how to present the failure; this post is the generation side.
Patterns¶
- patterns/test-case-generation-from-spec — canonical pattern extended with a storage-engine case alongside the 2020 Mobile-SDK OT case. Adds a concrete data point on the scaling cost (2×2 finite model → 87,143 tests / ~40 min) and on the path-covering-based generation variant vs. the full behaviour enumeration the OT case used.
- patterns/formal-methods-before-shipping — the umbrella discipline; this post is the "after-model-check" conformance step of the broader pattern.
- patterns/extreme-modelling — methodology of writing multiple small specs in step with implementation; compositional spec + per-component conformance check is a concrete realisation.
- patterns/trace-checking — the dual direction not used here; contrast point for why test-case generation fit the WiredTiger conformance problem (deterministic-drivable storage-API surface) better than trace-checking would have.
Operational numbers¶
- Finite model used: 2 keys × 2 transactions.
- Generated test suite size: 87,143 tests for that finite configuration (complete path covering of the reachable state graph).
- End-to-end generation + execution time: ~40 minutes for the full suite against WiredTiger.
- State-space exploration technique: complete reachable-state graph via modified TLC, then path coverings (Mills' 1979 path-covering framework), one path per emitted test case.
No throughput / latency / bug-counts disclosed here — the architectural numbers live in the VLDB paper (p5045-schultz.pdf).
Caveats¶
- Finite model only. 2 keys × 2 transactions is a small corner of WiredTiger's behaviour surface; the 87,143 tests prove conformance in that corner, not globally. The post is explicit about wanting to model "a more extensive subset of the WiredTiger API" and explore "alternate state space exploration strategies… randomized path sampling, or other strategies that only require approximate coverage of the state space model."
- Current spec is a subset.
Storage.tla"serves as a useful and precise definition of a subset of the WiredTiger storage engine semantics". Operations outside that subset are not under the conformance check. - Short blog form, details in the paper. The post is an accessible overview; the VLDB paper (p5045-schultz.pdf) is the load-bearing technical artefact for path-covering method, test-generator implementation, bug counts, and spec-to-impl mapping details.
- No bug-find disclosure here. The post frames the suite as
conformance checking, not as a bug-hunt retrospective; whether
any WiredTiger divergences from
Storage.tlawere found and fixed is not stated on-page (paper likely has the answer). - Reproducibility requires the modified TLC. Test generation depends on a custom fork of TLC; current open-source TLC does not export complete reachable-state graphs directly (see the upstream request in issue #1073).
Relation to other sources¶
- Part 1 / 2025-06-02: Conformance Checking at MongoDB — A. Jesse Jiryu Davis's 2025 retrospective on the 2020 VLDB paper. Framed patterns/test-case-generation-from-spec as the conformance-checking direction that worked at MongoDB (Mobile SDK OT: 4,913 tests / 100% branch coverage / 1 real bug found) vs. patterns/trace-checking on the server's Raft-like protocol that failed (10 weeks, abstraction mismatch, multithreaded snapshot cost). This post is the second production instance of the same direction that worked, now at the storage-engine layer and an order of magnitude larger in generated-suite size (~87K vs 4.9K).
- [[sources/2025-09-25-mongodb-carrying-complexity-delivering-agility|2025-09-25: Carrying complexity, delivering agility]] — MongoDB's 2025 engineering-leadership manifesto; names multi-shard transactions ("modular TLA+ spec verifying snapshot isolation against the WiredTiger storage interface with an explicit permissiveness analysis") as one of two published production applications of TLA+ alongside logless reconfiguration. This post is the depth description of the storage-interface verification the manifesto referred to at one-bullet depth.
- VLDB 2025 paper (p5045-schultz.pdf) — the full technical artefact (Design and Modular Verification of Distributed Transactions in MongoDB, Schultz + Demirbas).
- Conformance Checking at MongoDB — older MongoDB post on the 2020 experiment, referenced in the post body as "other testing efforts we had explored in the past." Same as the 2025-06-02 retrospective source.
- Related academic efforts named in the post:
- Automated Verification of Distributed Systems via Graphs of Reachable States (linked as "Similar approaches have been explored recently").
- Kayfabe: Model-based program testing with TLC (linked as "other approaches for testing systems using path-based test case generation techniques").
Source¶
- Original: https://www.mongodb.com/company/blog/engineering/towards-model-based-verification-key-value-storage-engine
- Raw markdown:
raw/mongodb/2026-02-27-towards-model-based-verification-of-a-key-value-storage-engi-8876ba37.md - VLDB 2025 paper: p5045-schultz.pdf
- Code + specs: mongodb-labs/vldb25-dist-txns — Storage.tla.
Related¶
- sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs
- sources/2025-09-25-mongodb-carrying-complexity-delivering-agility
- systems/tla-plus
- systems/tlc-model-checker
- systems/wiredtiger
- systems/mongodb-server
- systems/raft-mongo-tla
- concepts/conformance-checking
- concepts/compositional-specification
- concepts/temporal-logic-specification
- concepts/lightweight-formal-verification
- patterns/test-case-generation-from-spec
- patterns/formal-methods-before-shipping
- patterns/extreme-modelling
- companies/mongodb