PATTERN Cited by 5 sources
Formal Methods Before Shipping¶
Problem¶
New protocol — replication, failover, reconfiguration, a transaction-isolation layer, a consensus variant — is correct in principle but has a vast set of possible event interleavings that humans cannot reason through exhaustively. Production pain under the long tail of those interleavings comes in the form of torn-writes, stuck primaries, split-brain, durability loss, or isolation violations — bugs that are rare in testing, catastrophic in production, and expensive to reproduce from a stack trace.
The industry default is still: code it, run chaos / fault- injection tests, ship it, fix things in prod. That works until the rare interleaving shows up.
Pattern¶
Before shipping the implementation, build a mathematical model of the protocol's core logic — stripped of disk format, thread pools, serialisation detail, etc. — in a specification language such as TLA+. Run a model checker against the model, which explores every possible interleaving of events the model permits, not just the ones a human would have thought to test. Use the checker to find the minimal invariants that hold under every interleaving — and then ship those invariants, not the spec itself, as the design contract engineers build against.
MongoDB's canonical articulation:
"When we design a new replication or failover protocol, we don't just code it, run a few chaos tests, and ship it. We build a mathematical model of the core logic stripped of distracting details like disk format or thread pools and ask a model checker to try every possible interleaving of events. The tool doesn't skip the 'unlikely' cases. It tries them all."
"We modeled the protocol in TLA+, explored millions of interleavings, and distilled the solution down to four invariants."
(Source: sources/2025-09-25-mongodb-carrying-complexity-delivering-agility)
Recipe¶
- Write the spec alongside the implementation design. Not after the code is written (see the 2020 retrofitting failure) and not years before. The closer the abstractions, the more useful the spec.
- Strip detail the checker can't help with. Disk format, serialisation, concurrency primitives, syscalls. The spec is about ordering and state transitions, not runtime implementation.
- State the desired safety + liveness invariants explicitly. Without invariants the checker has nothing to check. In TLA+ these are expressed as temporal-logic properties.
- Run the model checker. The checker enumerates every reachable state and reports counterexamples. Iteratively refine the protocol or the invariants until no counterexample is found within the chosen bound.
- Distil the result to a small set of invariants. The artifact that goes into the implementation team's review documentation is not the 300-line TLA+ spec — it is the 4-to-8 one-line invariants the spec proved are sufficient. Engineers can keep those in their heads.
- Build the implementation against the invariants. Review, PR discussion, and future bug triage are anchored on the distilled invariants, not the spec file.
MongoDB's canonical instance: logless reconfiguration¶
MongoDB's published recipe-in-practice (arXiv:2102.11960; wiki page concepts/logless-reconfiguration):
- Design: decouple configuration changes from the data replication log so membership changes don't queue behind user writes.
- Model: TLA+ spec of the reconfiguration protocol.
- Check: TLC explores "millions of interleavings."
- Distil: four invariants — (i) terms block stale primaries, (ii) monotonic versions prevent forks, (iii) majority votes stop minority splits, (iv) the oplog-commit rule ensures durability carries forward.
- Implement: server team writes C++ against the four invariants.
The same recipe was applied to multi-shard transactions (VLDB 2025; modular TLA+ spec, WiredTiger interface tested via automated model-based techniques, snapshot isolation verified, concurrency permissiveness analysed).
Complementary — not replacement — for deterministic simulation¶
The MongoDB post explicitly pairs the two:
"Alongside formal proofs, we use additional tools to test the implementation under deterministic simulation: fuzzing, fault injection, and message reordering against real binaries. Determinism gives us one-click bug replication, CI/CD regression gates, and reliable incident replays."
The spec checks that the protocol is correct. The simulation checks that the implementation actually obeys the protocol under real binaries. Industry peer instances of the complementary pair:
- Dropbox Nucleus (Source: sources/2024-05-31-dropbox-testing-sync-at-dropbox-2020) — deterministic simulation + property-based invariants.
- AWS ShardStore (Source: sources/2025-02-25-allthingsdistributed-building-and-operating-s3) — in-language executable-spec form, but same discipline: model and executable co-evolve.
When this pattern doesn't pay¶
- CRUD / stateless services. No ordering concerns, no partial-failure modes — little for a model-checker to check.
- Very fast-moving prototypes. If the protocol is likely to change weekly, the spec becomes overhead. Prototype first; formalise when the protocol stabilises enough to be shipped-to-paying-customers and correctness becomes the risk, not velocity.
- Protocols whose complexity is in the data, not the interleavings. Query planners, indexing strategies, ML scoring pipelines — these benefit more from property-based testing + load-testing than from model-checking a state machine.
Failure modes¶
- Spec abstraction mismatch. If the spec assumes a single atomic action where the implementation does two separate actions, no amount of checking rescues correctness. RaftMongo.tla's 2020 trace-checking failure is the cautionary tale.
- Model that is too big to check. State-space explosion is real; over-detailed models don't terminate. The recipe's "strip detail" step is load-bearing — if the team can't agree on what's irrelevant, the spec won't fit in the checker.
- Invariants that look right, aren't. Humans can under-specify: the invariant set looks complete, but a counterexample exists outside the properties the team thought to state. The checker only finds violations of stated properties. Adversarial red-teaming of the invariant set — "what would a hostile adversary have to violate to break durability?" — partially mitigates.
Related patterns on this wiki¶
- patterns/extreme-modelling — the eXtreme Modelling recipe many small specs co-evolved with implementation; specific 2011 methodology MongoDB has applied.
- patterns/invariant-driven-programming — implementation- level sibling; program around the distilled invariants once the spec has produced them.
- patterns/trace-checking — the conformance-checking pair: after shipping, check that traces of the real implementation are valid behaviours of the spec.
- patterns/test-case-generation-from-spec — the other conformance direction: enumerate spec behaviours, emit tests that force the impl to follow.
- patterns/seed-recorded-failure-reproducibility — deterministic-simulation pair this pattern explicitly complements.
Seen in¶
- sources/2025-09-25-mongodb-carrying-complexity-delivering-agility — canonical wiki instance; the "model-check every interleaving, distil to invariants" framing; logless reconfiguration + multi-shard transactions as the two named production examples.
- sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs — the other half of MongoDB's formal-methods discipline: post-ship conformance, and what goes wrong when the spec didn't come first.
- sources/2026-02-27-mongodb-towards-model-based-verification-of-a-key-value-storage-engine — Part 2 of the VLDB 2025 work. Demonstrates that when the protocol-level spec is written compositionally with a clean interface to the storage layer, the after-shipping conformance check against WiredTiger becomes tractable (87,143 generated tests from a 2-key × 2-transaction finite model in ~40 minutes via patterns/test-case-generation-from-spec). Formal methods before shipping + compositional structure = conformance after shipping also within reach.
- sources/2024-05-31-dropbox-testing-sync-at-dropbox-2020 — peer industry instance, different toolkit (deterministic simulation + property-based testing).
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — AWS ShardStore's executable-spec realisation of the same discipline.