Skip to content

PATTERN Cited by 5 sources

Formal Methods Before Shipping

Problem

New protocol — replication, failover, reconfiguration, a transaction-isolation layer, a consensus variant — is correct in principle but has a vast set of possible event interleavings that humans cannot reason through exhaustively. Production pain under the long tail of those interleavings comes in the form of torn-writes, stuck primaries, split-brain, durability loss, or isolation violations — bugs that are rare in testing, catastrophic in production, and expensive to reproduce from a stack trace.

The industry default is still: code it, run chaos / fault- injection tests, ship it, fix things in prod. That works until the rare interleaving shows up.

Pattern

Before shipping the implementation, build a mathematical model of the protocol's core logic — stripped of disk format, thread pools, serialisation detail, etc. — in a specification language such as TLA+. Run a model checker against the model, which explores every possible interleaving of events the model permits, not just the ones a human would have thought to test. Use the checker to find the minimal invariants that hold under every interleaving — and then ship those invariants, not the spec itself, as the design contract engineers build against.

MongoDB's canonical articulation:

"When we design a new replication or failover protocol, we don't just code it, run a few chaos tests, and ship it. We build a mathematical model of the core logic stripped of distracting details like disk format or thread pools and ask a model checker to try every possible interleaving of events. The tool doesn't skip the 'unlikely' cases. It tries them all."

"We modeled the protocol in TLA+, explored millions of interleavings, and distilled the solution down to four invariants."

(Source: sources/2025-09-25-mongodb-carrying-complexity-delivering-agility)

Recipe

  1. Write the spec alongside the implementation design. Not after the code is written (see the 2020 retrofitting failure) and not years before. The closer the abstractions, the more useful the spec.
  2. Strip detail the checker can't help with. Disk format, serialisation, concurrency primitives, syscalls. The spec is about ordering and state transitions, not runtime implementation.
  3. State the desired safety + liveness invariants explicitly. Without invariants the checker has nothing to check. In TLA+ these are expressed as temporal-logic properties.
  4. Run the model checker. The checker enumerates every reachable state and reports counterexamples. Iteratively refine the protocol or the invariants until no counterexample is found within the chosen bound.
  5. Distil the result to a small set of invariants. The artifact that goes into the implementation team's review documentation is not the 300-line TLA+ spec — it is the 4-to-8 one-line invariants the spec proved are sufficient. Engineers can keep those in their heads.
  6. Build the implementation against the invariants. Review, PR discussion, and future bug triage are anchored on the distilled invariants, not the spec file.

MongoDB's canonical instance: logless reconfiguration

MongoDB's published recipe-in-practice (arXiv:2102.11960; wiki page concepts/logless-reconfiguration):

  1. Design: decouple configuration changes from the data replication log so membership changes don't queue behind user writes.
  2. Model: TLA+ spec of the reconfiguration protocol.
  3. Check: TLC explores "millions of interleavings."
  4. Distil: four invariants — (i) terms block stale primaries, (ii) monotonic versions prevent forks, (iii) majority votes stop minority splits, (iv) the oplog-commit rule ensures durability carries forward.
  5. Implement: server team writes C++ against the four invariants.

The same recipe was applied to multi-shard transactions (VLDB 2025; modular TLA+ spec, WiredTiger interface tested via automated model-based techniques, snapshot isolation verified, concurrency permissiveness analysed).

Complementary — not replacement — for deterministic simulation

The MongoDB post explicitly pairs the two:

"Alongside formal proofs, we use additional tools to test the implementation under deterministic simulation: fuzzing, fault injection, and message reordering against real binaries. Determinism gives us one-click bug replication, CI/CD regression gates, and reliable incident replays."

The spec checks that the protocol is correct. The simulation checks that the implementation actually obeys the protocol under real binaries. Industry peer instances of the complementary pair:

When this pattern doesn't pay

  • CRUD / stateless services. No ordering concerns, no partial-failure modes — little for a model-checker to check.
  • Very fast-moving prototypes. If the protocol is likely to change weekly, the spec becomes overhead. Prototype first; formalise when the protocol stabilises enough to be shipped-to-paying-customers and correctness becomes the risk, not velocity.
  • Protocols whose complexity is in the data, not the interleavings. Query planners, indexing strategies, ML scoring pipelines — these benefit more from property-based testing + load-testing than from model-checking a state machine.

Failure modes

  • Spec abstraction mismatch. If the spec assumes a single atomic action where the implementation does two separate actions, no amount of checking rescues correctness. RaftMongo.tla's 2020 trace-checking failure is the cautionary tale.
  • Model that is too big to check. State-space explosion is real; over-detailed models don't terminate. The recipe's "strip detail" step is load-bearing — if the team can't agree on what's irrelevant, the spec won't fit in the checker.
  • Invariants that look right, aren't. Humans can under-specify: the invariant set looks complete, but a counterexample exists outside the properties the team thought to state. The checker only finds violations of stated properties. Adversarial red-teaming of the invariant set — "what would a hostile adversary have to violate to break durability?" — partially mitigates.

Seen in

Last updated · 200 distilled / 1,178 read