Skip to content

SYSTEM Cited by 1 source

MongoDB Mobile SDK

Overview

The MongoDB Mobile SDK was a mobile-local document database that synced with a central server — mobile clients could read/write data locally, and changes were periodically uploaded to the server and downloaded by other clients. The SDK and server both implemented the same conflict-resolution algorithmOperational Transform (OT) — so writes from multiple clients / server converged to identical final data on every participant.

Status: the product was sunsetMongoDB deprecated Atlas Device Sync (the rebranded successor).

Why it mattered for conformance checking

The Mobile SDK was the one MongoDB case where conformance checking succeeded (Source: sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs):

Setup

  • Originally one C++ implementation shared across client and server.
  • In 2020, MongoDB rewrote the server in Go, creating two implementations of the same OT algorithm that had to agree on conflict resolution.
  • Conformance became urgent: if the C++ and Go implementations diverged on any conflict case, user data would silently diverge across the client-server boundary.

The conformance experiment

Max Hirschhorn applied test-case generation to the array-merge portion:

  1. Translated ~1000 lines of C++ (the OT array-merge algorithm) to TLA+ manually in 2 weeks. 19 client operation types, 6 array operations, 21 merge rules.
  2. Model-checked the spec with TLC to verify all participants eventually converge to the same data. This step crashed TLC with a Java StackOverflowError on one conflict case (swap two elements + move one element → infinite recursion in the algorithm itself). Verified the bug existed in the C++; fixed by deprecating the element-swap operation in the Mobile SDK.
  3. Constrained the state space to 3 participants × 3-element array × 1 write op each → 30,184 states / 4,913 behaviours — a finite enumerable DAG.
  4. Emitted 4,913 C++ unit tests via a Go generator that parsed TLC's state-graph output.
  5. 100% branch coverage achieved on the C++ OT implementation — vs. 21% from handwritten tests and 92% from millions of AFL-fuzzer runs.

Why it worked here but not on the server

The Mobile SDK's OT was the ideal shape for test-case generation:

  • Narrow algorithmic component (~1000 lines of impl).
  • Finite state space under constraint (3×3×1).
  • Sequential — no threading / distribution / networking to squash in the test.
  • Two impls of the same algorithm — strong motivator for conformance.
  • Spec written fresh specifically for this experiment — no pre-existing abstract spec to retrofit onto.

Contrast with MongoDB's server, where trace-checking failed because RaftMongo.tla was a pre-existing high- abstraction spec and the server is a multithreaded C++ system at much higher complexity.

Seen in

Last updated · 200 distilled / 1,178 read