
Deterministic Simulation

Definition

Deterministic simulation is a testing discipline where the entire system under test runs inside a custom executor/scheduler that eliminates every source of non-determinism it doesn't itself control — then drives the system forward from a seeded pseudo-random number generator (PRNG) so that (seed, code-version) → exact execution trace holds. The PRNG is used for everything: initial state generation, task scheduling, network-message order, fault injection, time.

The discipline's payoff is reproducibility under adversarial schedules: when a randomized test fails, you output the seed and commit hash, and the next run with the same pair reproduces the exact failure — so an engineer can add logging inline and re-run with a guarantee of failing again, without the frustrated-grep cycle characteristic of flaky randomized tests. See patterns/seed-recorded-failure-reproducibility for the developer-experience contract that sits on top.
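The (seed, code-version) → exact-trace contract can be made concrete with a toy sketch. Everything here is illustrative — the xorshift PRNG, task names, and scheduler loop are stand-ins, not Trinity's actual machinery — but it shows the core move: every scheduling decision is drawn from one seeded generator, so the trace is a pure function of the seed.

```rust
/// A tiny deterministic PRNG (xorshift64) standing in for the seeded
/// generator that drives every decision in the simulator.
struct Prng(u64);

impl Prng {
    fn new(seed: u64) -> Self {
        Prng(seed.max(1)) // xorshift state must be nonzero
    }
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Toy "simulation": repeatedly pick one runnable task at (seeded)
/// random until all have run, recording the order as the trace.
fn run(seed: u64) -> Vec<&'static str> {
    let mut rng = Prng::new(seed);
    let mut runnable = vec!["send_msg", "flush_fs", "fire_timer", "recv_msg"];
    let mut trace = Vec::new();
    while !runnable.is_empty() {
        let i = (rng.next() % runnable.len() as u64) as usize;
        trace.push(runnable.remove(i));
    }
    trace
}

fn main() {
    // Same seed, same code version => byte-identical trace.
    assert_eq!(run(42), run(42));
    // Different seeds typically explore different interleavings.
    println!("{:?}", run(42));
    println!("{:?}", run(7));
}
```

A real simulator replaces the toy task list with the system's actual futures/tasks, but the contract is the same: replaying a recorded seed replays the exact schedule.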

Why this matters

Distributed-systems bugs are dominated by low-probability event orderings — a network message arriving before a filesystem flush, two concurrent edits interleaving a specific way, a crash at a specific moment. In production these are rare; in testing you want to hit them with much higher probability than production — and you want to keep hitting them until you've fixed them.

Randomized unit/integration tests that are non-deterministic by default catch some of these orderings, but the developer experience is famously bad: a failure you can't reproduce is a failure you can't fix. Many teams respond by deleting flaky tests or ignoring specific seeds, which comes back later as escaped production bugs. Deterministic simulation is the structural fix.

Load-bearing requirements

  1. A single-scheduler architecture under test. The system must be controllable by one executor. In practice this means either one OS thread driving everything (Dropbox Nucleus, TigerBeetle, some FoundationDB layers) or one custom scheduler intercepting all concurrency primitives. See patterns/single-threaded-with-offload for the sync-engine realization.

  2. Mocked sources of non-determinism. The filesystem, network, timer, random-number generator, thread scheduler — anything that makes two runs diverge by default — must be replaceable with a PRNG-driven alternative. Dropbox Trinity mocks FS + network + time; the Rust futures tree is driven by Trinity acting as a custom executor.

  3. The system must not itself contain hidden non-determinism. Rust's default HashMap uses randomized hashing to resist DoS collision attacks; Nucleus overrides it with a deterministic hasher because it needs reproducibility more than it needs DoS protection. Every "clever" library trick of this form is a determinism hazard.

  4. Commit hash is part of the test input. Reproducibility implicitly requires that the code hasn't changed. "Same seed" alone is not enough; CI has to record the commit that produced the failing trace and reproduce on that commit.
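Requirement 3 can be shown directly in Rust. `std`'s `HashMap` defaults to `RandomState`, which seeds SipHash randomly per process, so iteration order differs between runs. Pinning the hasher type — here via `BuildHasherDefault<DefaultHasher>`, chosen purely for illustration; Nucleus's actual hasher choice may differ — makes iteration order a pure function of the insertion sequence.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// DefaultHasher constructed via Default uses fixed SipHash keys, so two
// maps built the same way iterate identically -- every run, every process.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn iteration_order(keys: &[&str]) -> Vec<String> {
    let mut m: DetMap<String, ()> = DetMap::default();
    for k in keys {
        m.insert(k.to_string(), ());
    }
    m.keys().cloned().collect()
}

fn main() {
    let keys = ["alpha", "beta", "gamma", "delta"];
    // Same insertions, same hasher => same iteration order.
    assert_eq!(iteration_order(&keys), iteration_order(&keys));
    println!("{:?}", iteration_order(&keys));
}
```

Note the documented caveat: `DefaultHasher`'s algorithm is not guaranteed stable across Rust versions, which is consistent with requirement 4 — the toolchain/commit is part of the reproducibility contract.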

Realizations in the wild

  • Dropbox Trinity (~2020) — Rust sync engine tested via a custom futures executor + in-memory FS mock + Rust backend mock + mockable timer; tens of millions of runs per night.
  • Dropbox CanopyCheck (~2020) — narrower QuickCheck-style framework on the Nucleus planner only; no I/O mocking needed because the planner is pure.
  • FoundationDB simulation framework (Apple's distributed-transactional KV; Flow language-level actor model) — ancestor of the approach; widely cited as influence.
  • TigerBeetle VOPR (accounting database, 2022+) — open-source realization with VSR replication under deterministic failure injection.
  • Property-based testing libraries (QuickCheck, Hypothesis, proptest) — the "just the input" half of the discipline, with minimization.
  • Jepsen — arguably the opposite tradition: not deterministic, but randomized + linearizability checking against a real distributed system; catches different bugs.

Trade-offs

Against. Maintenance cost is high: every new I/O path, concurrency primitive, or external dependency is a potential determinism leak. Every platform API has to be mockable, which is invasive in the codebase. Determinism guarantees are local to the mocked layer — Trinity names its scope limits explicitly: it cannot reboot the machine, real-backend protocol drift is out of scope, and minimization-via-shrinking doesn't work end-to-end.

For. Once it exists, you can run the test suite at absurd volume (Nucleus: tens of millions of runs per night) and every seed failure is a ticket with a reproducer. Teams report catching classes of bug — silent data loss, race-condition clobbers — that no other technique reliably finds.
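The "every seed failure is a ticket with a reproducer" workflow can be sketched as a seed sweep. Everything here is a stand-in — the simulated system, its planted invariant violation, and the `GIT_COMMIT` constant are hypothetical, not any framework's API — but it shows the CI-side contract: record (seed, commit) on failure, and determinism guarantees the replay fails again.

```rust
const GIT_COMMIT: &str = "deadbeef"; // stand-in; CI records the real hash

/// Toy deterministic "simulation": drive a xorshift stream from the seed
/// and check an invariant. The bug is planted so some seeds fail.
fn simulate(seed: u64) -> bool {
    let mut x = seed.max(1);
    let mut ok = true;
    for _ in 0..100 {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        // Planted bug: invariant "violated" on a slice of the PRNG stream.
        if x % 1000 == 0 {
            ok = false;
        }
    }
    ok
}

fn main() {
    // Sweep seeds; each failure becomes a (seed, commit) reproducer.
    let mut reproducers = Vec::new();
    for seed in 0..10_000u64 {
        if !simulate(seed) {
            reproducers.push((seed, GIT_COMMIT));
        }
    }
    println!("{} failing seeds", reproducers.len());
    // The reproducibility contract: replaying a recorded seed on the
    // recorded commit fails in exactly the same way.
    for (seed, _) in &reproducers {
        assert!(!simulate(*seed));
    }
}
```

In a real pipeline the sweep is the nightly tens-of-millions-of-runs job, and the (seed, commit) pair is what lands in the ticket.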

Relation to lightweight formal verification

concepts/lightweight-formal-verification (ShardStore / P / TLA+-in-Rust patterns) and deterministic simulation are complementary: the former proves global properties about an abstract model, the latter empirically checks invariants under seeded-random schedules over the real implementation. Both operate on a similar stance — "tests are properties that must hold on inputs I don't hand-pick" — but the tools are disjoint. ShardStore uses property-based testing on its adjacent executable spec; Dropbox uses property-based testing directly on Nucleus via CanopyCheck + Trinity.
