
Trinity

Definition

Trinity is Dropbox's end-to-end randomized testing framework for Nucleus. Where CanopyCheck is narrow (planner only, no I/O or concurrency) and property-based, Trinity tests the full engine, concurrency included, by running Nucleus as a Rust future under a custom executor while mocking out the filesystem, network, and timer. This is concepts/deterministic-simulation applied to a sync engine — the same family as FoundationDB's simulation framework and TigerBeetle's VOPR.

Trinity exists specifically to catch the subtle race conditions that CanopyCheck can't see — the post gives an example: Ada deletes foo; Grace's engine is told to delete foo; simultaneously Grace writes new data to foo locally; Grace's engine erroneously deletes the file, clobbering her change. These are the "lose user data" class of bug Dropbox cannot ship, and which are essentially impossible to reproduce in production.

Architecture

Initialization. Trinity initializes the external state that Nucleus observes — the server-side Dropbox content for that user, plus the local Dropbox folder on disk. Then it instantiates Nucleus, analogously to a user linking a desktop client to a previously-existing Dropbox folder.

Execution. Trinity alternates between scheduling Nucleus and scheduling itself on the main thread. Until Nucleus reports it's synced, Trinity aggressively agitates the system:

  • modifies the local and remote filesystems mid-sync,
  • intercepts Nucleus's asynchronous requests and reorders the responses,
  • injects filesystem errors and network failures,
  • simulates machine crashes by snapshotting and restoring old filesystem states,
  • fast-forwards the mockable timer arbitrarily (intercepting, e.g., the 5-minute online-only-placeholder download timeout and firing it whenever).

Verification. Once Nucleus reports it's synced, Trinity asserts the system is in a consistent state. It then re-runs the same test with the same seed to verify that the result is reproduced — this is a meta-check on Trinity's own determinism.
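The determinism meta-check can be sketched in miniature. This is an illustrative toy, not Trinity's code: run_simulation stands in for a full Trinity run and just records the scheduling decisions a seeded PRNG would make, and xorshift64 stands in for whatever PRNG Trinity actually uses.

```rust
/// A tiny deterministic PRNG (xorshift64), standing in for Trinity's seeded RNG.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Stand-in for one full Trinity run: given a seed, produce the trace of
/// perturbation decisions. A real run would drive Nucleus; here we only
/// record the PRNG draws that would pick perturbations.
fn run_simulation(seed: u64, steps: usize) -> Vec<u64> {
    let mut state = seed;
    (0..steps).map(|_| xorshift64(&mut state) % 4).collect()
}

fn main() {
    let seed = 0x0D20_B0C5;
    let first = run_simulation(seed, 100);
    let second = run_simulation(seed, 100);
    // The meta-check: re-running with the same seed must reproduce the
    // exact same execution. Any mismatch means nondeterminism leaked in.
    assert_eq!(first, second, "nondeterminism leaked into the run");
    println!("runs identical: {}", first == second);
}
```

If some unmocked source of nondeterminism (wall-clock time, thread scheduling, hash-map iteration order) influenced the run, the two traces would diverge and the check would fail.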

How concurrency is simulated deterministically

Nucleus is one giant Rust impl Future<Output = !> composed of nested sub-futures (e.g. the upload worker contains a FuturesUnordered of per-request futures). Trinity is a custom executor for that future.

On each main-loop iteration:

  1. Run Trinity's own perturbation code (randomly driven by the PRNG).
  2. Call poll() on the top-level Nucleus future. This lets subsystems make progress. The return is always Poll::Pending, because Nucleus never terminates by design (! output type).
  3. Call poll() on all intercepted mock FS/network requests.
  4. When Nucleus becomes blocked on outstanding requests, Trinity picks some to either succeed or fail — amplifying the probability of less-likely execution orderings, not just simulating production-typical ones.
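The loop above can be sketched with a minimal single-threaded executor. Everything here is illustrative, not Trinity's actual code: MockRequest stands in for an intercepted FS/network request, EngineFuture for Nucleus (the toy version finishes once its request does, whereas the real engine is Pending forever), and the no-op waker works because the simulation polls everything itself each iteration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A no-op waker: in a single-threaded simulation the executor re-polls
/// everything on every iteration, so wake-ups can be safely ignored.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Stand-in for the engine: pending until its one outstanding mock request
/// completes; counts how many times it was polled.
struct EngineFuture { request: Rc<Cell<bool>>, progress: u32 }

impl Future for EngineFuture {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        self.progress += 1;
        if self.request.get() { Poll::Ready(self.progress) } else { Poll::Pending }
    }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let request = Rc::new(Cell::new(false));
    let mut engine = EngineFuture { request: request.clone(), progress: 0 };
    let mut prng: u64 = 1;

    loop {
        // 1. Trinity's perturbation step would run here, driven by the PRNG.
        // 2. Poll the engine. For the real Nucleus (Output = !) this is
        //    always Pending; this toy engine finishes once its request does.
        if let Poll::Ready(polls) = Pin::new(&mut engine).poll(&mut cx) {
            println!("engine ready after {polls} polls");
            break;
        }
        // 3./4. Decide which outstanding mock requests complete this
        // iteration — the PRNG draw is where unlikely orderings get amplified.
        prng = prng.wrapping_mul(6364136223846793005).wrapping_add(1);
        if prng % 4 == 0 {
            request.set(true); // let the intercepted request succeed
        }
    }
}
```

Because the only inputs are the seed and the code, the exact interleaving of steps 1–4 replays identically on a re-run.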

This is the sync-engine version of deterministic simulation testing: the custom executor plus the mocked sources of nondeterminism together make (seed, commit hash) map to one exact, reproducible execution.

Mocks

  • Filesystem. Trinity replaces the native platform filesystem with an in-memory mock. Injects failures per-operation, reorders requests, snapshots/restores state to simulate crashes. In-memory also gives ~10× more test runs per CPU than native.
  • Network. The entire server backend — metadata DB, content storage, notification services — is replaced with a Rust mock that emulates all server-side services Nucleus depends on. Arbitrarily reorder, delay, or fail RPCs.
  • Time. Nucleus uses a generic, mockable timer object throughout. Trinity fast-forwards time arbitrarily.
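The filesystem mock's snapshot/restore crash simulation falls out naturally from keeping the whole tree in memory. A minimal sketch, with all names (MockFs, write, snapshot, restore) invented for illustration rather than taken from Trinity:

```rust
use std::collections::HashMap;

/// Toy in-memory filesystem: path -> file contents.
#[derive(Clone, Debug, PartialEq)]
struct MockFs {
    files: HashMap<String, Vec<u8>>,
}

impl MockFs {
    fn new() -> Self {
        MockFs { files: HashMap::new() }
    }

    fn write(&mut self, path: &str, data: &[u8]) {
        self.files.insert(path.to_string(), data.to_vec());
    }

    fn read(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|v| v.as_slice())
    }

    /// A snapshot is just a deep copy of the in-memory tree.
    fn snapshot(&self) -> MockFs {
        self.clone()
    }

    /// Restoring an old snapshot models a machine crash: writes made
    /// since the snapshot vanish, like unsynced data on a real disk.
    fn restore(&mut self, snap: MockFs) {
        *self = snap;
    }
}

fn main() {
    let mut fs = MockFs::new();
    fs.write("/dropbox/foo", b"v1");
    let snap = fs.snapshot();           // point-in-time state
    fs.write("/dropbox/bar", b"new");
    fs.restore(snap);                   // "crash": bar never reached disk
    assert_eq!(fs.read("/dropbox/foo"), Some(&b"v1"[..]));
    assert_eq!(fs.read("/dropbox/bar"), None);
    println!("crash restore ok");
}
```

Failure injection and request reordering hang off the same structure: because every operation goes through the mock, Trinity can fail or delay any individual call before it touches this map.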

Trinity Native

Because the in-memory FS mock loses coverage of OS-specific platform code (permissions, extended attributes, Smart Sync placeholder hydration, etc.), Trinity also runs in a "native" mode that targets the actual platform filesystem. Cost: ~10× slower, fewer seeds per CPU-hour. To preserve determinism, Trinity Native serializes all native syscalls — which itself is a coverage gap: real users generate arbitrary system-call interleavings that Trinity Native cannot reproduce.

Scope limitations (named in the post)

  • Can't reboot the machine mid-test — so fsync durability guarantees on each platform are outside Trinity's scope.
  • Network protocol can drift — the Rust backend mock may not match real Dropbox server behavior. Heirloom is the separate suite that talks to a real backend, at a ~100×-slower cost.
  • Minimization doesn't scale — unlike CanopyCheck, perturbing Trinity's initial state changes network-request scheduling downstream and invalidates the seed. Engineers have to analyze failures by hand: add logging, re-run, grep. A potential fix under consideration: decouple the global PRNG into several independent per-subsystem PRNGs, so that perturbing one axis doesn't invalidate failures on the other axes.
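The per-subsystem PRNG idea can be sketched as deriving independent streams from one master seed, e.g. by mixing the seed with a subsystem label. This is an illustrative construction, not Dropbox's design: splitmix64, subsystem_seed, and the labels are all invented here. The point is that drawing more or fewer numbers from one subsystem's stream cannot shift the sequences the other subsystems see.

```rust
/// splitmix64: a simple, well-distributed mixer, used here to derive
/// stream seeds. (Stand-in for whatever PRNG the real system would use.)
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Derive a subsystem-specific seed from the master seed and a label,
/// so each subsystem gets its own independent PRNG stream.
fn subsystem_seed(master: u64, label: &str) -> u64 {
    let mut s = master;
    for b in label.bytes() {
        s = splitmix64(&mut s) ^ (b as u64);
    }
    splitmix64(&mut s)
}

fn main() {
    let master: u64 = 42;
    let fs_seed = subsystem_seed(master, "filesystem");
    let net_seed = subsystem_seed(master, "network");
    // Separate streams: perturbing the filesystem axis consumes only
    // filesystem-stream draws, leaving the network stream untouched.
    assert_ne!(fs_seed, net_seed);
    println!("fs={fs_seed:x} net={net_seed:x}");
}
```

With a single global PRNG, by contrast, one extra filesystem draw shifts every subsequent network draw, which is exactly why perturbing the initial state invalidates the seed today.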
