Dropbox Nucleus¶
Definition¶
Nucleus is Dropbox's rewritten sync engine — the process that synchronizes files between the local Dropbox folder on a user's machine and the user's Dropbox on the server. It replaces Sync Engine Classic, the 12+-year evolutionary descendant of the original. Written in Rust. Deployment target: hundreds of millions of machines across every consumer OS Dropbox supports. Shipped ~2020.
The central architectural claim is that Nucleus was designed for testability from the start: the data model, client-server protocol, and concurrency model were each shaped specifically to make deterministic, seeded randomized-testing frameworks feasible to build on top of it (systems/canopycheck, systems/trinity).
Data model: three trees of observations¶
Nucleus persists observations rather than pending work. Three trees, collectively "Canopy":
- Remote Tree — latest state of the user's Dropbox in the cloud.
- Local Tree — last observed state of the user's Dropbox on disk.
- Synced Tree — last known "fully synced" state between Remote and Local. Acts as a per-node merge base.
The Synced Tree is the key innovation: it disambiguates direction of change. If Local matches the Synced merge base but Remote differs, the change was remote. Without the Synced Tree, you couldn't tell "user added foo on the web" from "user deleted foo locally." See concepts/merge-base-three-tree-sync.
Contrast with Classic, which persisted the pending work itself (create here, upload there) and had no merge base. Nucleus derives the sync plan as a pure function of the three trees, which makes the goal cleanly expressible: converge all three trees to the same state.
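A minimal sketch of the merge-base idea, with invented types (not Nucleus's real data model): comparing each side against the Synced Tree is what disambiguates the direction of a change.

```rust
// Per-node state in each tree; the string stands in for full metadata.
#[derive(PartialEq, Debug)]
enum Node { Absent, File(&'static str) }

#[derive(PartialEq, Debug)]
enum Change { NoChange, RemoteChange, LocalChange, Conflict }

// Direction of change is a pure function of (Local, Synced, Remote):
// whichever side diverged from the merge base is the side that changed.
fn classify(local: &Node, synced: &Node, remote: &Node) -> Change {
    match (local == synced, remote == synced) {
        (true, true) => Change::NoChange,
        (true, false) => Change::RemoteChange, // only Remote diverged
        (false, true) => Change::LocalChange,  // only Local diverged
        (false, false) => Change::Conflict,    // both diverged: needs a merge
    }
}

fn main() {
    // "user added foo on the web": Local still matches the merge base.
    assert_eq!(classify(&Node::Absent, &Node::Absent, &Node::File("foo")),
               Change::RemoteChange);
    // "user deleted foo locally": Remote still matches the merge base.
    assert_eq!(classify(&Node::Absent, &Node::File("foo"), &Node::File("foo")),
               Change::LocalChange);
}
```

Without the `synced` argument, the two asserted cases above would present identical `(local, remote)` pairs and be indistinguishable.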
Node identity: unique IDs, not paths¶
Nodes in Nucleus are keyed by a unique identifier, not by path. A
folder rename is one attribute update in the database + one atomic
rename(2) on disk. In Classic, nodes were path-keyed, so renaming a
folder with N descendants became O(N) deletes + O(N) adds, both in the
database and on disk — transiently exposing two inconsistent subtrees
to the user and to downstream components.
The ID-keyed representation unlocks an invariant: a moved folder is visible in exactly one location at any moment. Under a path-keyed representation this invariant is structurally false.
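A sketch of the ID-keyed representation, with invented types (not Nucleus's schema): a rename touches exactly one record regardless of descendant count, and paths are derived on demand by walking parent ids.

```rust
use std::collections::HashMap;

type NodeId = u64;

struct Node { parent: Option<NodeId>, name: String }

struct Tree { nodes: HashMap<NodeId, Node> }

impl Tree {
    // Rename is one attribute update; descendants reference the parent by
    // id and are untouched — the O(1) move a path-keyed store can't express.
    fn rename(&mut self, id: NodeId, new_name: &str) {
        self.nodes.get_mut(&id).expect("node exists").name = new_name.to_string();
    }

    // Path is not stored; it is computed by climbing parent pointers.
    fn path(&self, mut id: NodeId) -> String {
        let mut parts = Vec::new();
        loop {
            let node = &self.nodes[&id];
            parts.push(node.name.clone());
            match node.parent { Some(p) => id = p, None => break }
        }
        parts.reverse();
        format!("/{}", parts.join("/"))
    }
}

// Build /a/b, rename the parent folder, read the child's derived path.
fn demo() -> String {
    let mut t = Tree { nodes: HashMap::new() };
    t.nodes.insert(1, Node { parent: None, name: "a".into() });
    t.nodes.insert(2, Node { parent: Some(1), name: "b".into() });
    t.rename(1, "z"); // one update; node 2's record is unchanged
    t.path(2)
}

fn main() { assert_eq!(demo(), "/z/b"); }
```

Because the child's record never changes, there is no transient state in which the subtree exists at two paths at once.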
Protocol: invalid states are rejected at the wire¶
Sync Engine Classic's server-client protocol allowed a client to
receive metadata about /baz/cat before receiving /baz. The SQLite
schema had to represent orphaned files, and every component
processing filesystem metadata had to handle that case. As a result,
real "orphaned file" bugs were indistinguishable from acceptable
transient states.
Nucleus's protocol reports a critical error on parentless-node metadata before it can enter the client state. The persisted data model then enforces "no node can exist, even transiently, without a parent directory" as an invariant. See concepts/design-away-invalid-states.
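A sketch of the ingestion-time check, with hypothetical types: metadata is rejected before it can enter the client state unless its parent is already known, so an orphan can never be persisted, even transiently.

```rust
use std::collections::HashMap;

type NodeId = u64;

struct Meta { id: NodeId, parent: Option<NodeId>, name: &'static str }

struct ClientState { known: HashMap<NodeId, Meta> }

impl ClientState {
    // Validate at the wire: a node whose parent is unknown is a critical
    // protocol error, not a representable state downstream code must handle.
    fn ingest(&mut self, meta: Meta) -> Result<(), String> {
        if let Some(p) = meta.parent {
            if !self.known.contains_key(&p) {
                return Err(format!("parent {} of {} unknown", p, meta.name));
            }
        }
        self.known.insert(meta.id, meta);
        Ok(())
    }
}

// /baz/cat arriving before /baz is rejected; in order, both are accepted.
fn demo() -> (bool, bool, bool) {
    let mut s = ClientState { known: HashMap::new() };
    let early = s.ingest(Meta { id: 2, parent: Some(1), name: "cat" }).is_err();
    let root = s.ingest(Meta { id: 1, parent: None, name: "baz" }).is_ok();
    let child = s.ingest(Meta { id: 2, parent: Some(1), name: "cat" }).is_ok();
    (early, root, child)
}

fn main() { assert_eq!(demo(), (true, true, true)); }
```

The payoff is that any orphaned node observed later is unambiguously a real bug, never an acceptable transient.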
Concurrency: one control thread + offload pools¶
Nearly all Nucleus code runs on a single control thread. Blocking or parallelizable work — network I/O, filesystem I/O, CPU-intensive hashing — is offloaded to dedicated thread pools.
Under test, asynchronous requests are serialized onto the main thread, which means the entire engine runs single-threaded and deterministic from the test framework's point of view. This is the load-bearing substrate beneath systems/canopycheck and systems/trinity: their reproducibility guarantees are impossible without it. See patterns/single-threaded-with-offload.
Classic, by comparison, let components fork threads freely and coordinate via global locks + hard-coded timeouts + backoffs — which made execution order OS-dependent and its tests structurally flaky.
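A sketch of the single-control-thread pattern, with invented names (not Nucleus's API): blocking work runs on an offload thread and returns as a message, so engine state is only ever read or mutated in one place.

```rust
use std::sync::mpsc;
use std::thread;

enum Event { HashDone { block: u32, digest: u64 } }

// Stand-in for CPU-intensive hashing that would block the control thread.
fn slow_hash(block: u32) -> u64 {
    u64::from(block).wrapping_mul(31).wrapping_add(7)
}

// Control loop: spawn the offloaded work, then apply results serially.
fn run(blocks: u32) -> u32 {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for block in 0..blocks {
            tx.send(Event::HashDone { block, digest: slow_hash(block) }).unwrap();
        }
        // tx drops here, closing the channel when the worker finishes.
    });
    let mut applied = 0; // engine state, touched only on this thread
    while let Ok(Event::HashDone { digest, .. }) = rx.recv() {
        let _ = digest; // a real engine would update the Local Tree here
        applied += 1;
    }
    applied
}

fn main() { assert_eq!(run(3), 3); }
```

Under test, the offload step can be replaced with a direct call on the main thread, which is what makes the whole engine single-threaded and deterministic from the framework's point of view.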
Futures composition¶
Nucleus is one giant Rust future — impl Future<Output = !> where
! is the uninhabited type (the future can never return; only the
OS shutting down the process stops it). Internally it is composed
of numerous sub-futures (the "upload worker" wraps a
FuturesUnordered of concurrent network-request futures; similar
structure for download, notifications, etc.).
Trinity is a custom executor for the top-level Nucleus future:
on each iteration of its main loop it calls poll() on Nucleus,
poll() on all intercepted mock FS/network requests, and then
chooses (driven by the PRNG) to satisfy/fail those requests or to
perturb external state. This single-executor-sees-everything
property is what makes Trinity's failure injection coverage as
broad as it is.
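A much-simplified sketch of the idea (the real Trinity is a custom futures executor; this models it without async machinery, and `Lcg`/`run` are invented): every scheduling and failure-injection choice is drawn from one seeded PRNG, so the execution trace is a pure function of the seed.

```rust
// Minimal deterministic PRNG (a standard 64-bit LCG).
struct Lcg(u64);
impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

// One "main loop": pick an intercepted mock request, then decide (also via
// the PRNG) whether to satisfy it or inject a failure.
fn run(seed: u64) -> Vec<String> {
    let mut rng = Lcg(seed);
    let mut pending: Vec<u32> = vec![1, 2, 3]; // intercepted mock FS/network requests
    let mut trace = Vec::new();
    while !pending.is_empty() {
        let i = (rng.next() as usize) % pending.len(); // which request to resolve
        let req = pending.remove(i);
        if rng.next() % 4 == 0 {
            trace.push(format!("fail req {}", req)); // injected failure
        } else {
            trace.push(format!("ok req {}", req));
        }
    }
    trace
}

fn main() {
    // (seed -> trace) is a pure function: same seed, same execution.
    assert_eq!(run(42), run(42));
}
```

Because the executor sees every pending request on every iteration, any interleaving or failure pattern is reachable by some seed, and any failing seed replays exactly.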
Planner¶
The planner is the core algorithm: given the three trees, output a batch of concurrent operations that incrementally converges them. Example operations: create this folder on disk, commit this edit to the server. Batching is safety-driven: ops within a batch must be safe to run concurrently (a file cannot be created before its parent directory; edits to two sibling files can proceed concurrently).
Planner correctness is the central load-bearing property of the system, which is why systems/canopycheck exists specifically to property-test it.
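A sketch of the batching rule, with an invented, heavily reduced representation (the "trees" collapse to a set of nodes on disk plus a map of desired nodes): one batch contains only ops that are mutually safe to run concurrently, here meaning a node's parent must already exist on disk.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u32;

#[derive(Debug, PartialEq)]
struct CreateOnDisk(NodeId);

// Pure function of the (reduced) trees: emit the ops that are safe now;
// everything else waits for a later batch, after the trees are re-observed.
fn plan_batch(
    on_disk: &HashSet<NodeId>,
    wanted: &HashMap<NodeId, Option<NodeId>>, // node -> parent
) -> Vec<CreateOnDisk> {
    let mut batch = Vec::new();
    for (&id, &parent) in wanted {
        if on_disk.contains(&id) { continue; } // already converged
        if parent.map_or(true, |p| on_disk.contains(&p)) {
            batch.push(CreateOnDisk(id)); // safe: parent already on disk
        }
    }
    batch.sort_by_key(|op| op.0); // deterministic output order
    batch
}

// Two planning rounds converging folder 1 and its child 2.
fn demo() -> (Vec<CreateOnDisk>, Vec<CreateOnDisk>) {
    let mut on_disk = HashSet::new();
    let mut wanted = HashMap::new();
    wanted.insert(1, None);    // folder /a
    wanted.insert(2, Some(1)); // file /a/b
    let first = plan_batch(&on_disk, &wanted);
    on_disk.insert(1); // pretend the op was applied and re-observed
    let second = plan_batch(&on_disk, &wanted);
    (first, second)
}

fn main() {
    let (first, second) = demo();
    assert_eq!(first, vec![CreateOnDisk(1)]);
    assert_eq!(second, vec![CreateOnDisk(2)]);
}
```

Because each round is a pure function of observed state, re-running the planner after any subset of a batch completes still produces a correct next step.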
Determinism requirements¶
To make (seed, commit-hash) → exact execution trace hold, Nucleus
eliminates non-determinism sources that don't materially matter for
a sync-engine threat model. Example: Rust's default HashMap uses
randomized hashing for DoS resistance against adversaries who can
force collisions; Nucleus overrides this
with a deterministic hasher, because an adversarial user of Dropbox
could only degrade their own client performance via collisions —
the DoS-resistance guarantee was paying for nothing they needed, and
costing them reproducibility they needed a lot.
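A sketch of the swap. Rust's std HashMap is generic over its hasher, so a fixed-seed hasher (FNV-1a here, chosen for brevity; the source does not say which hasher Nucleus uses) makes iteration order a pure function of the insertion history:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// FNV-1a with its standard fixed offset basis: no per-process randomness.
struct Fnv1a(u64);
impl Default for Fnv1a {
    fn default() -> Self { Fnv1a(0xcbf29ce484222325) }
}
impl Hasher for Fnv1a {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
}

// Drop-in deterministic replacement for HashMap<K, V>.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

// Iteration order now depends only on the keys inserted, not on a
// process-startup random seed, so it is identical run to run.
fn insertion_trace(keys: &[&str]) -> Vec<String> {
    let mut m: DetMap<String, ()> = DetMap::default();
    for k in keys { m.insert((*k).to_string(), ()); }
    m.keys().cloned().collect()
}

fn main() {
    assert_eq!(insertion_trace(&["a", "b", "c"]),
               insertion_trace(&["a", "b", "c"]));
}
```

With the default `RandomState` hasher, that iteration order can change on every run, which is exactly the (seed, commit-hash) → trace property being protected.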
Seen in¶
- sources/2024-05-31-dropbox-testing-sync-at-dropbox-2020 — Dropbox's 2024 walkthrough of the testing strategy; canonical reference for Nucleus's designed-for-testability architecture.