

Dropbox Nucleus

Definition

Nucleus is Dropbox's rewritten sync engine — the process that synchronizes files between the local Dropbox folder on a user's machine and the user's Dropbox on the server. It replaces Sync Engine Classic, the descendant of 12+ years of evolution from the original engine. Written in Rust. Deployment target: hundreds of millions of machines across every consumer OS Dropbox supports. Shipped around 2020.

The central architectural claim is that Nucleus was designed for testability from the start: the data model, client-server protocol, and concurrency model were each shaped specifically so that seeded, deterministic randomized-testing frameworks (systems/canopycheck, systems/trinity) would be feasible to build on top of the engine.

Data model: three trees of observations

Nucleus persists observations rather than pending work. Three trees, collectively "Canopy":

  • Remote Tree — latest state of the user's Dropbox in the cloud.
  • Local Tree — last observed state of the user's Dropbox on disk.
  • Synced Tree — last known "fully synced" state between Remote and Local. Acts as a per-node merge base.

The Synced Tree is the key innovation: it disambiguates direction of change. If Local matches the Synced merge base but Remote differs, the change was remote. Without the Synced Tree, you couldn't tell "user added foo on the web" from "user deleted foo locally." See concepts/merge-base-three-tree-sync.

Contrast with Classic, which persisted the pending work itself (create here, upload there) and had no merge base. Nucleus derives the sync plan as a pure function of the three trees, which makes the goal cleanly expressible: converge all three trees to the same state.
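The direction-of-change logic can be sketched per node. The types here are illustrative, not Nucleus's actual schema: each tree's view of a node is reduced to an optional content hash, with `None` meaning the node is absent from that tree.

```rust
// Illustrative per-node sync state: one content hash per tree, `None` = absent.
#[derive(Debug, PartialEq)]
enum Change {
    InSync,   // all three trees agree
    Remote,   // only the server diverged from the merge base
    Local,    // only the disk diverged from the merge base
    Conflict, // both sides diverged: needs a merge
}

// The Synced Tree acts as the merge base: whichever side disagrees
// with it is the side that changed.
fn classify(remote: Option<u64>, local: Option<u64>, synced: Option<u64>) -> Change {
    match (remote == synced, local == synced) {
        (true, true) => Change::InSync,
        (false, true) => Change::Remote,
        (true, false) => Change::Local,
        (false, false) => Change::Conflict,
    }
}

fn main() {
    // "user added foo on the web": disk matches the base, server diverged.
    assert_eq!(classify(Some(1), None, None), Change::Remote);
    // "user deleted foo locally": server matches the base, disk diverged.
    assert_eq!(classify(Some(1), None, Some(1)), Change::Local);
}
```

Note how the two ambiguous-looking cases in the paragraph above — remote add vs local delete — have identical `(remote, local)` pairs and are separated only by the `synced` argument.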

Node identity: unique IDs, not paths

Nodes in Nucleus are keyed by a unique identifier, not by path. A folder rename is one attribute update in the database + one atomic rename(2) on disk. In Classic, nodes were path-keyed, so renaming a folder with N descendants became O(N) deletes + O(N) adds, both in the database and on disk — transiently exposing two inconsistent subtrees to the user and to downstream components.

The ID-keyed representation unlocks an invariant: a moved folder is visible in exactly one location at every moment. Under a path-keyed representation, this is structurally false.
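A minimal sketch of the ID-keyed idea, with assumed names (`Node`, `Tree`, `rename`, `path` are illustrative, not Nucleus's schema): identity is a stable integer, parent links live on the node, and paths are derived on demand by walking those links.

```rust
use std::collections::HashMap;

// Illustrative ID-keyed node table: identity is a stable u64; the path
// is not stored anywhere, only derived.
struct Node { parent: Option<u64>, name: String }
struct Tree { nodes: HashMap<u64, Node> }

impl Tree {
    // Renaming a folder is one attribute update, no matter how many
    // descendants it has — they keep pointing at the same parent ID.
    fn rename(&mut self, id: u64, new_name: &str) {
        self.nodes.get_mut(&id).expect("node exists").name = new_name.to_string();
    }

    // Derive a node's path by walking parent links to the root.
    fn path(&self, mut id: u64) -> String {
        let mut parts = Vec::new();
        loop {
            let node = &self.nodes[&id];
            parts.push(node.name.clone());
            match node.parent { Some(p) => id = p, None => break }
        }
        parts.reverse();
        format!("/{}", parts.join("/"))
    }
}

fn demo() -> (String, String) {
    let mut t = Tree { nodes: HashMap::new() };
    t.nodes.insert(1, Node { parent: None, name: "photos".to_string() });
    t.nodes.insert(2, Node { parent: Some(1), name: "cat.jpg".to_string() });
    let before = t.path(2);
    t.rename(1, "pictures"); // O(1): touches node 1 only, not its descendants
    (before, t.path(2))
}

fn main() {
    let (before, after) = demo();
    assert_eq!(before, "/photos/cat.jpg");
    assert_eq!(after, "/pictures/cat.jpg"); // descendant row never touched
}
```

Under a path-keyed table the same rename would rewrite every descendant row, which is where Classic's transient two-subtrees state came from.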

Protocol: invalid states are rejected at the wire

Sync Engine Classic's server-client protocol allowed a client to receive metadata about /baz/cat before receiving /baz. The SQLite schema had to represent orphaned files, and every component processing filesystem metadata had to handle that case. As a result, real "orphaned file" bugs were indistinguishable from acceptable transient states.

Nucleus's protocol reports a critical error on parentless-node metadata before it can enter the client state. The persisted data model then enforces "no node can exist, even transiently, without a parent directory" as an invariant. See concepts/design-away-invalid-states.
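The shape of that invariant check can be sketched as follows; `Entry`, its fields, and `apply` are assumptions for illustration, not Nucleus's wire format. The point is that the orphaned state is rejected at ingestion rather than represented and handled downstream.

```rust
use std::collections::HashMap;

// Illustrative wire-level metadata entry (field names are assumptions).
struct Entry { id: u64, parent: Option<u64> }

// Reject any entry whose parent has not already been applied, so a
// parentless node can never enter the persisted model — even transiently.
fn apply(known: &mut HashMap<u64, Entry>, entry: Entry) -> Result<(), String> {
    if let Some(p) = entry.parent {
        if !known.contains_key(&p) {
            return Err(format!("protocol error: node {} arrived before parent {}", entry.id, p));
        }
    }
    known.insert(entry.id, entry);
    Ok(())
}

// True iff the whole sequence of entries is accepted in the given order.
fn accepts(order: &[(u64, Option<u64>)]) -> bool {
    let mut known = HashMap::new();
    order.iter().all(|&(id, parent)| apply(&mut known, Entry { id, parent }).is_ok())
}

fn main() {
    // /baz/cat before /baz is a hard protocol error, not a tolerated state.
    assert!(!accepts(&[(2, Some(1)), (1, None)]));
    assert!(accepts(&[(1, None), (2, Some(1))]));
}
```

Every downstream component can then assume parent-before-child unconditionally, which is exactly what made real orphaned-file bugs visible again.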

Concurrency: one control thread + offload pools

Nearly all Nucleus code runs on a single control thread. Blocking or parallelizable work — network I/O, filesystem I/O, CPU-intensive hashing — is offloaded to dedicated thread pools.

Under test, asynchronous requests are serialized onto the main thread, which means the entire engine runs single-threaded and deterministic from the test framework's point of view. This is the load-bearing substrate beneath systems/canopycheck and systems/trinity: their reproducibility guarantees are impossible without it. See patterns/single-threaded-with-offload.
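One way to picture the offload boundary is a trait with a production implementation and a test implementation that runs the work inline on the calling thread. All names here (`Offload`, `InlineOffload`, `PoolOffload`, `hash_block`) are illustrative, and the "pool" is reduced to a spawn-and-join for brevity — this is the serialization idea, not Nucleus's API.

```rust
// Illustrative offload boundary: production sends blocking work elsewhere;
// tests run it inline, keeping the whole engine single-threaded.
trait Offload {
    fn run(&self, work: Box<dyn FnOnce() -> u64 + Send>) -> u64;
}

struct InlineOffload; // test mode: no extra threads at all
impl Offload for InlineOffload {
    fn run(&self, work: Box<dyn FnOnce() -> u64 + Send>) -> u64 {
        work()
    }
}

struct PoolOffload; // production mode, reduced to spawn-and-join for brevity
impl Offload for PoolOffload {
    fn run(&self, work: Box<dyn FnOnce() -> u64 + Send>) -> u64 {
        std::thread::spawn(work).join().unwrap()
    }
}

// CPU-heavy hashing is the kind of work the control thread offloads;
// byte-summing stands in for a real content hash.
fn hash_block(offload: &dyn Offload, data: &[u8]) -> u64 {
    let data = data.to_vec();
    offload.run(Box::new(move || data.iter().map(|&b| b as u64).sum()))
}

fn main() {
    // Same result either way; only the threading differs.
    assert_eq!(hash_block(&InlineOffload, b"abc"), hash_block(&PoolOffload, b"abc"));
}
```

Because the control-thread code never sees which implementation it got, swapping in the inline version under test changes nothing about the logic, only the determinism of its schedule.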

Classic, by comparison, let components fork threads freely and coordinate via global locks + hard-coded timeouts + backoffs — which made execution order OS-dependent and its tests structurally flaky.

Futures composition

Nucleus is one giant Rust future: impl Future<Output = !>, where ! is the never (uninhabited) type, meaning the future can never resolve; only the OS terminating the process stops it. Internally it is composed of numerous sub-futures (the "upload worker" wraps a FuturesUnordered of concurrent network-request futures; the download and notification paths have similar structure).

Trinity is a custom executor for the top-level Nucleus future: on each iteration of its main loop it calls poll() on Nucleus, poll() on all intercepted mock FS/network requests, and then chooses (driven by the PRNG) to satisfy/fail those requests or to perturb external state. This single-executor-sees-everything property is what makes Trinity's failure injection coverage as broad as it is.
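A toy version of that executor shape, under stated assumptions: `MockRequest`, the xorshift `Prng`, and `run` are all illustrative stand-ins for Trinity's interception layer, and the engine future here resolves to a value so the loop can terminate (the real Nucleus future, with Output = !, never does). What it demonstrates is the loop structure — poll the engine, then let the seeded PRNG decide whether to satisfy outstanding mock requests.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: the executor re-polls every loop iteration anyway.
fn noop_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// A mock network request: Pending until the executor decides to satisfy it.
struct MockRequest { done: Rc<Cell<bool>> }
impl Future for MockRequest {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.done.get() { Poll::Ready(()) } else { Poll::Pending }
    }
}

// Tiny xorshift PRNG standing in for Trinity's seeded randomness.
struct Prng(u64);
impl Prng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// The executor's main loop: poll the engine future, then let the PRNG
// decide whether to satisfy the outstanding mock request this iteration.
fn run(seed: u64) -> &'static str {
    let done = Rc::new(Cell::new(false));
    let request = MockRequest { done: done.clone() };
    let mut engine = Box::pin(async move { request.await; "converged" });
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut prng = Prng(seed);
    loop {
        match engine.as_mut().poll(&mut cx) {
            Poll::Ready(result) => return result,
            Poll::Pending => {
                if prng.next() % 4 == 0 { done.set(true); } // inject or delay
            }
        }
    }
}

fn main() {
    // Same seed ⇒ same interleaving ⇒ same trace, every run.
    assert_eq!(run(42), "converged");
}
```

Because the one executor both drives the engine and controls every mock request, any interleaving or failure the PRNG can express is reachable — the "single-executor-sees-everything" property described above.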

Planner

The planner is the core algorithm: given the three trees, output a batch of concurrent operations that incrementally converges them. Example operations: create this folder on disk, commit this edit to the server. Batching is safety-driven: ops within a batch must be safe to run concurrently (a file cannot be created before its parent directory; two sibling files can be edited concurrently).
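A sketch of the batching idea under assumptions — the names and the fixpoint strategy are illustrative, not Nucleus's actual planner. It handles one case only (paths present remotely but missing locally become on-disk creates) and groups ops so a child never shares a batch with its missing parent.

```rust
// Parent path of an illustrative "/a/b"-style path; None for top-level nodes.
fn parent(p: &str) -> Option<&str> {
    match p.rsplit_once('/') {
        Some(("", _)) | None => None,
        Some((par, _)) => Some(par),
    }
}

// Plan on-disk creates for remote-only paths. Each batch contains only ops
// whose parent directory already exists (locally, or via an earlier batch),
// so everything inside one batch is safe to run concurrently.
fn plan_creates(remote: &[&str], local: &[&str]) -> Vec<Vec<String>> {
    let mut done: Vec<String> = local.iter().map(|s| s.to_string()).collect();
    let mut pending: Vec<String> = remote.iter()
        .filter(|p| !local.contains(p))
        .map(|s| s.to_string())
        .collect();
    let mut batches = Vec::new();
    while !pending.is_empty() {
        let (ready, rest): (Vec<String>, Vec<String>) = pending
            .into_iter()
            .partition(|p| parent(p).map_or(true, |par| done.iter().any(|d| d == par)));
        assert!(!ready.is_empty(), "missing ancestor: plan cannot make progress");
        done.extend(ready.iter().cloned());
        batches.push(ready);
        pending = rest;
    }
    batches
}

fn main() {
    let batches = plan_creates(&["/baz", "/baz/cat", "/readme"], &[]);
    // "/baz" and "/readme" are independent → same batch; "/baz/cat" must wait.
    assert_eq!(batches, vec![
        vec!["/baz".to_string(), "/readme".to_string()],
        vec!["/baz/cat".to_string()],
    ]);
}
```

The real planner also handles uploads, deletes, moves, and conflicts, but the safety rule is the same shape: an op's prerequisites must be satisfied before its batch runs.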

Planner correctness is the central load-bearing property of the system, which is why systems/canopycheck exists specifically to property-test it.

Determinism requirements

To make (seed, commit-hash) → exact execution trace hold, Nucleus eliminates sources of non-determinism that don't materially matter under a sync engine's threat model. Example: Rust's default HashMap randomizes its hash keys for DoS resistance against adversaries who can force collisions. Nucleus overrides this with a deterministic hasher: an adversarial Dropbox user could only degrade their own client's performance via collisions, so the DoS-resistance guarantee was buying nothing they needed while costing reproducibility they needed badly.
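In std terms the mechanism looks like this; the specific hasher Nucleus chose isn't stated here, so `BuildHasherDefault<DefaultHasher>` is just one way to demonstrate pinning the keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// HashMap::new() seeds SipHash with per-instance random keys (DoS hardening),
// so two maps with identical inserts can iterate in different orders.
// A fixed-key BuildHasher makes iteration order a pure function of the
// insert sequence — which is what seeded-replay testing needs.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn iteration_order(keys: &[&str]) -> Vec<String> {
    let mut m: DeterministicMap<String, ()> = DeterministicMap::default();
    for k in keys {
        m.insert(k.to_string(), ());
    }
    m.keys().cloned().collect()
}

fn main() {
    let a = iteration_order(&["remote", "local", "synced"]);
    let b = iteration_order(&["remote", "local", "synced"]);
    assert_eq!(a, b); // reproducible; two HashMap::new() maps need not agree
}
```

(The order is stable for a given std version and insert sequence, which is all replay needs — it does not have to be stable across compiler upgrades, since the trace is keyed to a commit hash anyway.)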
