CanopyCheck

Definition

CanopyCheck is Dropbox's narrowly scoped, randomized, property-based testing framework for the Nucleus planner. The name nods to QuickCheck, the Haskell testing library that directly inspired it; the "Canopy" half is Dropbox's internal term for the Nucleus three-tree data model (the Remote, Local, and Synced trees together).

The narrow scope is the point: CanopyCheck tests the planner only, skipping I/O and concurrency in exchange for (a) stronger invariants, (b) easy minimization of failing test cases, and (c) much denser coverage per CPU-second than full-system testing. Concurrency and end-to-end coverage are Trinity's job.

Architecture

Input generation. CanopyCheck generates one random tree, then perturbs it twice to produce the other two trees of Canopy. Independently generated random trees would have disjoint paths, on which the planner would never exercise its delete/edit/move logic; correlation between the trees is therefore a coverage requirement, not a simplification.
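
The generate-then-perturb scheme can be sketched as follows. This is a minimal illustration, not Nucleus's code: the `Tree` representation, the xorshift PRNG, and the perturbation mix are all assumptions made for the sketch.

```rust
// Illustrative sketch of correlated tree generation: one base tree,
// then two perturbed copies. All names here are hypothetical.
use std::collections::BTreeMap;

// A tree modeled as a map from path to file contents.
type Tree = BTreeMap<String, String>;

// Tiny deterministic xorshift PRNG so the sketch has no dependencies.
fn next(rng: &mut u64) -> u64 {
    *rng ^= *rng << 13;
    *rng ^= *rng >> 7;
    *rng ^= *rng << 17;
    *rng
}

fn random_tree(rng: &mut u64, n: usize) -> Tree {
    (0..n)
        .map(|i| (format!("/dir{}/file{}", next(rng) % 3, i), format!("v{}", next(rng) % 10)))
        .collect()
}

// Perturb a copy of `base`: randomly delete or edit existing entries
// and add a fresh one. Entries left alone are the correlation.
fn perturb(base: &Tree, rng: &mut u64) -> Tree {
    let mut t = base.clone();
    let keys: Vec<String> = t.keys().cloned().collect();
    for k in keys {
        match next(rng) % 4 {
            0 => { t.remove(&k); }                 // delete
            1 => { t.insert(k, "edited".into()); } // edit
            _ => {}                                // keep as-is
        }
    }
    t.insert(format!("/new{}", next(rng) % 100), "added".into());
    t
}

fn main() {
    let mut rng = 0x2a2a_2a2a_u64;
    let synced = random_tree(&mut rng, 8);
    let local = perturb(&synced, &mut rng);
    let remote = perturb(&synced, &mut rng);
    // Because Local and Remote are perturbations of the same base, they
    // share paths, so the planner sees real delete/edit/move conflicts.
    let shared = local.keys().filter(|k| remote.contains_key(*k)).count();
    println!("paths shared between Local and Remote: {}", shared);
}
```

Seeding the PRNG explicitly also illustrates why a failing run is reproducible from its seed alone.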

Execution loop:

  1. Ask the planner for a batch of concurrent operations.
  2. Randomly shuffle the batch (verifying that order within a batch doesn't matter).
  3. For each operation, pretend it succeeded: update the trees to reflect the operation's intended outcome. No I/O, no concurrency.
  4. Repeat from step 1 until the planner returns no further operations.
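
The loop can be sketched against a toy planner. Everything here is an illustrative stand-in (the `Op` type, the download-only planner, the reversal in place of a real shuffle), not Nucleus's API; only the loop shape and the 200-iteration cutoff come from the description above.

```rust
// Sketch of the CanopyCheck execution loop with a toy planner that
// only downloads Remote-only files. Hypothetical names throughout.
use std::collections::BTreeMap;

type Tree = BTreeMap<String, String>;

#[derive(Debug, Clone)]
enum Op {
    Download { path: String, contents: String },
}

// Toy planner: emit one batch of ops moving Local/Synced toward Remote.
fn plan(remote: &Tree, local: &Tree) -> Vec<Op> {
    remote
        .iter()
        .filter(|(p, _)| !local.contains_key(*p))
        .map(|(p, c)| Op::Download { path: p.clone(), contents: c.clone() })
        .collect()
}

// Step 3: "pretend it succeeded" -- apply the intended outcome in memory.
fn apply(op: &Op, local: &mut Tree, synced: &mut Tree) {
    match op {
        Op::Download { path, contents } => {
            local.insert(path.clone(), contents.clone());
            synced.insert(path.clone(), contents.clone());
        }
    }
}

fn run(remote: &Tree, local: &mut Tree, synced: &mut Tree) -> Result<usize, &'static str> {
    for iteration in 0..200 {
        let mut batch = plan(remote, local);      // step 1: ask the planner
        if batch.is_empty() {
            return Ok(iteration);                 // planner emits no more ops
        }
        batch.reverse();                          // step 2: reorder (shuffle stand-in)
        for op in &batch {
            apply(op, local, synced);             // step 3: no I/O, no concurrency
        }
    }                                             // step 4: repeat
    Err("termination invariant violated: >200 planning iterations")
}

fn main() {
    let remote: Tree =
        [("/a".to_string(), "1".to_string()), ("/b".to_string(), "2".to_string())].into();
    let mut local = Tree::new();
    let mut synced = Tree::new();
    let iters = run(&remote, &mut local, &mut synced).unwrap();
    assert_eq!(&local, &remote);
    println!("converged in {} planning iterations", iters);
}
```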

Because step 3 never touches the outside world, CanopyCheck skips all the "juicy" parts of a real sync (mock FS, mock network, real timers) and exists purely to stress the planner's ability to derive correct operations from any tree configuration.

Invariants enforced

  • Termination — a heuristic cutoff at 200 planning iterations catches accidental infinite loops (the planner should always make progress).
  • No panics — Nucleus's codebase is liberal with assert!; CanopyCheck is often the first discovery surface for flawed assumptions. Historically caught the Archives/Drafts/January cycle bug: local move + remote move → cycle → tree-data-structure assert fires.
  • Sync correctness (equality at end) — all three trees equal when the planner stops emitting ops. Necessary but not sufficient: a planner that deletes everything also satisfies it.
  • Asymmetric correctness invariants — e.g. "if a file exists only on Remote at init, it must exist in all three trees at the end" (the upload analogue is enforced symmetrically), and "a locally added file that wasn't moved into an online-only folder must remain downloaded." These catch the delete-everything degenerate attack on the equality invariant.

Because CanopyCheck's invariants are enforced on random inputs, they must be simple enough to hold universally yet aggressive enough to be consequential. The trick: derive some property at init, enforce a related property at the end.
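
The derive-at-init/enforce-at-end pattern can be sketched for the download invariant above. The `Tree` type and function names are hypothetical; the logic follows the invariant as stated.

```rust
// Sketch of an asymmetric "derive at init, enforce at end" invariant.
// Illustrative names; `Tree` is a toy path -> contents map.
use std::collections::{BTreeMap, BTreeSet};

type Tree = BTreeMap<String, String>;

// At init: record every path that exists only on Remote.
fn remote_only_paths(remote: &Tree, local: &Tree, synced: &Tree) -> BTreeSet<String> {
    remote
        .keys()
        .filter(|p| !local.contains_key(*p) && !synced.contains_key(*p))
        .cloned()
        .collect()
}

// At end: those paths must exist in all three trees. A delete-everything
// planner satisfies plain three-way equality but fails this check.
fn check_download_invariant(
    init_remote_only: &BTreeSet<String>,
    remote: &Tree,
    local: &Tree,
    synced: &Tree,
) -> bool {
    init_remote_only
        .iter()
        .all(|p| remote.contains_key(p) && local.contains_key(p) && synced.contains_key(p))
}

fn main() {
    let remote: Tree = [("/doc".to_string(), "v1".to_string())].into();
    let tracked = remote_only_paths(&remote, &Tree::new(), &Tree::new());

    // A degenerate planner that "syncs" by deleting everything:
    let (end_r, end_l, end_s) = (Tree::new(), Tree::new(), Tree::new());
    assert!(end_r == end_l && end_l == end_s); // equality invariant passes...
    assert!(!check_download_invariant(&tracked, &end_r, &end_l, &end_s)); // ...this fails
    println!("asymmetric invariant caught the delete-everything planner");
}
```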

Minimization (shrinking)

When a seed fails, CanopyCheck iteratively removes nodes from the initial trees and re-runs to check whether the failure persists. The minimal reproducer is often dramatically smaller than the randomly-generated one. A typical random test generates trees with dozens of nodes across three roots; the underlying bug is often expressible on 3-5 nodes — minimization makes that visible at a glance ("oh, we're not handling the case where the user adds a node under a parent directory that was moved remotely"). See concepts/test-case-minimization.
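
The remove-and-re-run loop amounts to greedy shrinking to a fixpoint, sketched below. The `fails` predicate stands in for "re-run CanopyCheck on this input and check whether the same violation fires"; all names are illustrative.

```rust
// Sketch of greedy shrinking: repeatedly drop one node from the input
// and keep the removal whenever the failure still reproduces.
use std::collections::BTreeMap;

type Tree = BTreeMap<String, String>;

fn shrink(mut input: Tree, fails: &dyn Fn(&Tree) -> bool) -> Tree {
    assert!(fails(&input), "can only shrink a failing input");
    loop {
        let mut shrunk = false;
        for key in input.keys().cloned().collect::<Vec<_>>() {
            let mut candidate = input.clone();
            candidate.remove(&key);
            if fails(&candidate) {
                input = candidate; // smaller input still fails: keep it
                shrunk = true;
            }
        }
        if !shrunk {
            return input; // fixpoint: no single removal preserves the failure
        }
    }
}

fn main() {
    // Pretend the bug needs exactly these two nodes to reproduce.
    let fails = |t: &Tree| t.contains_key("/moved") && t.contains_key("/moved/child");
    let mut big = Tree::new();
    for i in 0..30 {
        big.insert(format!("/noise{}", i), "x".into());
    }
    big.insert("/moved".into(), "dir".into());
    big.insert("/moved/child".into(), "file".into());

    let minimal = shrink(big, &fails);
    assert_eq!(minimal.len(), 2); // only the nodes the bug needs survive
    println!("minimal reproducer: {:?}", minimal.keys().collect::<Vec<_>>());
}
```

Each candidate removal re-runs the whole test, which is cheap precisely because CanopyCheck's execution is pure and in-memory.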

Minimization works for CanopyCheck because (a) the input shape is simple and typed (three trees, nothing else), and (b) removing a node rarely changes execution so drastically that the original failure disappears for unrelated reasons. This is exactly what Trinity cannot do: perturbing a Trinity test's initial state changes the downstream RPC scheduling, invalidating the seed.

Volume

Tens of millions of random runs execute per night across all of Nucleus's random-testing systems; CanopyCheck is the denser contributor because it pays no I/O or mock-setup cost. Runs are, "in general, 100% green on the latest master." On regression, CI auto-creates a tracking task with the (seed, commit-hash) pair.
