CONCEPT Cited by 1 source

Merge-Base Three-Tree Sync

Definition

Merge-base three-tree sync is a file-sync data model that persists three observed tree states (Remote, Local, and a merge base called Synced) instead of a list of pending operations. Correct sync behavior is not stored; it is derived as a pure function of the three trees. The name "three trees" and the merge-base framing are borrowed from version-control systems (Git's three-way merge: HEAD, merge-base, other-branch).

Canonical realization: Dropbox Nucleus's "Canopy" data model. Contrast: the legacy Sync Engine Classic persisted the pending work itself (create-here, upload-there).

The three trees

  • Remote Tree — latest state of the user's Dropbox in the cloud, as the client last observed it.
  • Local Tree — state of the user's Dropbox on disk, as the client last observed it.
  • Synced Tree — last state at which Remote and Local were known equal, per-node. Each node in Synced acts as the merge base for the corresponding nodes in Remote and Local.

Sync goal (cleanly expressible): converge all three trees to the same state. When a user's disk looks like dropbox.com (Local == Remote), sync is complete.
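
The three trees and the sync goal can be sketched as a small data structure. This is a minimal illustration, not Nucleus's actual representation: the `ThreeTrees` class, the `is_converged` method, and the choice to model each tree as a flat mapping from path to content hash are all assumptions made here for brevity (the real model is per-node and keyed by ID).

```python
from dataclasses import dataclass, field

# Hypothetical simplification: a tree is a flat mapping of path -> content hash.
Tree = dict[str, str]


@dataclass
class ThreeTrees:
    remote: Tree = field(default_factory=dict)  # cloud state, as last observed
    local: Tree = field(default_factory=dict)   # disk state, as last observed
    synced: Tree = field(default_factory=dict)  # merge base for remote and local

    def is_converged(self) -> bool:
        """Sync goal: all three trees hold the same state."""
        return self.remote == self.local == self.synced


trees = ThreeTrees(
    remote={"/foo/fum": "h1"},
    local={"/foo/fum": "h1", "/notes.txt": "h2"},
    synced={"/foo/fum": "h1"},
)
print(trees.is_converged())  # False: /notes.txt exists only in Local

# Once the upload (not shown) completes, the new observations are recorded.
trees.remote["/notes.txt"] = "h2"
trees.synced["/notes.txt"] = "h2"
print(trees.is_converged())  # True
```

Only observations are stored here; nothing in the structure records "upload /notes.txt" as pending work.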

Why the merge base matters

The Synced Tree disambiguates direction of change. Consider:

Scenario: user sees /foo/fum on disk (Local Tree has it) but it's missing from Remote. Was it deleted remotely? Or added locally? Without a merge base, both are consistent with the observation. With a merge base:

  • If Synced Tree has /foo/fum and matches Local → remote deletion (Remote diverged from the merge base, Local didn't).
  • If Synced Tree doesn't have /foo/fum but Local does → local addition (Local diverged from the merge base, Remote didn't).

The correct action differs between the two cases (delete locally vs. upload), and both are derivable from the three-tree observation with no additional state.
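
The /foo/fum scenario can be written as a short function. This is a sketch under assumptions: trees are modeled as path-to-hash mappings, and `classify_local_only` and its return labels are hypothetical names. The third branch (Synced has the path but Local no longer matches it) is not part of the scenario above; it is included only so the function is total.

```python
# Hypothetical simplification: a tree is a flat mapping of path -> content hash.
Tree = dict[str, str]


def classify_local_only(path: str, remote: Tree, local: Tree, synced: Tree) -> str:
    """Decide what to do with a path that is in Local but missing from Remote."""
    if path not in local or path in remote:
        raise ValueError("expected a path present in Local and missing from Remote")
    if path not in synced:
        # Local diverged from the merge base, Remote did not: local addition.
        return "upload"
    if synced[path] == local[path]:
        # Remote diverged from the merge base, Local did not: remote deletion.
        return "delete_locally"
    # Assumption beyond the scenario: Local was also edited, so both sides diverged.
    return "conflict"


local = {"/foo/fum": "h1"}
remote: Tree = {}
print(classify_local_only("/foo/fum", remote, local, synced={"/foo/fum": "h1"}))  # delete_locally
print(classify_local_only("/foo/fum", remote, local, synced={}))                  # upload
```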

Why observations, not work

Classic's work-oriented model (store "this file needs uploading" in the database) forced the engine to decide the sync direction eagerly, at observation time. A mistake at that point was persisted as pending work and could corrupt user data. The observations model defers the decision to the planner, which can recompute from the authoritative tree state at any point, so a transient mistake in observation costs at most a wasted planning cycle, not data loss.

Observations also compose better under concurrent edits. A remote change that arrives while a local change is in progress simply updates the Remote Tree; the planner sees both deltas against the same merge base and computes the correct resolution.

Enables strong testable invariants

  • "All three trees must converge" — a single sufficient equality assertion for sync-complete.
  • Asymmetric invariants: "a file only on Remote at init must be on all three trees at end" (catches a degenerate planner that deletes everything, which would trivially satisfy the equality invariant).
  • Per-node invariants: "no node exists without a parent directory" (only enforceable after designing away the orphan state).

These invariants are what make CanopyCheck's property-based testing work: the planner can be driven from any randomly generated triple of trees, and its correctness reduces to a small set of universal properties.
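
The idea of driving a planner from random tree triples can be illustrated with the standard library alone. This is not CanopyCheck: `random_tree`, `toy_planner`, `delete_everything`, and `check_invariants` are hypothetical names, trees are flat path-to-hash mappings, and the toy planner resolves conflicts with an arbitrary policy purely so that it always terminates in a converged state.

```python
import random

PATHS = ["/a", "/b", "/c"]
HASHES = ["h1", "h2", "h3"]


def random_tree(rng):
    """Hypothetical generator: each path is present with probability 1/2."""
    return {p: rng.choice(HASHES) for p in PATHS if rng.random() < 0.5}


def toy_planner(remote, local, synced):
    """Toy stand-in for a planner: per-path three-way merge, returns final trees."""
    final = {}
    for path in set(remote) | set(local) | set(synced):
        r, l, base = remote.get(path), local.get(path), synced.get(path)
        if l == base:
            winner = r  # only Remote changed, or nothing changed
        elif r == base or r == l:
            winner = l  # only Local changed, or both made the same change
        else:
            winner = r if r is not None else l  # conflict: arbitrary toy policy
        if winner is not None:
            final[path] = winner
    return dict(final), dict(final), dict(final)


def delete_everything(remote, local, synced):
    """Degenerate planner that 'converges' by emptying all three trees."""
    return {}, {}, {}


def check_invariants(planner, trials=500, seed=0):
    """Return True if both invariants hold on every random triple."""
    rng = random.Random(seed)
    for _ in range(trials):
        remote, local, synced = (random_tree(rng) for _ in range(3))
        remote_only = set(remote) - set(local) - set(synced)
        r2, l2, s2 = planner(dict(remote), dict(local), dict(synced))
        if not (r2 == l2 == s2):
            return False  # equality invariant violated
        if any(p not in r2 or p not in l2 or p not in s2 for p in remote_only):
            return False  # asymmetric invariant violated
    return True


print(check_invariants(toy_planner))        # True
print(check_invariants(delete_everything))  # False
```

The degenerate planner passes the equality check on every trial and is rejected only by the asymmetric invariant, which is why the two kinds of invariant are stated separately.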

Relation to version control

The borrowing is explicit: in Git, a three-way merge uses (merge-base, HEAD, other) to determine per-file how to reconcile. Nucleus's model is a per-node version: the Synced Tree's node for /foo/fum is the merge base for the Remote and Local nodes with the same ID. The action analogous to "choose HEAD's change" is "apply Local's change"; "choose other's change" is "apply Remote's change"; "both changed" is the conflict case needing user intervention.
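
The per-node analogy can be sketched as a three-way decision function. The name `resolve_node` and its return labels are hypothetical, node state is reduced to an optional content hash, and treating two identical changes as already in sync is an assumption of this sketch (it mirrors how Git handles identical changes on both sides), not something the source states.

```python
from typing import Optional


def resolve_node(base: Optional[str], local: Optional[str], remote: Optional[str]) -> str:
    """Three-way decision for one node; None means the node is absent."""
    if local == remote:
        return "in_sync"       # nothing to reconcile between Local and Remote
    if local == base:
        return "apply_remote"  # only Remote diverged from the merge base
    if remote == base:
        return "apply_local"   # only Local diverged from the merge base
    return "conflict"          # both diverged, in different ways


print(resolve_node(base="h1", local="h2", remote="h1"))  # apply_local
print(resolve_node(base="h1", local="h1", remote="h3"))  # apply_remote
print(resolve_node(base="h1", local="h2", remote="h3"))  # conflict
print(resolve_node(base=None, local="h1", remote=None))  # apply_local (local addition)
```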

Relation to CRDT-style sync

CRDTs compute the merge locally in a data-type-specific way, without a merge base. Dropbox's file semantics (arbitrary bytes, renames, moves) don't admit a CRDT resolution — two concurrent edits to the same file are genuinely conflicting and require either a conflict file or user arbitration. The merge-base model is what lets Nucleus detect the conflict cleanly (both Local and Remote diverge from Synced → conflict) rather than silently picking one winner.
