
CONCEPT Cited by 1 source

Lazy history download

Lazy history download is the VCS design stance in which clone downloads almost no history (commits, trees, files) and instead pulls data on demand as the user runs queries or operations that need it.

Contrast with Git's default: git clone downloads every commit and every file revision reachable from the repository's refs. For a 5.5 GB repo like Linux, "three-quarters" of that download is history.

Canonical instance: Sapling

"In Sapling, cloning the repository downloads almost no history. Instead, as you use the repository we download just the commits, trees, and files you actually need, which allows you to work with a repository that may be terabytes in size without having to actually download all of it."

— Sapling announcement post (2022-11-15)
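
The behaviour the quote describes can be pictured as a read-through cache in front of a remote server. The following is a minimal sketch under that assumption, not Sapling's implementation; `LazyStore`, `fetch_remote`, and the `REMOTE` dictionary are hypothetical names invented for illustration.

```python
class LazyStore:
    """Toy read-through cache standing in for a lazily populated local store."""

    def __init__(self, fetch_remote, online=True):
        self._fetch_remote = fetch_remote  # hypothetical callable: key -> data
        self._cache = {}                   # objects already downloaded
        self.online = online
        self.fetches = 0                   # number of round trips to the remote

    def get(self, key):
        # Serve from the local cache whenever possible.
        if key in self._cache:
            return self._cache[key]
        # A miss needs the network; without it the object is unreachable.
        if not self.online:
            raise LookupError(f"{key!r} is not cached and the store is offline")
        self.fetches += 1
        data = self._fetch_remote(key)
        self._cache[key] = data
        return data


# Stand-in for a server that holds the full history (illustrative data only).
REMOTE = {"commit:a1": "fix typo", "tree:t1": "src/ docs/", "file:f1": "print('hi')"}

store = LazyStore(REMOTE.__getitem__)
store.get("commit:a1")           # first access downloads the object
store.get("commit:a1")           # second access is served from the cache
store.online = False
cached = store.get("commit:a1")  # already-touched data stays available offline
```

In this sketch only one of the three remote objects is ever downloaded, and asking for an untouched object while offline raises `LookupError`.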

Design constraints

  • Online mode needed for new data. Lazy download implies the user must be online to reach history they haven't touched.
  • Offline mode preserved via cache + indexes. The announcement continues: "through efficient caching and indexes, we maintain a configurable ability to work offline in many common flows, like making a commit."
  • Depends on efficient query primitives. Lazy fetching does not work if queries are O(n) over commits, because answering them would require downloading every commit anyway. Sapling pairs lazy download with a bisectable graph-shape index (Segmented Changelog), which enables O(log n) or O(number-of-merges) ancestry queries that run against the index alone, without fetching commits.
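
To make the last constraint concrete, here is a toy graph-shape index in the spirit of the idea above. It is a sketch under stated assumptions, not Sapling's Segmented Changelog format or API: commits are assumed to carry integer ids in topological order, each linear run of commits is stored as one segment, and `Segment`, `SegmentIndex`, `find`, and `is_ancestor` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    low: int        # id of the first commit in a linear run
    high: int       # id of the last commit in that run
    parents: tuple  # ids of the parents of `low`, all smaller than `low`


class SegmentIndex:
    def __init__(self, segments):
        self._segments = sorted(segments, key=lambda s: s.low)
        self._lows = [s.low for s in self._segments]

    def find(self, commit_id):
        """Binary-search for the segment that contains commit_id."""
        i = bisect_right(self._lows, commit_id) - 1
        if i < 0 or commit_id > self._segments[i].high:
            raise KeyError(commit_id)
        return self._segments[i]

    def is_ancestor(self, a, b):
        """Return True if commit a is b itself or an ancestor of b."""
        stack = [b]
        visited = {}  # segment low -> highest id already explored in it
        while stack:
            cid = stack.pop()
            if cid < a:
                continue  # topological ids: a smaller id cannot descend from a
            seg = self.find(cid)
            if visited.get(seg.low, -1) >= cid:
                continue  # this part of the segment was already explored
            visited[seg.low] = cid
            if seg.low <= a:
                return True  # a lies on the same linear run, at or below cid
            stack.extend(seg.parents)  # jump past the whole run in one step
        return False


# Ten commits: 0-3 on the main line, 4-6 branch off commit 1,
# commit 7 merges 3 and 6 and is followed by 8 and 9.
index = SegmentIndex([
    Segment(0, 3, ()),
    Segment(4, 6, (1,)),
    Segment(7, 9, (3, 6)),
])
```

With this index, `index.is_ancestor(2, 9)` holds while `index.is_ancestor(2, 6)` does not, and neither query reads any commit contents. The work per query scales with the number of segments touched rather than the number of commits, which is the property that lets such queries run before the commits themselves have been downloaded.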

Why it matters

Lazy history download is the history-side counterpart to a virtual file system's working-copy-side lazy load. Together they make a VCS workable for monorepos at a scale beyond what any public VCS has historically handled. The two mechanisms are independent, and Sapling deploys both. Git's partial clone (--filter=blob:none) attempts the file-data-only variant: it defers file contents but has no counterpart to the graph-shape index.
