CONCEPT Cited by 1 source
Lazy history download¶
Lazy history download is the VCS design stance in which clone
downloads almost no history (commits, trees, files) and instead
pulls data on demand as the user runs queries or operations that
need it.
Contrast with Git's default: git clone downloads all commits
and all file revisions reachable from the repo's refs. For the
Linux kernel repo, about 5.5 GB in Git, "three-quarters" of that size is history.
Canonical instance: Sapling¶
"In Sapling, cloning the repository downloads almost no history. Instead, as you use the repository we download just the commits, trees, and files you actually need, which allows you to work with a repository that may be terabytes in size without having to actually download all of it."
— Sapling announcement post (2022-11-15)
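The behaviour in the quote can be pictured as a local object store that starts empty and fills itself on first access. The following is a minimal sketch under stated assumptions, not Sapling's implementation: `LazyStore`, `OfflineError`, the `online` flag, and the `commit:`/`tree:` key scheme are all hypothetical names chosen for illustration, and a plain dict stands in for the server.

```python
class OfflineError(RuntimeError):
    """Raised when an uncached object is requested without network access."""


class LazyStore:
    """Hypothetical local store: empty after 'clone', filled on demand."""

    def __init__(self, remote, online=True):
        self.remote = remote    # stand-in for the server: key -> object
        self.cache = {}         # objects already downloaded
        self.online = online    # whether the (simulated) network is reachable
        self.fetches = 0        # number of round trips to the remote

    def get(self, key):
        # Serve from the local cache whenever possible.
        if key in self.cache:
            return self.cache[key]
        # An uncached object can only be obtained from the remote.
        if not self.online:
            raise OfflineError(f"{key!r} is not cached and the store is offline")
        self.fetches += 1
        obj = self.remote[key]
        self.cache[key] = obj
        return obj


# A tiny three-commit history that lives only on the "server".
remote = {
    "commit:c1": {"parent": None, "tree": "tree:t1"},
    "commit:c2": {"parent": "commit:c1", "tree": "tree:t2"},
    "commit:c3": {"parent": "commit:c2", "tree": "tree:t3"},
    "tree:t1": {"README": "v1"},
    "tree:t2": {"README": "v2"},
    "tree:t3": {"README": "v3"},
}

store = LazyStore(remote)            # "clone": nothing downloaded yet
head = store.get("commit:c3")        # inspecting the head fetches one commit
tree = store.get(head["tree"])       # reading its files fetches one tree
print(store.fetches, "of", len(remote), "objects fetched")  # 2 of 6 objects fetched

store.online = False
store.get("commit:c3")               # still served from the cache
try:
    store.get("commit:c1")           # untouched history needs the network
except OfflineError as exc:
    print("offline:", exc)
```

Only the two objects the user actually touched are downloaded; the older commits and trees stay on the server until something asks for them.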
Design constraints¶
- Online mode needed for new data. Lazy download implies the user must be online to reach history they haven't touched.
- Offline mode preserved via cache + indexes. The post adds: "through efficient caching and indexes, we maintain a configurable ability to work offline in many common flows, like making a commit."
- Depends on efficient query primitives. Lazy fetching does not work if queries are O(n) over commits, because answering them would require downloading every commit. Sapling pairs lazy download with a bisectable graph-shape index (Segmented Changelog), which enables O(log n) or O(number-of-merges) ancestry queries that run against the index alone, without fetching commits.
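To illustrate the last constraint, here is a simplified model of an ancestry query answered from a graph-shape index alone. It is a sketch under assumptions, not the actual Segmented Changelog data structure: it assumes commits carry integer ids in topological order (every ancestor has a smaller id) and that the index stores linear runs of ids as segments, each recording only the parents of its first commit. `Segment`, `SegmentIndex`, and `is_ancestor` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    low: int                   # first commit id of a linear run
    high: int                  # last commit id of the run
    parents: Tuple[int, ...]   # parents of `low`, which lie outside the run


class SegmentIndex:
    """Hypothetical index: answers ancestry without loading commit objects."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.low)
        self._lows = [s.low for s in self.segments]

    def find(self, commit_id):
        # Binary search for the segment whose id range contains commit_id.
        i = bisect_right(self._lows, commit_id) - 1
        if i < 0 or commit_id > self.segments[i].high:
            raise KeyError(commit_id)
        return self.segments[i]

    def is_ancestor(self, a, b):
        """Return True if commit a is an ancestor of (or equal to) commit b."""
        stack, seen = [b], set()
        while stack:
            cur = stack.pop()
            # Ids are topological, so nothing below `a` can lead back to it.
            if cur < a or cur in seen:
                continue
            seen.add(cur)
            seg = self.find(cur)
            # Within a segment, every id from low up to cur is an ancestor of cur.
            if seg.low <= a:
                return True
            # Otherwise jump straight to the parents of the segment's first commit.
            stack.extend(seg.parents)
        return False


# Main line 0..4, a branch 5..6 forked from commit 2, and a merge 7 (then 8).
index = SegmentIndex([
    Segment(low=0, high=4, parents=()),
    Segment(low=5, high=6, parents=(2,)),
    Segment(low=7, high=8, parents=(4, 6)),
])

print(index.is_ancestor(1, 6))  # True: 6 -> 5 -> 2 -> 1
print(index.is_ancestor(3, 6))  # False: the branch forked before commit 3
print(index.is_ancestor(5, 8))  # True: reached through the merge at 7
```

Each step of the query skips a whole linear run, so the work grows with the number of segments (roughly, merges) rather than the number of commits, and no commit object needs to be downloaded to answer it.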
Why it matters¶
Lazy history download is the history-side counterpart to a
virtual file system's
working-copy-side lazy load. Together they make a VCS workable for
monorepos at scale beyond what any public VCS has historically
handled. The two mechanisms are independent, and Sapling deploys both.
Git's partial clone (git clone --filter=blob:none) covers only the
file-data side: commits and trees are still downloaded in full, blobs
are fetched on demand, and there is no graph-shape index.
Seen in¶
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — Sapling's canonical instance, paired with Segmented Changelog for query performance.
Related¶
- systems/sapling-scm — the canonical Sapling instance.
- systems/meta-segmented-changelog — the scaling primitive Sapling pairs with lazy download.
- concepts/segmented-changelog — the graph-shape index concept.
- concepts/commit-graph-bisection — the query-performance primitive that makes lazy download viable.
- concepts/monorepo — the scale regime that makes lazy download necessary.
- patterns/lazy-history-on-demand — the broader design pattern.