PATTERN Cited by 1 source
Lazy history on demand¶
Scale a VCS to monorepo sizes by refusing to download everything up front — pair a cheap, sublinear-size graph-shape index with on-demand fetching of commits, trees, and files.
Canonical instance: Sapling + Segmented Changelog. See concepts/lazy-history-download for the stance and concepts/segmented-changelog for the graph-shape algorithm.
The two pieces¶
- A bisectable, compact graph-shape index. Downloads in
  megabytes independent of the repo's commit count. Supports
  O(number-of-merges) ancestry queries and O(log n)
  log/blame/file-history queries via bisection.
- Lazy fetch of per-commit data. Commit metadata, file contents,
  and tree objects are fetched on demand from a server as the user's
  actual workflow (`sl status`, `sl diff`, `sl log FILE`, checkout,
  etc.) touches them. Cached locally thereafter.
Both pieces are required. The index without lazy fetch is just a cheaper clone. Lazy fetch without the index means every query has to download commits before it can answer, which defeats the purpose.
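How the two pieces divide the work can be sketched in a few lines of Python. This is a toy model, not Sapling's data structures or API: `Segment`, `GraphIndex`, and `LazyCommitStore` are hypothetical names, commit ids are assumed to be integers assigned in topological order (every parent has a smaller id than its child), and the "server" is a plain callback.

```python
import bisect
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Segment:
    """A linear run of commits low..high; `parents` are the parents of `low`."""
    low: int
    high: int
    parents: List[int] = field(default_factory=list)


class GraphIndex:
    """Toy graph-shape index: stores segments only, no per-commit data."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.low)
        self._lows = [s.low for s in self.segments]

    def _segment_of(self, commit: int) -> Segment:
        # Binary search for the segment whose id range contains `commit`.
        return self.segments[bisect.bisect_right(self._lows, commit) - 1]

    def is_ancestor(self, a: int, b: int) -> bool:
        """True if `a` is `b` or an ancestor of `b`. Visits segments, not commits."""
        stack, seen = [b], set()
        while stack:
            x = stack.pop()
            seg = self._segment_of(x)
            if seg.low <= a <= x:
                return True  # `a` lies on the linear run leading up to `x`
            if seg.low in seen:
                continue
            seen.add(seg.low)
            # Assumption: ids are topologically ordered, so a parent with an
            # id below `a` cannot have `a` as an ancestor and is skipped.
            stack.extend(p for p in seg.parents if p >= a)
        return False


class LazyCommitStore:
    """Fetches per-commit data on first access and caches it locally."""

    def __init__(self, fetch_remote: Callable[[int], dict]) -> None:
        self._fetch_remote = fetch_remote  # hypothetical stand-in for a server call
        self._cache: Dict[int, dict] = {}
        self.remote_fetches = 0

    def get(self, commit: int) -> dict:
        if commit not in self._cache:
            self.remote_fetches += 1
            self._cache[commit] = self._fetch_remote(commit)
        return self._cache[commit]


# Mainline 0..4, a branch 5..7 forked from commit 2, and 8..9 merging 4 and 7.
index = GraphIndex([Segment(0, 4), Segment(5, 7, [2]), Segment(8, 9, [4, 7])])
store = LazyCommitStore(lambda c: {"id": c, "message": f"commit {c}"})

print(index.is_ancestor(3, 9))  # answered from the index alone
print(store.remote_fetches)     # no per-commit data has been fetched yet
store.get(9)
store.get(9)                    # second read is served from the local cache
print(store.remote_fetches)
```

In this sketch the ancestry query touches one entry per segment rather than one per commit, and the store contacts the server only when a workflow step asks for a specific commit's data.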
Why it matters¶
A monorepo with "tens of millions of commits" (Meta's disclosed scale) cannot afford the Git-default model, where every clone costs O(total-history). The cost shows up at smaller scales too: "three-quarters of the 5.5 GB Linux kernel repo is history", so even moderately large public repos pay significantly for upfront history download.
Comparison¶
| Approach | What it downloads upfront | When it fetches more |
|---|---|---|
| Git default clone | All commits + all reachable blobs | Never (until fetch) |
| Git `--depth=N` shallow | Last N commits + their blobs | Never |
| Git `--filter=blob:none` partial clone | All commits and trees, no blobs | On first checkout of a blob |
| Git `--filter=tree:0` | All commits, no trees, no blobs | On tree/blob read |
| Sapling (Segmented Changelog + lazy) | Segment graph (MB) | On any query that needs per-commit data |
| Sapling with VFS | Segment graph (MB) | On any working-copy file read |
Sapling sits at the extreme-lazy end. The tradeoff is viable only because the index supports sublinear queries; a Git partial clone can still need an O(n) walk over commits to answer a file-history query.
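The gap between an O(n) walk and an O(log n) bisection can be made concrete with a small Python sketch. It is illustrative only and does not reproduce Sapling's algorithm: both function names are hypothetical, history is assumed to be a single linear run of `n` commits, and `file_id_at(i)` stands in for a lazy lookup of the file's content id at commit `i`. The bisection also assumes the file never returns to an earlier identical state within the range, so that "matches the head version" flips from false to true exactly once.

```python
from typing import Callable, Tuple


def linear_last_change(n: int, file_id_at: Callable[[int], str]) -> Tuple[int, int]:
    """Walk back from the head one commit at a time.

    Returns (index of the commit that introduced the head version, lookups made).
    """
    head_id = file_id_at(n - 1)
    lookups = 1
    i = n - 1
    while i > 0:
        lookups += 1
        if file_id_at(i - 1) != head_id:
            break
        i -= 1
    return i, lookups


def bisect_last_change(n: int, file_id_at: Callable[[int], str]) -> Tuple[int, int]:
    """Find the same commit by bisection, under the single-flip assumption."""
    head_id = file_id_at(n - 1)
    lookups = 1
    lo, hi = 0, n - 1  # invariant: file_id_at(hi) == head_id, answer is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        lookups += 1
        if file_id_at(mid) == head_id:
            hi = mid       # head version already present at mid
        else:
            lo = mid + 1   # change happened after mid
    return lo, lookups


# Hypothetical history: 10,000 commits, file last changed at commit 1,234.
def example_file_id(i: int) -> str:
    return "v2" if i >= 1234 else "v1"


print(linear_last_change(10_000, example_file_id))
print(bisect_last_change(10_000, example_file_id))
```

Each lookup here would be a potential server round trip under lazy fetching, which is why the number of lookups, rather than local CPU time, is the quantity worth counting.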
Related pattern family¶
- patterns/blobless-clone-lazy-hydrate (the pattern-space cousin): Git `--filter=blob:none` + client-side background hydration. Cloudflare Artifacts / ArtifactFS canonical instance.
- concepts/virtual-filesystem-for-monorepo: the working-copy-side analog.
Seen in¶
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — canonical Sapling instance; Segmented Changelog + lazy fetch described as the mechanism that makes monorepo VCS viable.
Related¶
- systems/sapling-scm — the canonical instance.
- systems/meta-segmented-changelog — the index-side mechanism.
- concepts/lazy-history-download — the design-stance concept.
- concepts/segmented-changelog — the graph-shape algorithm.
- concepts/commit-graph-bisection — the query-performance primitive.
- concepts/virtual-filesystem-for-monorepo — the working-copy analog.
- patterns/blobless-clone-lazy-hydrate — the pattern-family cousin.