
PATTERN Cited by 1 source

Lazy history on demand

Scale a VCS to monorepo sizes by refusing to download everything up front — pair a cheap, sublinear-size graph-shape index with on-demand fetching of commits, trees, and files.

Canonical instance: Sapling + Segmented Changelog. See concepts/lazy-history-download for the stance and concepts/segmented-changelog for the graph-shape algorithm.

The two pieces

  1. A bisectable, compact graph-shape index. It downloads in megabytes rather than scaling linearly with the repo's commit count. It supports O(number-of-merges) ancestry queries and O(log n) log/blame/file-history queries via bisection.
  2. Lazy fetch of per-commit data. Commit metadata, file contents, and tree objects are fetched on demand from a server as the user's actual workflow (sl status, sl diff, sl log FILE, checkout, etc.) touches them. Fetched data is cached locally thereafter.
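The second piece is essentially a read-through cache. A minimal sketch in Python, assuming a hypothetical `fetch_remote` callable in place of the real server protocol and an in-memory dict in place of the on-disk cache (neither name comes from Sapling):

```python
from typing import Callable, Dict


class LazyObjectStore:
    """Read-through cache: fetch an object from the server only on first use."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # hypothetical network call
        self._cache: Dict[str, bytes] = {}  # stands in for the on-disk cache
        self.remote_fetches = 0             # counts round trips, for illustration

    def get(self, object_id: str) -> bytes:
        if object_id not in self._cache:
            self._cache[object_id] = self._fetch_remote(object_id)
            self.remote_fetches += 1
        return self._cache[object_id]


# A fake "server" holding three objects; a real one would be a network service.
SERVER = {"commit:a1": b"fix typo", "tree:t1": b"README", "file:f1": b"hello"}

store = LazyObjectStore(lambda oid: SERVER[oid])
store.get("commit:a1")   # first touch: goes to the server
store.get("commit:a1")   # second touch: served from the local cache
store.get("file:f1")     # a different object: one more server round trip
```

Objects the workflow never touches (here `tree:t1`) are never downloaded, which is the point of the pattern.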

Both pieces are required. The index without lazy fetch is just a cheaper clone. Lazy fetch without the index means every graph query has to download commits before it can answer, which defeats the purpose.
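To make the role of the index concrete, here is a simplified sketch of answering an ancestry query from segments alone, with no per-commit data fetched. It assumes commits carry integer ids assigned so that each linear run of history is a contiguous range and every parent has a smaller id than its child. The `Segment` and `SegmentIndex` names are illustrative, not Sapling's actual data structures.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    low: int                  # first commit id in the linear run
    high: int                 # last commit id in the linear run
    parents: Tuple[int, ...]  # parents of `low`, all outside this segment


class SegmentIndex:
    def __init__(self, segments: List[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.low)
        self._lows = [s.low for s in self.segments]

    def _segment_of(self, commit: int) -> Segment:
        # Binary search for the segment whose id range contains `commit`.
        return self.segments[bisect.bisect_right(self._lows, commit) - 1]

    def is_ancestor(self, a: int, b: int) -> bool:
        """True if `a` is `b` itself or an ancestor of `b`."""
        stack, expanded = [b], set()
        while stack:
            x = stack.pop()
            seg = self._segment_of(x)
            if seg.low <= a <= x:        # same linear run, at or below x
                return True
            if seg.low not in expanded:  # follow each segment's parents once
                expanded.add(seg.low)
                # Ancestors always have smaller ids, so skip parents below a.
                stack.extend(p for p in seg.parents if p >= a)
        return False


# Mainline 0..4, a branch 5..7 forked from commit 2, and a merge of 4 and 7 at 8.
index = SegmentIndex([
    Segment(0, 4, ()),
    Segment(5, 7, (2,)),
    Segment(8, 9, (4, 7)),
])
index.is_ancestor(1, 6)   # True: the branch forked after commit 1
index.is_ancestor(3, 6)   # False: commit 3 came after the fork point
```

The walk hops between segments rather than individual commits, so the work tracks the number of segments (roughly, the number of merges and branch points) instead of the number of commits.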

Why it matters

A monorepo with "tens of millions of commits" (Meta's disclosed scale) cannot afford the Git-default model where every clone is O(total-history). The cost also shows up at smaller scale: "three-quarters of the 5.5 GB Linux kernel repo is history", so even moderately large public repos pay significantly for upfront history download.

Comparison

| Approach | What it downloads upfront | When it fetches more |
| --- | --- | --- |
| Git default clone | All commits + all reachable blobs | Never (until fetch) |
| Git --depth=N shallow | Last N commits + their blobs | Never |
| Git --filter=blob:none partial clone | All commits, no blobs | On first checkout of a blob |
| Git --filter=tree:0 | All commits, no trees, no blobs | On tree/blob read |
| Sapling (Segmented Changelog + lazy) | Segment graph (MB) | On any query that needs per-commit data |
| Sapling with VFS | Segment graph (MB) | On any working-copy file read |

Sapling sits at the extreme-lazy end. The tradeoff is viable only because the index supports sublinear queries; a Git partial clone can still need an O(n) walk over commits for file history.
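As an illustration of why bisection avoids the O(n) walk, the sketch below finds the commits that changed a file within one linear run of history by probing only the endpoints of ever-smaller ranges. It is not Sapling's actual algorithm. `file_id_at` is a hypothetical lookup that would trigger one lazy fetch per distinct commit, and the sketch assumes that a file identical at both ends of a range did not change inside it, so a change followed by an exact revert would be missed.

```python
from typing import Callable, List


def change_points(file_id_at: Callable[[int], str], lo: int, hi: int) -> List[int]:
    """Return commits c in (lo, hi] where the file differs from commit c - 1.

    Assumption: equal file ids at both ends of a range mean no change inside.
    """
    if file_id_at(lo) == file_id_at(hi):
        return []                       # prune the whole range with two probes
    if hi - lo == 1:
        return [hi]                     # adjacent commits differ: hi changed it
    mid = (lo + hi) // 2
    return change_points(file_id_at, lo, mid) + change_points(file_id_at, mid, hi)


probed = set()


def file_id_at(commit: int) -> str:
    # Hypothetical history: the file changes at commits 100 and 700.
    probed.add(commit)  # each distinct commit stands for one lazy fetch
    return "v0" if commit < 100 else "v1" if commit < 700 else "v2"


points = change_points(file_id_at, 0, 1023)
```

Under these assumptions, each change costs a roughly logarithmic number of probes, so `len(probed)` stays far below the 1,024 commits in the range. A linear walk would have had to fetch every commit to answer the same question.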

Seen in
