CONCEPT Cited by 1 source
Commit-graph bisection¶
Commit-graph bisection is a VCS-side binary-search technique that
answers queries like "when did this file's content change?" (log)
or "who last touched line L?" (blame) in O(log n) over a
segment-indexed commit graph, rather than the traditional O(n)
iteration over commits.
Canonical wiki instance: Sapling's Segmented Changelog.
Why it's new¶
Git's git log <file> and git blame walk the commit graph starting
from HEAD and for each commit inspect (a) does this commit touch the
file, (b) if so, compare to parent content. That's O(number-of-commits)
in the worst case — linear over the whole history. For a monorepo
with tens of millions of commits, that's unusable.
The bisection trick¶
With a segment-indexed graph (Segmented Changelog), you can bisect:
"When running
logorblameon a file, we're able to bisect the segment graph to find the history in O(log n) time, instead of O(n), even in Git repositories."— Sapling announcement post (2022-11-15)
Mechanically: given a segment, ask the server (or a local content index) "does this segment contain a change to file F?" If yes, narrow to half the segment; if no, skip the whole segment in one step. Over the whole graph, you walk O(log n) segments instead of O(n) commits.
Works on Git repos too — the Sapling client applies the bisection algorithmics to Git-backed repos as a client-side optimization. The segments are built locally from the Git graph; the performance win applies even without the Sapling server.
Per-file history graph (server-only)¶
Sapling goes further on Sapling-server-backed repos: the server "maintain[s] per-file history graphs" that collapse the bisection to a direct lookup — "sl log FILE in less than a second, regardless of how old the file is." This variant does not apply to Git repos; it requires the server-side history-graph index.
Seen in¶
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — the canonical bisection-over-segment-graph introduction; both the client-side variant (works on Git too) and the server-side per-file history graph variant are described.
Related¶
- systems/sapling-scm — the canonical consumer.
- systems/meta-segmented-changelog — the index structure bisection operates on.
- concepts/segmented-changelog — the algorithmic concept.
- concepts/lazy-history-download — the design pairing; bisection makes lazy download viable by letting queries run on the index without fetching per-commit data.