Skip to content

CONCEPT Cited by 1 source

Commit-graph bisection

Commit-graph bisection is a VCS-side binary-search technique that answers queries like "when did this file's content change?" (log) or "who last touched line L?" (blame) in O(log n) over a segment-indexed commit graph, rather than the traditional O(n) iteration over commits.

Canonical wiki instance: Sapling's Segmented Changelog.

Why it's new

Git's git log <file> and git blame walk the commit graph starting from HEAD and for each commit inspect (a) does this commit touch the file, (b) if so, compare to parent content. That's O(number-of-commits) in the worst case — linear over the whole history. For a monorepo with tens of millions of commits, that's unusable.

The bisection trick

With a segment-indexed graph (Segmented Changelog), you can bisect:

"When running log or blame on a file, we're able to bisect the segment graph to find the history in O(log n) time, instead of O(n), even in Git repositories."

— Sapling announcement post (2022-11-15)

Mechanically: given a segment, ask the server (or a local content index) "does this segment contain a change to file F?" If yes, narrow to half the segment; if no, skip the whole segment in one step. Over the whole graph, you walk O(log n) segments instead of O(n) commits.

Works on Git repos too — the Sapling client applies the bisection algorithmics to Git-backed repos as a client-side optimization. The segments are built locally from the Git graph; the performance win applies even without the Sapling server.

Per-file history graph (server-only)

Sapling goes further on Sapling-server-backed repos: the server "maintain[s] per-file history graphs" that collapse the bisection to a direct lookup — "sl log FILE in less than a second, regardless of how old the file is." This variant does not apply to Git repos; it requires the server-side history-graph index.

Seen in

Last updated · 319 distilled / 1,201 read