CONCEPT Cited by 1 source
Segmented Changelog (algorithm)¶
Segmented Changelog is an algorithmic representation of a VCS commit graph that partitions commits into maximal linear segments at merge points and stores the inter-segment merge structure as a compact index. The representation enables fast ancestry and log queries without iterating over all commits.
Originated in Meta's Sapling source control system; canonical wiki reference: systems/meta-segmented-changelog.
Core properties¶
Graph shape is cheap. Per the 2022-11-15 Sapling post, the graph-shape index downloads in "just a few megabytes" regardless of the repo's commit count.
Ancestry query is O(number-of-merges). With only the segment index and two commit positions:
"This enables querying the graph relationship between any two commits in O(number-of-merges) time, with nothing but the segments and the position of the two commits in the segments."
— Sapling announcement post
Log / blame is O(log n) via bisection. Segmented Changelog bisects the segment graph for file-history queries — see concepts/commit-graph-bisection.
Why segments¶
The classical graph representation stores every commit with edges to its parent(s) and treats the whole thing uniformly. For most real-world repos, the commit graph is sparse in merges — long linear chains punctuated by occasional merges. Segmented Changelog exploits this: each maximal linear run is a single "segment" in the index; only merges create new segment boundaries. Queries become operations on the (much smaller) segment graph.
Complexity: O(number-of-merges) is a dramatic improvement over
O(number-of-commits) when n_merges ≪ n_commits, which is the typical
monorepo case.
Pairing with lazy history¶
Segmented Changelog alone doesn't give you scale — you still need the per-commit data (metadata, trees, files) to do most operations. The pairing is lazy history download: download the segment index up front (cheap), fetch per-commit data on demand as queries need it. Together they enable monorepo-scale VCS without the upfront cost of downloading everything.
Seen in¶
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — canonical algorithmic introduction; paired with lazy history download as Sapling's history-scaling primitive.
Related¶
- systems/meta-segmented-changelog — the implemented system.
- systems/sapling-scm — the parent VCS.
- concepts/lazy-history-download — the pairing stance.
- concepts/commit-graph-bisection — the query primitive built on the segment index.
- patterns/lazy-history-on-demand — the broader pattern.