Segmented Changelog (Meta / Sapling)¶
Segmented Changelog is Sapling's commit-graph representation and query engine. It is the mechanism that lets Sapling answer common history queries (smartlog rendering, ancestry, blame, per-file history) without downloading the repository's full commit history.
The problem it solves¶
A monorepo at Meta's scale has "tens of millions of commits." Even
the Linux kernel repo is "three-quarters history" at 5.5 GB. Any
VCS operation that has to walk millions of commits, or download
them first, is not viable at monorepo scale. The point of comparison
in Git is git log, which is O(n) in the number of commits.
What Segmented Changelog downloads¶
Per the 2022-11-15 Sapling announcement post:
"To solve this, we developed the Segmented Changelog, which allows the downloading of the high-level shape of the commit graph from the server, taking just a few megabytes, and lazily filling in individual commit data later as necessary."
— Sapling announcement post
Two-layer design:
- Graph-shape index ("just a few megabytes"): the commit graph's merge/linear-segment structure, downloaded once. This is the segmented-changelog representation proper: commits partitioned into maximal linear segments that are split at merge points.
- Per-commit data (commit metadata, trees, files): downloaded lazily as queries and user operations need them. Part of Sapling's broader lazy history download strategy.
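The two layers can be illustrated with a minimal sketch. It assumes commits are already numbered so that every parent has a smaller id than its children. The names Segment, build_segments and LazyCommitStore are hypothetical, and the fetch callback stands in for a server round trip. This is not Sapling's actual code or data layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    low: int                  # first (oldest) commit id in the run
    high: int                 # last (newest) commit id in the run
    parents: Tuple[int, ...]  # parents of `low` (empty for a root commit)


def build_segments(parents: Dict[int, List[int]]) -> List[Segment]:
    """Group consecutive ids into runs where each commit's only parent is id - 1."""
    segments: List[Segment] = []
    current = None  # (low, high, parents) of the run being built
    for cid in sorted(parents):
        ps = parents[cid]
        if current is not None and ps == [current[1]] and cid == current[1] + 1:
            current = (current[0], cid, current[2])  # extend the linear run
            continue
        if current is not None:
            segments.append(Segment(*current))
        current = (cid, cid, tuple(ps))  # root, merge, or new branch: start a run
    if current is not None:
        segments.append(Segment(*current))
    return segments


class LazyCommitStore:
    """Second layer: fetch per-commit data only on first use, then cache it."""

    def __init__(self, fetch: Callable[[int], dict]):
        self._fetch = fetch  # hypothetical stand-in for a server round trip
        self._cache: Dict[int, dict] = {}

    def get(self, cid: int) -> dict:
        if cid not in self._cache:
            self._cache[cid] = self._fetch(cid)
        return self._cache[cid]


# 0-1-2 is the mainline, 3-4 branches off commit 1, 5 merges 2 and 4, 6 follows 5.
graph = {0: [], 1: [0], 2: [1], 3: [1], 4: [3], 5: [2, 4], 6: [5]}
segments = build_segments(graph)

fetched: List[int] = []
store = LazyCommitStore(lambda cid: fetched.append(cid) or {"id": cid})
store.get(6)
store.get(6)  # served from the cache, no second fetch
```

In this toy graph, seven commits collapse into three segments (0-2, 3-4 and 5-6), and only commit 6 is ever fetched. The index grows with the number of branch and merge points rather than with the number of commits, which is consistent with the graph shape fitting in a few megabytes.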
Query performance¶
With only the segments and two commit positions:
- Ancestry query between any two commits: O(number-of-merges).
- sl smartlog: "less than a second, regardless of how big the repository is."
- log/blame: bisecting the segment graph takes O(log n) instead of O(n). The post says this holds "even in Git repositories": the Sapling client applies the Segmented Changelog algorithms to Git-backed repos too, as a client-side optimization with the segments built from the Git graph.
- sl log FILE (per-file history): "less than a second, regardless of how old the file is". This variant "maintains per-file history graphs" that require the Sapling server, so it does not apply to Git-backed repos.
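The query costs above can be sketched as follows, under stated assumptions: commit ids are topologically ordered (every parent id is smaller than its child's), each segment is a contiguous id range forming a linear chain, and the property being bisected flips exactly once along that chain. SegmentIndex, is_ancestor and first_true are hypothetical names for illustration, not Sapling's API.

```python
from bisect import bisect_right
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    low: int                  # oldest commit id in the linear run
    high: int                 # newest commit id in the linear run
    parents: Tuple[int, ...]  # parents of `low`, located in other segments


class SegmentIndex:
    def __init__(self, segments: Iterable[Segment]):
        self.segments: List[Segment] = sorted(segments)
        self.lows = [s.low for s in self.segments]

    def find(self, cid: int) -> Segment:
        """Locate the segment containing cid by binary search over segment starts."""
        idx = bisect_right(self.lows, cid) - 1
        if idx < 0 or cid > self.segments[idx].high:
            raise KeyError(cid)
        return self.segments[idx]

    def is_ancestor(self, a: int, b: int) -> bool:
        """True if a is b or an ancestor of b; hops segment to segment, not commit to commit."""
        stack, seen = [b], set()
        while stack:
            x = stack.pop()
            if x < a or x in seen:  # ids below a cannot have a as an ancestor
                continue
            seen.add(x)
            seg = self.find(x)
            if seg.low <= a:        # a lies in the same chain at or below x
                return True
            stack.extend(seg.parents)
        return False


def first_true(low: int, high: int, pred: Callable[[int], bool]) -> Optional[int]:
    """Smallest id in [low, high] where pred holds, assuming pred goes False then True."""
    lo, hi = low, high + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo if lo <= high else None


# Toy graph: 0-1-2 mainline, 3-4 branch off commit 1, 5 merges 2 and 4, 6 follows 5.
index = SegmentIndex([Segment(0, 2, ()), Segment(3, 4, (1,)), Segment(5, 6, (2, 4))])

probes: List[int] = []
changed_at = first_true(0, 999, lambda c: probes.append(c) or c >= 700)
```

Here index.is_ancestor(1, 6) succeeds after visiting three segments, while index.is_ancestor(2, 4) prunes as soon as the walk drops below id 2. The binary search over a 1,000-commit chain needs at most ten probes, and in a lazy setup each probe would be the only point at which per-commit data has to be fetched.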
Open-source availability¶
Segmented Changelog is implemented in the open-source Sapling client. The Sapling-server-side variants (per-file history graphs, fully server-backed lazy download) require the not-yet-open-sourced Sapling-compatible server.
Seen in¶
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — canonical-first wiki introduction; positions Segmented Changelog as the history-scaling primitive underneath Sapling's smartlog / log / blame performance claims.
Related¶
- systems/sapling-scm — the parent system.
- systems/sapling-server — the not-yet-open-source substrate for the per-file-history-graph variant.
- concepts/segmented-changelog — the algorithmic concept.
- concepts/commit-graph-bisection — the O(log n) query primitive.
- concepts/lazy-history-download — the pairing concept.
- patterns/lazy-history-on-demand — the broader design pattern.