Segmented Changelog (Meta / Sapling)

Segmented Changelog is Sapling's commit-graph representation and query engine. It is the specific mechanism that lets Sapling answer common history queries (smartlog rendering, ancestry, blame, per-file history) without downloading the full commit history of the repository.

The problem it solves

A monorepo at Meta's scale has "tens of millions of commits." Even the Linux kernel repo is "three-quarters history" at 5.5 GB. Any VCS operation that has to walk millions of commits, or download them first, is not viable at monorepo scale. The point of comparison in Git is git log, which is O(n) in the number of commits.

What Segmented Changelog downloads

Per the 2022-11-15 Sapling announcement post:

"To solve this, we developed the Segmented Changelog, which allows the downloading of the high-level shape of the commit graph from the server, taking just a few megabytes, and lazily filling in individual commit data later as necessary."

— Sapling announcement post

Two-layer design:

  1. Graph-shape index ("just a few megabytes"): the commit graph's merge/linear-segment structure downloaded once. This is the segmented-changelog representation proper — commits partitioned into maximal linear segments at merge points.
  2. Per-commit data (commit metadata, trees, files): downloaded lazily as queries and user operations need them. Part of Sapling's broader lazy history download strategy.
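The graph-shape layer can be illustrated with a minimal sketch. It assumes commits have already been given integer ids in topological order (every parent id is smaller than its child's id), and it groups them into contiguous id ranges in which each commit's only parent is the commit just before it. The names Segment and build_segments are hypothetical and chosen for illustration; this is not Sapling's actual implementation or on-disk format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    low: int            # first commit id in the linear run
    high: int           # last commit id in the linear run
    parents: List[int]  # parents of `low` (they live in earlier segments)


def build_segments(parents: Dict[int, List[int]]) -> List[Segment]:
    """Partition a commit graph into linear segments.

    Assumption: ids are topologically ordered, so parents[c] only
    contains ids smaller than c.
    """
    segments: List[Segment] = []
    for cid in sorted(parents):
        ps = parents[cid]
        last = segments[-1] if segments else None
        # Extend the current run only if this commit's single parent
        # is the commit immediately before it.
        if last is not None and ps == [cid - 1] and last.high == cid - 1:
            last.high = cid
        else:
            # Root, merge, or branch off an older commit: new segment.
            segments.append(Segment(cid, cid, list(ps)))
    return segments


# A small hypothetical history: 3 branches off 1, and 5 merges 2 and 4.
graph = {0: [], 1: [0], 2: [1], 3: [1], 4: [3], 5: [2, 4], 6: [5]}
segments = build_segments(graph)
for seg in segments:
    print(seg)
```

In this toy history, seven commits collapse into three segments. The size of the index grows with the number of branch and merge points rather than with the number of commits, which is consistent with the "just a few megabytes" figure quoted above.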

Query performance

With only the segments and two commit positions:

  • Ancestry query between any two commits: O(number-of-merges).
  • sl smartlog: "less than a second, regardless of how big the repository is."
  • log / blame: bisecting the segment graph takes O(log n) instead of O(n). Per the post, this holds "even in Git repositories": the Sapling client applies the Segmented Changelog algorithms to Git-backed repos too (as a client-side optimization, with the segments built from the Git graph).
  • sl log FILE (per-file history): "less than a second, regardless of how old the file is" — but this variant "maintains per-file history graphs" that require the Sapling server and do not apply on Git-backed repos.
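The ancestry bound above can be made concrete with a small sketch. It assumes a hypothetical segment table of (low, high, parents_of_low) tuples over topologically ordered integer ids, where every commit inside a segment has the previous id as its only parent. The query hops from segment to segment through the parent links, so the work depends on how many segments are visited, not on how many commits lie between the two positions. SEGMENTS, find_segment, and is_ancestor are illustrative names, not Sapling APIs.

```python
from bisect import bisect_right

# Hypothetical segment table, sorted by `low`:
# (low, high, parents of `low`). Inside a segment, commit x has the
# single parent x - 1, so a segment is a straight line of history.
SEGMENTS = [
    (0, 2, []),
    (3, 4, [1]),      # branch starting from commit 1
    (5, 6, [2, 4]),   # commit 5 merges commits 2 and 4
]
LOWS = [seg[0] for seg in SEGMENTS]


def find_segment(cid):
    """Locate the segment containing commit id `cid` by binary search."""
    return SEGMENTS[bisect_right(LOWS, cid) - 1]


def is_ancestor(a, b):
    """Return True if commit `a` is an ancestor of (or equal to) `b`."""
    stack = [b]
    seen = {}  # segment low -> highest position already checked
    while stack:
        x = stack.pop()
        low, _high, parents = find_segment(x)
        # Everything in low..x is a linear ancestor of x.
        if low <= a <= x:
            return True
        first_visit = low not in seen
        if not first_visit and x <= seen[low]:
            continue
        seen[low] = x
        if first_visit:
            # Ancestors always have smaller ids, so parents below `a`
            # cannot lead to `a` and are pruned.
            stack.extend(p for p in parents if p >= a)
    return False


print(is_ancestor(1, 4))
print(is_ancestor(2, 4))
print(is_ancestor(0, 6))
```

Here commit 1 is an ancestor of commit 4 (via 3), while commit 2 is not, because the branch containing 3 and 4 forked from commit 1 before commit 2 was made.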

Open-source availability

Segmented Changelog is implemented in the open-source Sapling client. The Sapling-server-side variants (per-file history graphs, fully server-backed lazy download) require the not-yet-open-sourced Sapling-compatible server.
