Meta: "Sapling: Source control that's user-friendly and scalable"
Summary
A 2022-11-15 Meta Engineering post (fetched 2024-09-10 into the wiki's raw corpus) announcing the open-sourcing of the Sapling client, the user-facing half of a source-control system Meta has built over the last ten years to serve its internal monorepo at the scale of "tens of millions of files, tens of millions of commits, and tens of millions of branches." The post presents Sapling as two orthogonal innovations that happen to ship together: (1) a usability overhaul of the Git/Mercurial CLI surface (one command does one thing, no staging area, optional local branch names, a built-in smartlog view, pervasive undo/redo/unhide, and a first-class commit-stack workflow with `absorb` / `amend --to` / `restack` semantics inspired by Mercurial Evolve); and (2) a scaling architecture for monorepos too large for Git: Segmented Changelog (a few megabytes of commit-graph shape downloaded at clone time, which answers ancestry queries in O(number-of-merges) time and lets file `log`/`blame` bisect history in O(log n) without the full history), lazy history download (clone is nearly free; commits, trees, and files are fetched on demand; offline mode is preserved via cache + indexes), a virtual file system (not yet open-sourced) that makes checkouts fast regardless of working-copy size, and organization-owned sparse profiles checked into the repo so thousands of engineers can work on a shifting subset of the monorepo without per-engineer configuration. The post also introduces ReviewStack, a demonstration stack-oriented code-review UI for GitHub pull requests, and previews the still-proprietary Sapling-compatible server and Commit Cloud features. Importantly, several scale features are unavailable with the open-sourced client alone because they depend on the Sapling-specific server. The open-source client also works with Git repos (clone/fetch/push against any Git host), but in that mode it does not get the server-dependent optimizations such as per-file history graphs; the post says only that "some of these optimizations will not apply" without listing them.
Key takeaways
- Sapling is a ten-year-old system, not a new one. Post-quote verbatim: "The release of this post marks my 10th year of working on Sapling at Meta, almost to the day… We've spent the past 10 years building Sapling." It started as a Mercurial extension and grew into its own system with new storage formats, wire protocols, algorithms, and behaviors. Canonical wiki statement that Meta's internal VCS is a Mercurial-lineage codebase, not a Git fork, whose client is now open-sourced at sapling-scm.com.
- Monorepo scale that "public source control systems were not, and still are not, capable of handling." Post-quote: Meta's internal repo is "tens of millions of files, tens of millions of commits, and tens of millions of branches." The choice was to break up the repo (losing monorepo benefits: "simplified dependency management and the ability to make broad changes quickly") or to scale the VCS. Meta chose to scale the VCS. Canonical wiki counterpoint to Dropbox's 87 GB monorepo on GHEC (sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity): at Meta's scale, Meta judged that even a tuned Git on a managed SaaS was not an option and built its own VCS instead.
- Two orthogonal design axes combined in one product. The post's structure explicitly separates (a) "Sapling's user experience" from (b) "Scaling Sapling." Nearly every feature is motivated by either UX or scale, rarely both. This is the clearest statement on the wiki that UX and scalability in source control are independent design dimensions; where classical tooling such as Git is often seen as trading one for the other, Sapling's thesis is that you can invest in both at once. See concepts/vcs-usability for the canonical framing.
- Smartlog is the load-bearing UX primitive. Post-quote verbatim: "By simply running the Sapling client with no arguments, `sl`, you can see all your local commits, where you are, where important remote branches are, what files have changed, and which commits are old and have new versions. Equally important, the smartlog hides all the information you don't care about. Remote branches you don't care about are not shown. Thousands of irrelevant commits in main are hidden behind a dashed line." The post credits smartlog with giving people "the right mental model from day one" and says it "makes people more confident in using source control." The same view is also available as an interactive web UI via `sl web` and through an interactive undo scroller via `sl undo -i` (Mac/Linux only). Canonical wiki instance of patterns/usability-first-vcs-cli.
- Undo is a first-class subsystem, not a manual recovery procedure. "Sapling provides a wide array of tools for understanding what you did and undoing it. Commands like `sl undo`, `sl redo`, `sl uncommit`, and `sl unamend` allow you to easily undo many operations. Commands like `sl hide` and `sl unhide` allow you to trivially and safely hide commits and bring them back to life." The motivation is explicitly operational: "The Sapling development team is small, and in order to support our tens of thousands of internal developers, we needed to make it as easy as possible to solve your own issues and get unblocked." Post-quote: "Never again should you have to delete your repository and clone again to get things working." Canonical wiki instance of patterns/vcs-undo-tooling as a developer-productivity investment, not an ergonomic luxury.
- Commit stacks are a first-class workflow, not a rebase dance. Meta's described pattern: "an engineer building a feature will send out the small first step of that feature as a commit for code review. While it's being reviewed, they will start on the next step as a second commit that will later be sent for code review as well. A full feature will consist of many of these small, incremental, individually reviewed commits on top of one another." Canonical statement on the wiki that stacked diffs are Meta's unit of review. Named Sapling commands that make this work: `sl goto COMMIT`, `sl amend`, `sl restack`, `sl next` / `sl prev` / `sl goto top` / `sl goto bottom`, `sl fold`, `sl split`, `sl absorb`, `sl amend --to COMMIT`. Implementation: Sapling tracks mutation history for every commit (inspired by Mercurial Evolve), so it can algorithmically rebuild the stack after any number of edits. Canonical wiki instance of patterns/first-class-commit-stack-workflow.
- ReviewStack: stack-oriented code review. Post-quote: "many external code review tools are optimized for reviewing the entire pull request at once instead of individual commits within the pull request. This makes it hard to have a conversation about individual commits and negates many of the benefits of having a stack of small, incremental, easy-to-understand commits." Meta's answer is ReviewStack, a demonstration review site at reviewstack.dev that shows each commit's conversation and signals on a single page, with dropdown navigation between the commits in a stack. It is framed not as a product but as a demo that "shows just how intuitive and powerful stacked commit review flows could be." Explicit critique of GitHub pull-request review UX.
- History scale: Segmented Changelog. The core scaling mechanism for commit history. Post-quote: "three-quarters of the 5.5 GB Linux kernel repo is the history." In Sapling, `clone` "downloads almost no history." Instead, Sapling downloads "the high-level shape of the commit graph from the server, taking just a few megabytes, and lazily filling in individual commit data later as necessary. This enables querying the graph relationship between any two commits in O(number-of-merges) time, with nothing but the segments and the position of the two commits in the segments." Commands like `sl smartlog` take "less than a second, regardless of how big the repository is." Running `log` or `blame` on a file bisects the segment graph in O(log n) instead of O(n), even when operating on a Git repo; with the Sapling-specific server, per-file history graphs speed up `sl log FILE` further. Canonical wiki algorithm: concepts/segmented-changelog + concepts/commit-graph-bisection.
- Working-copy scale: virtual file system + sparse profiles. The VFS is "not yet publicly available" and "makes it look and act as if you have the entire repository. Clones and checkouts become very fast, and while accessing a file for the first time requires a network request, subsequent accesses are fast and prefetching mechanisms can warm the cache for your project." Even without the VFS, Sapling uses Watchman to accelerate `sl status` (no full working-copy scan) and supports sparse checkouts. Canonical wiki concept: concepts/virtual-filesystem-for-monorepo, which places Sapling's VFS (EdenFS, a name this post does not use) alongside Microsoft's GVFS.
- Sparse profiles are checked into the repo, not per-developer config. Post-quote verbatim: "Instead of each developer configuring and maintaining their own list of which files should be included, organizations can commit 'sparse profiles' into the repository. When a developer clones the repository, they can choose to enable the sparse profile for their particular product. As the product's dependencies change over time, the sparse profile can be updated by the person changing the dependencies, and every other engineer will automatically receive the new sparse configuration when they checkout or rebase forward." This shifts the sparse-checkout burden from individual engineers to the organization; thousands of engineers "work on a constantly shifting subset of the repository without ever having to think about it." Canonical wiki pattern: patterns/organization-owned-sparse-profile, the architectural move that makes sparse checkout operationally viable at scale.
- Server-side scale: Commit Cloud preview + incremental monorepo migration. The (not-yet-open-sourced) Sapling-compatible server is a "scalable, distributed source control Rust service we use at Meta to serve Sapling and (soon) Git repositories." Features named as future possibilities: (a) incremental migration into (or out of) the monorepo, "allowing you to experiment with monorepos before committing to them"; (b) Commit Cloud, where "all commits in your organization are uploaded as soon as they are made, and sharing code is as simple as sending your colleague a commit hash and having them run `sl goto HASH`." Canonical wiki system page: systems/commit-cloud-meta, commit cloud as a first-class sharing primitive replacing branch-push-and-fetch.
- Open-source-today vs. proprietary-for-now posture made explicit. Post-quote verbatim: "Many of our scale features require using a Sapling-specific server and are therefore unavailable in our initial client release. We describe them here as a preview of things to come. When using Sapling with a Git repository, some of these optimizations will not apply." The client is open; for the server and VFS the post says only "we hope to open-source" them, with no commitment. Canonical wiki datapoint: Meta's scale infrastructure follows a client-first release cadence. One reading (an inference, not stated in the post) is that the UX sells the ecosystem while the scale primitives stay proprietary until an ecosystem exists to receive them. Relevant to any VCS or platform company copying the Sapling approach.
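The stack-rebuilding behavior described in the commit-stack takeaway can be illustrated with a toy model. This is a minimal Python sketch, not Sapling's implementation: the `Repo` class, its `amend`/`restack` methods, and the `'`-suffix naming of new commit versions are all hypothetical. It keeps only commit parents plus Evolve-style mutation records (old version to new version) and shows how those records let a tool rebase every descendant onto the newest version of its parent after an amend. Real Sapling also has to rewrite file contents and handle conflicts, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy commit graph with mutation records (hypothetical model)."""

    # commit id -> parent id (None for the root commit)
    parents: dict[str, str | None] = field(default_factory=dict)
    # mutation history: obsolete commit id -> id of the version that replaced it
    successors: dict[str, str] = field(default_factory=dict)

    def add(self, cid: str, parent: str | None) -> None:
        self.parents[cid] = parent

    def _rewrite(self, old: str, new_parent: str | None) -> str:
        """Create a new version of `old` on `new_parent` and record old -> new."""
        new = old + "'"
        self.parents[new] = new_parent
        self.successors[old] = new
        return new

    def amend(self, cid: str) -> str:
        """Replace a commit with an edited version on the same parent."""
        return self._rewrite(cid, self.parents[cid])

    def latest(self, cid: str) -> str:
        """Follow mutation records to the newest version of a commit."""
        while cid in self.successors:
            cid = self.successors[cid]
        return cid

    def restack(self) -> None:
        """Rebase every live commit whose parent is obsolete onto the parent's newest version."""
        changed = True
        while changed:
            changed = False
            for cid, parent in list(self.parents.items()):
                if cid in self.successors or parent is None:
                    continue  # obsolete commit, or the root
                if parent in self.successors:
                    self._rewrite(cid, self.latest(parent))
                    changed = True

    def chain(self, cid: str) -> list[str]:
        """Walk parent pointers from `cid` down to the root."""
        out = []
        node: str | None = cid
        while node is not None:
            out.append(node)
            node = self.parents[node]
        return out


repo = Repo()
for cid, parent in [("main", None), ("A", "main"), ("B", "A"), ("C", "B")]:
    repo.add(cid, parent)
repo.amend("A")  # A is replaced by A'; B and C still sit on the obsolete A
repo.restack()
print(repo.chain(repo.latest("C")))  # ["C'", "B'", "A'", 'main']
```

The loop repeats until no commit moves, so stacks of any depth are rebuilt; the mutation record A to A' is what lets the tool find the right new parent from B's stale parent pointer.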
Architectural numbers + operational notes (from source)
- Monorepo scale: "tens of millions of files, tens of millions of commits, and tens of millions of branches", a scale the post says public source control systems cannot handle. The canonical disclosed Meta figure.
- History-fraction datapoint: "three-quarters of the 5.5 GB Linux kernel repo is the history" — the motivational example for lazy history download. In Sapling: "cloning the repository downloads almost no history."
- Segmented Changelog overhead: "taking just a few megabytes" for the graph-shape download; "O(number-of-merges) time" to answer ancestry queries; "less than a second" for `smartlog` at any repo size; bisection of the segment graph gives O(log n) (vs O(n)) for file `log`/`blame`, "even in Git repositories".
- Developer scale: "tens of thousands of internal developers" at Meta use Sapling. "Thousands of engineers" work on shifting sparse subsets via organization-owned profiles.
- Open-source scope: only the client is in the initial release. For the VFS (virtualfs) and the Sapling-compatible server, the post says only "we hope to open-source these in the future", with no commitment. Git interop is built into the client.
- Command surface disclosed: `sl` (default = smartlog), `sl web`, `sl undo`, `sl undo -i`, `sl redo`, `sl uncommit`, `sl unamend`, `sl hide`, `sl unhide`, `sl goto`, `sl amend`, `sl restack`, `sl next`, `sl prev`, `sl goto top`, `sl goto bottom`, `sl fold`, `sl split`, `sl absorb`, `sl amend --to COMMIT`.
- Timeline: Sapling started as a Mercurial extension "10 years" before the 2022-11-15 post and diverged from Mercurial with new storage formats, wire protocols, and algorithms; the client was open-sourced 2022-11-15.
- Platform support for the interactive undo scroller (`sl undo -i`): Mac + Linux only.
- LFS support: Sapling "supports using a Git LFS server" for large files, with unchanged semantics.
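The "O(number-of-merges) time" ancestry figure above can be illustrated with a simplified, single-level model of a segmented commit graph. This Python sketch assumes, hypothetically, that commits get integer ids in topological order (parents before children) and that each `Segment` is a run of ids `low..high` in which every commit's parent is the previous id, except `low`, whose parents are listed explicitly. Sapling's real index is more elaborate; the `Segment` and `SegmentedGraph` names are illustrative only. An ancestry check then hops across one segment per merge or branch point instead of walking individual commits, and it never needs commit messages or trees.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    low: int                  # first commit id in the linear run
    high: int                 # last commit id in the linear run
    parents: tuple[int, ...]  # parents of `low`; any other id's parent is id - 1


class SegmentedGraph:
    def __init__(self, segments: list[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.low)
        self.lows = [s.low for s in self.segments]

    def segment_of(self, cid: int) -> Segment:
        """Binary-search the segment containing `cid`."""
        i = bisect_right(self.lows, cid) - 1
        if i < 0 or cid > self.segments[i].high:
            raise KeyError(cid)
        return self.segments[i]

    def is_ancestor(self, a: int, b: int) -> bool:
        """True if commit `a` is `b` itself or an ancestor of `b`."""
        stack, seen = [b], set()
        while stack:
            head = stack.pop()
            # Ids are topologically ordered, so a larger id can't be an ancestor.
            if head in seen or a > head:
                continue
            seen.add(head)
            seg = self.segment_of(head)
            if a >= seg.low:  # a lies on the linear run low..head
                return True
            stack.extend(seg.parents)  # jump across a merge/branch point
        return False


# main: 0-1-2-3; a branch 4-5 forks from 1; merge commit 6 (parents 3 and 5), then 7-8.
graph = SegmentedGraph([
    Segment(0, 3, ()),
    Segment(4, 5, (1,)),
    Segment(6, 8, (3, 5)),
])
print(graph.is_ancestor(4, 8), graph.is_ancestor(2, 5))  # True False
```

Only the three `Segment` records are consulted, which mirrors the post's point that the graph shape (a few megabytes) is enough to answer these queries while per-commit data is filled in lazily.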
Systems / hardware extracted
New wiki pages:
- systems/sapling-scm — Sapling, Meta's source control system, client open-sourced 2022-11-15 at sapling-scm.com. Mercurial-lineage, 10-year development, scales to tens-of-millions-of-everything.
- systems/reviewstack — demonstration stack-oriented code-review UI for GitHub pull requests at reviewstack.dev.
- systems/sapling-smartlog: the `sl` default view, a tailored, concise picture of local commits + relevant remote branches that hides irrelevant main-branch history behind a dashed line. Also available as an interactive web UI via `sl web`.
- systems/mercurial: the open-source project Sapling grew out of, starting as a Mercurial extension ~2012. The Mercurial Evolve extension is cited as the direct inspiration for mutation-history tracking.
- systems/watchman: Meta's file-system monitor. Used by `sl status` to avoid full working-copy scans.
- systems/git-lfs: referenced as the large-file-storage protocol Sapling supports for big binary assets.
- systems/meta-segmented-changelog — Sapling's graph-shape index, downloaded in megabytes, enabling O(log n) history queries via bisection of segments.
- systems/sapling-virtual-fs — Sapling's virtual file system (not yet open-sourced). Makes clones/checkouts fast; prefetches per-project; transparently fetches on first access.
- systems/sapling-server — Sapling-compatible server (Rust, not yet open-sourced). Substrate for Segmented Changelog, per-file history graphs, Commit Cloud, and future Sapling-served Git repositories.
- systems/commit-cloud-meta: Meta's Commit Cloud feature (preview), in which every commit in the organization is auto-uploaded and sharing happens via commit hash + `sl goto HASH`.
Existing pages reinforced:
- systems/git — extended; Sapling interoperates with any Git repo, but Git-repo mode forgoes the scale optimizations that require the Sapling server. Framing: Sapling is not a Git fork; it's a Mercurial-lineage system with a Git-compatible client surface.
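The smartlog filtering summarized above (show local work and where it sits, hide irrelevant main-branch history) can be approximated with a small function. This is a hypothetical Python sketch, not Sapling's rendering logic: it flattens the graph into lines, ignores remote branches and the "has a newer version" markers the real smartlog shows, and the `smartlog` function name and the `~ N hidden` marker format are made up for illustration. It keeps the main tip, every draft commit, and each public commit a draft stack is based on, and collapses everything in between into a count.

```python
from __future__ import annotations


def smartlog(main: list[str], drafts: dict[str, str]) -> list[str]:
    """Render a flattened, newest-first smartlog.

    main:   public commits on the main branch, oldest first.
    drafts: local draft commit -> parent (a public commit or another draft).
    """
    pos = {c: i for i, c in enumerate(main)}
    children: dict[str, list[str]] = {}
    for draft, parent in drafts.items():
        children.setdefault(parent, []).append(draft)

    def stack_on(base: str) -> list[str]:
        """Drafts built on `base`, parents before children (depth-first)."""
        out, todo = [], list(reversed(children.get(base, [])))
        while todo:
            d = todo.pop()
            out.append(d)
            todo.extend(reversed(children.get(d, [])))
        return out

    # Public commits worth showing: the main tip plus every draft stack's base.
    bases = {p for p in drafts.values() if p in pos}
    keep = sorted(bases | {main[-1]}, key=pos.__getitem__, reverse=True)

    lines: list[str] = []
    prev = None
    for commit in keep:
        if prev is not None and pos[prev] - pos[commit] > 1:
            lines.append(f"~ {pos[prev] - pos[commit] - 1} hidden")
        lines.extend(f"o {d} (draft)" for d in reversed(stack_on(commit)))
        lines.append(f"o {commit} (public)")
        prev = commit
    if pos[prev] > 0:
        lines.append(f"~ {pos[prev]} hidden")  # older history below the oldest base
    return lines


main = [f"m{i}" for i in range(1000)]
drafts = {"a1": "m990", "a2": "a1", "b1": "m500"}
print("\n".join(smartlog(main, drafts)))
```

For a 1,000-commit main branch with two local stacks, this prints nine lines: three public commits, three drafts, and three collapsed ranges standing in for the other 997 public commits.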
Concepts extracted
New wiki pages:
- concepts/lazy-history-download — clone downloads almost no history; commits/trees/files are fetched on demand as the user navigates. Preserves offline mode via cache + indexes.
- concepts/segmented-changelog — the graph-shape representation: high-level merge structure downloaded in megabytes; ancestry queries answered in O(number-of-merges) from segments alone without per-commit data.
- concepts/commit-graph-bisection: bisecting the segment graph to answer file-level `log` / `blame` and related queries in O(log n) rather than O(n). Works even on Git repos accessed via Sapling.
- concepts/virtual-filesystem-for-monorepo: the working-copy scaling primitive, which presents the full repo shape, fetches file contents on first access, and caches them afterward. Sapling's VFS (EdenFS) and Microsoft's GVFS are canonical instances.
- concepts/sparse-checkout — check out only a declared subset of the repo paths. Existing Git feature; Sapling treats it as a first-class scaling primitive.
- concepts/sparse-profile — the checked-in, organization-owned description of which subset of the repo a particular product team needs. Decouples the sparse-checkout knob from per-engineer config.
- concepts/commit-stack — the workflow unit: an ordered sequence of small incremental commits, each independently reviewable, built on top of each other.
- concepts/mutation-history-commit — tracking the evolution lineage of a commit across amend/rebase/fold/split operations. Enables Sapling to algorithmically rebuild the stack after any number of edits; inspired by Mercurial's Evolve extension.
- concepts/vcs-usability: the canonical wiki framing that usability in a VCS is a first-class design axis, independent of and orthogonal to scale, with developer productivity as the payoff.
Existing pages reinforced:
- concepts/monorepo — extended; Sapling is Meta's operational answer to the scaling demands of a monorepo at "tens of millions"-of-everything scale. Counterpoint framing: Dropbox's 87 GB repo lives on GHEC with a tuned repack; Meta's much-larger repo requires a dedicated VCS.
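The O(log n) file-history bisection listed under the concepts above can be sketched as a binary search over a linear run of commits. The post does not spell out Sapling's exact algorithm, so this Python sketch rests on stated assumptions: `file_id(i)` is a hypothetical lookup returning the content hash of one file at commit index `i` (or `None` if the file is absent), and a file's content never reverts to an earlier version inside the range, which makes "same content as the newest commit" monotone and therefore bisectable. Each distinct version then costs O(log n) lookups instead of a scan over every commit.

```python
from __future__ import annotations

from typing import Callable, Hashable, Optional

FileId = Callable[[int], Optional[Hashable]]


def version_start(file_id: FileId, lo: int, hi: int) -> int:
    """Smallest index in [lo, hi] whose file content equals the content at hi.

    Assumes content never reverts within the range, so the predicate
    file_id(i) == file_id(hi) goes False...False, True...True and can be bisected.
    """
    target = file_id(hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if file_id(mid) == target:
            hi = mid
        else:
            lo = mid + 1
    return lo


def file_history(file_id: FileId, n: int) -> list[int]:
    """Indices (newest first) of the commits that introduced each version of the file."""
    changes: list[int] = []
    hi = n - 1
    while hi >= 0 and file_id(hi) is not None:
        start = version_start(file_id, 0, hi)
        changes.append(start)
        hi = start - 1
    return changes


calls = 0


def fake_file_id(i: int) -> str | None:
    """Hypothetical history: file created at commit 100, edited at 300_000 and 900_000."""
    global calls
    calls += 1
    if i < 100:
        return None
    if i < 300_000:
        return "v1"
    return "v2" if i < 900_000 else "v3"


print(file_history(fake_file_id, 1_000_000))  # [900000, 300000, 100]
print(calls)  # far fewer lookups than the 1,000,000 a linear scan would need
```

The no-revert assumption is the main simplification: if a file flips back to an older version, plain bisection can skip over the intermediate change, so a real implementation would need extra bookkeeping that this sketch does not attempt.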
Patterns extracted
New wiki pages:
- patterns/usability-first-vcs-cli: design the VCS CLI so every command does one thing, defaults are sensible, and the default view (`sl` alone) is a useful picture of what you care about. Smartlog is the canonical instance.
- patterns/lazy-history-on-demand: clone ≈ free; download history data as queries need it; pair with a bisectable graph-shape index so ancestry queries stay fast without downloading the data.
- patterns/first-class-commit-stack-workflow: commands + state tracking that make a stack of small, review-sized commits a comfortable default workflow, not a painful rebase dance. Named moves: `absorb`, `amend --to`, `restack`, `fold`, `split`, `goto top` / `goto bottom`.
- patterns/vcs-undo-tooling: invest in `undo` / `redo` / `uncommit` / `unamend` / `hide` / `unhide` as first-class commands, not edge-case recovery. The interactive undo scroller is the recovery-UX primitive.
- patterns/organization-owned-sparse-profile: check the sparse-checkout configuration into the repo as a named profile; each product team owns its profile; engineers opt in by profile name. Dependency changes update the profile, and engineers pick up the new sparse state on their next checkout or rebase.
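The organization-owned sparse profile pattern can be sketched in a few lines. This is an illustrative Python model, not Sapling's implementation or file format: the `[include]` / `[exclude]` sections are loosely modeled on Mercurial-style sparse profiles, entries are treated as plain directory prefixes rather than globs, and the profile path and file names are hypothetical. The property it demonstrates is that the profile is read from the same repository snapshot being checked out, so a dependency change committed to the profile reaches every engineer on their next checkout or rebase.

```python
from __future__ import annotations


def parse_profile(text: str) -> tuple[list[str], list[str]]:
    """Parse a simplified profile into (include, exclude) directory prefixes."""
    sections: dict[str, list[str]] = {"[include]": [], "[exclude]": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line in sections:
            current = line
        elif current is None:
            raise ValueError(f"entry outside a section: {line!r}")
        else:
            sections[current].append(line.rstrip("/") + "/")
    return sections["[include]"], sections["[exclude]"]


def sparse_checkout(snapshot: dict[str, str], profile_path: str) -> list[str]:
    """Files to materialize for `snapshot`, using the profile stored in that snapshot."""
    include, exclude = parse_profile(snapshot[profile_path])

    def under(path: str, prefixes: list[str]) -> bool:
        return any(path.startswith(p) for p in prefixes)

    return sorted(p for p in snapshot if under(p, include) and not under(p, exclude))


PROFILE = "tools/sparse/payments"  # hypothetical profile location
rev1 = {
    PROFILE: "[include]\napp/payments\nlib/net\n[exclude]\nlib/net/testdata\n",
    "app/payments/main.py": "",
    "app/ads/main.py": "",
    "lib/net/http.py": "",
    "lib/net/testdata/big.bin": "",
    "lib/crypto/aes.py": "",
}
# A later commit adds a dependency and updates the shared profile in the same change.
rev2 = {**rev1, PROFILE: rev1[PROFILE].replace("lib/net\n", "lib/net\nlib/crypto\n")}

print(sparse_checkout(rev1, PROFILE))  # ['app/payments/main.py', 'lib/net/http.py']
print(sparse_checkout(rev2, PROFILE))  # ['app/payments/main.py', 'lib/crypto/aes.py', 'lib/net/http.py']
```

Because the include list travels with the commit, no engineer edits local configuration: checking out `rev2` alone is enough to start materializing `lib/crypto`.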
Caveats
- Client-only open-source disclosure: the VFS, Sapling server, Commit Cloud, and per-file history graphs are not in the initial open-source release. The post flags this explicitly: "Many of our scale features require using a Sapling-specific server and are therefore unavailable in our initial client release." Anyone evaluating Sapling from the open-source client alone gets the UX, the Git-compatibility layer, and some of the Segmented Changelog benefits, but not the Meta-scale checkout/workspace performance.
- No disclosed scale numbers below the tens-of-millions framing: no commit-graph sizes beyond "a few megabytes", no per-file-history-graph memory footprints, no `sl smartlog` latency percentiles under load, no VFS prefetch hit rates, no Watchman scan-avoidance throughput, no Commit Cloud storage/traffic numbers. The architectural narrative is substance-dense; the operational telemetry is not.
- Narrative voice / self-retrospective: the post is written by a Sapling engineer marking their 10th year on the project and the open-sourcing milestone. It does not enumerate trade-offs or known issues; the limits of the approach (what breaks, what fails to scale further, what problems remain) are absent.
- Mercurial lineage is acknowledged but not deeply explored: the post thanks "the Mercurial open source community for all their collaboration and inspiration in the early days of Sapling" without discussing which Mercurial properties were kept vs replaced. Readers familiar with Mercurial will have more context; readers coming from Git will need to infer.
- ReviewStack is a demo, not a product: the post is explicit — "a demonstration website" — and doesn't commit to ReviewStack being a shipping product. The critique of GitHub PR UX is more durable than the ReviewStack endpoint.
- No Git-vs-Sapling feature-parity matrix: the post says some Sapling optimizations don't apply on Git-backed repos but doesn't enumerate which. Practical operator framing: try `sl` on an existing Git repo for the UX today; the scale features additionally require the Sapling server, which was not released with the client.
Source¶
- Original: https://engineering.fb.com/2022/11/15/open-source/sapling-source-control-scalable/
- Raw markdown: `raw/meta/2024-09-10-sapling-source-control-thats-user-friendly-and-scalable-2022-bf170035.md`
Related¶
- companies/meta — the company + the Tier-1 source.
- sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity — the counterpoint: how a smaller monorepo (87 GB) survives on GHEC by tuning Git itself, at scales Sapling would sit above.
- concepts/monorepo — the architectural choice Sapling exists to support.
- systems/sapling-scm — the system.
- systems/git — the ecosystem Sapling interoperates with but does not replace.
- patterns/first-class-commit-stack-workflow — the load-bearing UX pattern of Sapling.
- patterns/usability-first-vcs-cli — the load-bearing design-stance pattern.