Git
Git is the dominant distributed version control system (origin:
Linus Torvalds, 2005). Data model: a content-addressed object store
(blobs / trees / commits / tags keyed by SHA-1/SHA-256) rooted in
.git/objects/, plus refs (branches, tags) pointing at commit
objects.
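To make the content-addressed model concrete, the sketch below computes a blob's object ID the way Git does for SHA-1 repositories: it hashes a short header (`blob <size>` followed by a NUL byte) and then the raw content. The function name `git_blob_id` is illustrative and not part of any Git API; SHA-256 repositories would use a different hash function.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # The hashed bytes are a header ("blob <byte length>" + NUL) followed by
    # the raw content, so the ID depends only on content, never on file name.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match `git hash-object` run on a file holding the same bytes.
    print(git_blob_id(b"hello world\n"))
```

Because the key is derived from the content alone, two identical files anywhere in the history share a single stored blob.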
Runtime concerns that matter at scale in the wiki:
- Storage layout. Objects are first written loose (one file per object), then repacked into pack files (`.pack` + `.idx`) for a compact on-disk representation; pack files are where delta compression happens.
- Transfer. `git clone` / `git fetch` / `git push` move pack files, not loose objects; a server typically rebuilds the transfer pack dynamically per request from what the client already has, which is why local repack improvements don't survive a server-mediated round trip (see systems/github).
- Repack. `git repack -adf` (`-a` all-in-one, `-d` delete replaced, `-f` refuse delta reuse) plus `--window=N` (how many nearby objects to consider as delta candidates) and `--depth=N` (max delta chain length) are the knobs that make aggressive compression possible; defaults are tuned for average repos, not for pathological structural mismatches.
- Pluggable signing. `gpg.format=openpgp|ssh|x509` plus `gpg.x509.program=<signer>` let organisations plug GPG, SSH, or S/MIME (X.509) signers into the commit-signing path (see concepts/commit-signing and the Figma device-trust system).
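The repack knobs listed above can be assembled into a single invocation. The following is a minimal sketch, assuming `git` is on the `PATH`; the helper names `build_repack_command` and `run_repack` are hypothetical, and the default values of 250 are only an example of tuned settings, not a recommendation.

```python
import subprocess
from typing import List


def build_repack_command(window: int = 250, depth: int = 250) -> List[str]:
    """Assemble an aggressive `git repack` invocation as an argv list."""
    return [
        "git", "repack",
        "-a",  # pack everything into a single pack
        "-d",  # delete packs made redundant by the new one
        "-f",  # recompute deltas instead of reusing existing ones
        f"--window={window}",  # delta candidates considered per object
        f"--depth={depth}",    # maximum delta chain length
    ]


def run_repack(repo_path: str, window: int = 250, depth: int = 250) -> None:
    """Run the repack inside repo_path; raises CalledProcessError on failure."""
    subprocess.run(build_repack_command(window, depth), cwd=repo_path, check=True)
```

Larger window and depth values generally cost more CPU time and memory during the repack, so it is worth measuring on a copy of the repository first.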
Defaults that matter
- Delta pairing heuristic: Git pairs files for delta compression using only the last 16 characters of the file path. Works for typical codebases where similar names usually mean similar content; can go pathological when the distinguishing portion of the path falls outside those 16 characters (Dropbox i18n canonical instance — see concepts/git-delta-compression and sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity).
- Repack window / depth defaults (`--window=10 --depth=50` on `git gc`) are conservative; tuned values like `--window=250 --depth=250` trade time for compression ratio.
- `--path-walk` flag: experimental option that walks the full directory tree for delta-candidate selection instead of using the 16-char heuristic; effective on structurally-mismatched repos but incompatible with GitHub's server-side bitmap / delta-island optimizations.
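The 16-character heuristic can be illustrated with a small Python port of the classic name-hash used when sorting objects for delta search. This is a sketch reproduced from memory of that function in Git's pack-objects code, not Git's own source, and newer Git versions may offer alternative hashes; the example paths are hypothetical.

```python
def pack_name_hash(path: str) -> int:
    """Sketch of Git's classic pack name-hash (32-bit, suffix-weighted)."""
    h = 0
    for byte in path.encode("utf-8"):
        if byte in b" \t\n\v\f\r":
            continue  # whitespace does not contribute
        # Each new character lands in the top byte; older characters slide
        # right by two bits per step and fall off after about 16 steps.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    de = pack_name_hash("i18n/de/LC_MESSAGES/messages.po")
    fr = pack_name_hash("i18n/fr/LC_MESSAGES/messages.po")
    other = pack_name_hash("src/server/main.c")
    # The two .po paths share their last 16 characters, so their hashes
    # coincide, up to a possible off-by-one from carried low bits.
    print(f"{de:#010x} {fr:#010x} {other:#010x}")
```

Because the language directory sits more than 16 characters from the end of the path, it has essentially no influence on the hash, so translations of different languages sort next to each other as delta candidates.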
Stub page
This page exists because Git internals became load-bearing in a wiki source and we need a canonical anchor. Most Git behaviour is out of scope here; see concepts/git-pack-file and concepts/git-delta-compression for the aspects the wiki covers.
Seen in
- sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity: Dropbox traces 87 GB monorepo growth to Git's 16-char path heuristic pairing `.po` files across languages, and fixes it via a tuned server-side repack (84 GB → 20 GB).
- sources/2025-09-15-github-post-quantum-security-for-ssh-access-on-github: Git-over-SSH is the transport getting PQ-KEX (`sntrup761x25519-sha512`) on github.com's SSH endpoints as of 2025-09-17. Git-over-HTTPS is unaffected; each transport has its own crypto-agility timeline.
- sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git: Git's smart-HTTP protocol + object graph reused as the wire protocol of a generic versioned-storage primitive (Cloudflare Artifacts). Canonical wiki instance of patterns/git-protocol-as-api: Git chosen specifically because "it's deep in the training data of most models" and sidesteps the bootstrap problem of a new protocol. Server is a ~100 KB pure-Zig Wasm Git server running inside Durable Objects, the first wiki instance of a complete Git server reimplemented in Wasm (SHA-1, zlib, delta encode/decode, pack parsing, smart-HTTP v1+v2 from scratch). Artifacts also makes `.import()` + `.fork()` first-class alongside `clone` / `fetch` / `push` and natively supports git-notes for agent-written metadata. Sibling ArtifactFS exploits Git's partial-clone (`--filter=blob:none`) machinery for blobless-clone + async-hydration on any Git remote.
- sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable: Meta positions Sapling as not a Git fork but a Mercurial-lineage scalable VCS whose open-source client also speaks Git. Canonical wiki framing of the scale threshold where Git stops being viable: Meta's internal monorepo is "tens of millions of files, tens of millions of commits, and tens of millions of branches", a regime Meta explicitly describes as one "public source control systems were not, and still are not, capable of handling." Sapling's client-side segment-graph bisection accelerates `log` / `blame` "even in Git repositories"; the deeper scale primitives (server-side Segmented Changelog per-file history graphs, VFS, Commit Cloud) require Sapling's own server and do not apply when Sapling is pointed at a Git host.
- Git cherry-pick + conflict-resolution as fork-sync infrastructure. Manan Gupta documents PlanetScale's evolution of Git-primitive-driven automation for keeping their private Vitess fork aligned with OSS upstream: (1) weekly whole-diff `git cherry-pick` on GitHub Actions cron; (2) git-replay, a custom tool that memoises conflict resolutions and replays them on subsequent passes across multiple release branches (concepts/conflict-resolution-memoization); (3) the Vitess cherry-pick bot that drives continuous PR-level cherry-pick between branch-pair mirrors (OSS `main` ↔ private `upstream`, OSS `release-x.0` ↔ private `latest-x.0`) with draft-PR escalation on conflict. First wiki disclosure of Git's cherry-pick primitive under the concepts/fork-upstream-sync framing and of conflict-resolution-memoisation as an axis distinct from `git rerere`.
- Git three-way merge as a portable mental model for schema change coordination. Shlomi Noach explicitly borrows Git's three-way-merge terminology (merge base, two branches) for PlanetScale's schema conflict detector, but the implementation is entirely different: Git merges text lines, while the schema version operates on semantic SQL diffs (`ALTER TABLE`, `CREATE TABLE`) computed by `schemadiff` and tests for commutativity under function composition. Two structural analogies are preserved: (1) a merge base (`main` at fork time) is required; (2) end-of-file conflicts in Git correspond to column-order conflicts in schema (both branches "append different rows to the end"). First wiki instance of Git terminology borrowed for a non-source-code domain.
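The merge-base idea in the last entry can be sketched at the level of whole keys rather than text lines. This is an illustrative toy, assuming snapshots are plain dictionaries; it is neither Git's line-based merge algorithm nor PlanetScale's `schemadiff`-based detector, and the names `Snapshot` and `three_way_merge` are hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # hypothetical: name -> definition


def three_way_merge(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Merge two snapshots against their common base; return (merged, conflicts)."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:      # both sides agree (including both deleting the key)
            result = o
        elif o == b:    # only theirs changed it
            result = t
        elif t == b:    # only ours changed it
            result = o
        else:           # both changed it, in different ways
            conflicts.append(key)
            continue
        if result is not None:
            merged[key] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"users": "id, name"}
    ours = {"users": "id, name, email"}
    theirs = {"users": "id, name", "orders": "id, total"}
    print(three_way_merge(base, ours, theirs))
```

Without the base, a key that differs between the two sides is ambiguous: the merge cannot tell which side changed it, which is why a merge base is required in both the Git and the schema setting.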
Related
- concepts/git-pack-file — Git's compressed object format.
- concepts/git-delta-compression — the 16-char heuristic failure mode.
- concepts/fork-upstream-sync: OSS ↔ private-fork sync problem riding on `git cherry-pick`.
- concepts/conflict-resolution-memoization: `git-replay`-style reuse of past conflict resolutions.
- patterns/server-side-git-repack: tuned repack as the platform-compatible fix shape.
- concepts/commit-signing — pluggable signer interface.
- concepts/wasm-git-server — Cloudflare Artifacts' ~100 KB pure-Zig Wasm reimplementation of the Git smart-HTTP server.
- concepts/async-clone-hydration — ArtifactFS' blobless-clone + background-hydration treatment of Git's partial-clone machinery.
- concepts/repo-per-agent-session — Git-repo-as-agent-state substrate (Cloudflare internal-dogfood pattern on Artifacts).
- patterns/git-protocol-as-api — Git as the wire protocol of an agent-first storage primitive.
- patterns/blobless-clone-lazy-hydrate — startup-latency pattern for large-repo agent sandboxes.
- patterns/do-backed-git-server — per-repo actor + Wasm engine substrate.
- patterns/automated-upstream-cherry-pick-bot, patterns/draft-pr-for-conflicts — cherry-pick automation patterns built on top of Git.
- systems/cloudflare-artifacts — canonical instance of Git-protocol-as-API for an agent-first storage primitive.
- systems/artifact-fs — client-side FS driver for blobless-clone + lazy hydration.
- systems/github — the managed-SaaS layer on top of Git where server-side pack construction lives.
- systems/github-actions — common compute substrate for Git-driven automation.
- systems/git-replay — PlanetScale's pre-bot tool for cherry-pick replay with conflict-resolution memoisation.
- systems/vitess-cherry-pick-bot — continuous-flow successor.
- systems/sapling-scm — Meta's Mercurial-lineage VCS whose open-source client also speaks Git (but without Sapling's server-side scale primitives in that mode).
- systems/mercurial — Git's DVCS contemporary; Sapling's ancestor.