
SYSTEM Cited by 4 sources

Git

Git is the dominant distributed version control system (origin: Linus Torvalds, 2005). Data model: a content-addressed object store (blobs, trees, commits and annotated tags, each keyed by the SHA-1 or SHA-256 hash of its contents) rooted in .git/objects/, plus refs (branches, tags) that point at objects, usually commits.
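
Content addressing can be made concrete. An object's ID is the hash of a short header (`<type> <size>\0`) followed by the raw content, and a loose object is stored zlib-compressed under .git/objects/, with the first two hex digits of the ID as a directory name. The sketch below assumes a SHA-1 repository; the helper names are illustrative, not Git APIs.

```python
import hashlib
import zlib


def object_header(obj_type: str, content: bytes) -> bytes:
    # "<type> <decimal size>\0", e.g. b"blob 5\x00" for a 5-byte blob.
    return f"{obj_type} {len(content)}".encode("ascii") + b"\0"


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID Git would assign to `content` (SHA-1 repositories only)."""
    return hashlib.sha1(object_header(obj_type, content) + content).hexdigest()


def loose_object_path(oid: str) -> str:
    # The first two hex digits name a fan-out directory; the rest names the file.
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def loose_object_bytes(obj_type: str, content: bytes) -> bytes:
    # A loose object file holds the header plus content, zlib-compressed.
    return zlib.compress(object_header(obj_type, content) + content)


oid = object_id("blob", b"")
print(oid)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty blob)
print(loose_object_path(oid))
```

For the same bytes, `git hash-object` should report the same ID. Because the ID depends only on type and content, identical files anywhere in history share a single stored object.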

Runtime concerns that matter at scale for the sources this wiki covers:

  • Storage layout. New objects are first written loose (one zlib-compressed file per object) and later repacked into pack files (.pack + .idx) for a compact on-disk representation; delta compression happens only inside pack files.
  • Transfer. git clone / git fetch / git push move pack files, not loose objects. A server typically builds the transfer pack dynamically for each request from what the client already has, which is why local repack improvements don't survive a round trip through the server (see systems/github).
  • Repack. git repack -adf (-a packs everything into a single pack, -d deletes the packs made redundant, -f recomputes deltas instead of reusing existing ones) plus --window=N (how many nearby objects are tried as delta bases) and --depth=N (maximum delta chain length) are the knobs that make aggressive compression possible. The defaults are tuned for average repos, not for pathological structural mismatches.
  • Pluggable signing. gpg.format=openpgp|ssh|x509, together with a matching program setting (e.g. gpg.x509.program=<signer>), lets organisations plug OpenPGP (GPG), SSH, or X.509 (S/MIME) signers into the commit-signing path; see concepts/commit-signing and the Figma device-trust system.
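
Every .pack file, whether on disk or on the wire, opens with a fixed 12-byte header: the signature `PACK`, a 4-byte version (2 or 3) and a 4-byte object count, both big-endian. Object entries and a trailing checksum follow. The sketch below reads only that header and does not decode objects or deltas; `read_pack_header` is a hypothetical helper.

```python
import struct
from typing import NamedTuple


class PackHeader(NamedTuple):
    version: int
    object_count: int


def read_pack_header(data: bytes) -> PackHeader:
    """Parse the 12-byte header at the start of a Git .pack file."""
    if len(data) < 12:
        raise ValueError("truncated pack header")
    # ">4sII": 4-byte signature, then two unsigned 32-bit big-endian integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError(f"bad signature {signature!r}")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return PackHeader(version, count)
```

Pointed at the first 12 bytes of a file under .git/objects/pack/, it should report the pack version and how many objects that pack holds. The matching .idx file is a separate lookup table from object ID to offset inside the .pack.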

Defaults that matter

  • Delta pairing heuristic: Git orders delta candidates by a path hash that depends only on the last 16 characters of the file path, so files whose paths end alike are tried against each other. This works for typical codebases, where similar names usually mean similar content, but becomes pathological when the distinguishing portion of the path falls outside those 16 characters (Dropbox i18n is the canonical instance; see concepts/git-delta-compression and sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity).
  • Repack window / depth defaults (--window=10 --depth=50, i.e. the pack.window and pack.depth settings used by git repack and git gc) are conservative; tuned values such as --window=250 --depth=250 trade CPU time for compression ratio.
  • --path-walk flag: an experimental option that walks the full directory tree to select delta candidates instead of relying on the 16-character heuristic. It is effective on structurally mismatched repos but incompatible with GitHub's server-side bitmap / delta-island optimizations.
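
The 16-character limit falls out of how the path hash is computed. Git's `pack_name_hash()` routine folds each non-whitespace byte of the path into a 32-bit value by shifting the running hash right by 2 and adding the new byte shifted left by 24, so any byte is shifted out entirely after 16 further bytes. The port below is a sketch: the exact set of skipped whitespace bytes is an assumption, and the i18n paths are hypothetical examples.

```python
def pack_name_hash(path: str) -> int:
    """Sketch of Git's path hash used to order delta candidates."""
    h = 0
    for byte in path.encode("utf-8"):
        if byte in b" \t\n\r":  # whitespace is skipped (assumed set of bytes)
            continue
        # Shift the running hash right by 2 and add the byte into the top 8 bits,
        # wrapping at 32 bits like the C uint32_t arithmetic.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


def group_by_name_hash(paths):
    """Bucket paths by hash; paths in one bucket end up next to each other."""
    buckets = {}
    for p in paths:
        buckets.setdefault(pack_name_hash(p), []).append(p)
    return buckets


# Hypothetical i18n layout: the language code sits outside the last 16 chars.
paths = [
    "i18n/de/LC_MESSAGES/messages.po",
    "i18n/fr/LC_MESSAGES/messages.po",
    "i18n/ja/LC_MESSAGES/messages.po",
    "src/app/main.py",
]
for h, group in group_by_name_hash(paths).items():
    print(f"{h:08x}", group)
```

All three .po paths share the suffix "AGES/messages.po", so they hash identically and become delta candidates for one another even though their content is in different languages. Shorter paths such as "de.po" and "fr.po" still hash differently.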

Stub page

This page exists because Git internals became load-bearing in a wiki source and we need a canonical anchor. Most Git behaviour is out of scope here; see concepts/git-pack-file and concepts/git-delta-compression for the aspects the wiki covers.

Seen in

  • sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity — Dropbox traces 87 GB of monorepo growth to Git's 16-char path heuristic pairing .po files across languages, and fixes it with a tuned server-side repack that shrinks the repo from 84 GB to 20 GB.
  • sources/2025-09-15-github-post-quantum-security-for-ssh-access-on-github — Git-over-SSH is the transport getting PQ-KEX (sntrup761x25519-sha512) on github.com's SSH endpoints as of 2025-09-17. Git-over-HTTPS unaffected; independent crypto-agility timelines per transport.
  • sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — Git's smart-HTTP protocol + object graph reused as the wire protocol of a generic versioned-storage primitive (Cloudflare Artifacts). Canonical wiki instance of patterns/git-protocol-as-api: Git chosen specifically because "it's deep in the training data of most models" and sidesteps the bootstrap problem of a new protocol. Server is a ~100 KB pure-Zig Wasm Git server running inside Durable Objects — first wiki instance of a complete Git server reimplemented in Wasm (SHA-1, zlib, delta encode/decode, pack parsing, smart-HTTP v1+v2 from scratch). Artifacts also makes .import() + .fork() first-class alongside clone / fetch / push and natively supports git-notes for agent-written metadata. Sibling ArtifactFS exploits Git's partial-clone (--filter=blob:none) machinery for blobless-clone + async-hydration on any Git remote.
  • sources/2024-09-10-meta-sapling-source-control-thats-user-friendly-and-scalable — Meta positions Sapling not as a Git fork but as a Mercurial-lineage scalable VCS whose open-source client also speaks Git. Canonical wiki framing of the scale threshold where Git stops being viable: Meta's internal monorepo is "tens of millions of files, tens of millions of commits, and tens of millions of branches", a regime Meta explicitly describes as one "public source control systems were not, and still are not, capable of handling." Sapling's client-side segment-graph bisection accelerates log/blame "even in Git repositories"; the deeper scale primitives (server-side Segmented Changelog per-file history graphs, VFS, Commit Cloud) require Sapling's own server and do not apply when Sapling is pointed at a Git host.

  • Git cherry-pick + conflict-resolution as fork-sync infrastructure. Manan Gupta documents PlanetScale's evolution of Git-primitive-driven automation for keeping their private Vitess fork aligned with OSS upstream: (1) weekly whole-diff git cherry-pick on GitHub Actions cron; (2) git-replay, a custom tool that memoises conflict resolutions and replays them on subsequent passes across multiple release branches (concepts/conflict-resolution-memoization); (3) the Vitess cherry-pick bot that drives continuous PR-level cherry-pick between branch-pair mirrors (OSS main ↔ private upstream, OSS release-x.0 ↔ private latest-x.0) with draft-PR escalation on conflict. First wiki disclosure of Git's cherry-pick primitive under the concepts/fork-upstream-sync framing and of conflict-resolution-memoisation as an axis distinct from git rerere.

  • Git three-way merge as a portable mental model for schema change coordination. Shlomi Noach explicitly borrows Git's three-way-merge terminology (merge base, two branches) for PlanetScale's schema conflict detector, but the implementation is entirely different: Git merges text lines, while the schema version operates on semantic SQL diffs (ALTER TABLE, CREATE TABLE) computed by schemadiff and tests for commutativity under function composition. Two structural analogies are preserved: (1) a merge base (main at fork time) is required; (2) end-of-file conflicts in Git correspond to column-order conflicts in schema (both branches "append different rows to the end"). First wiki instance of Git terminology borrowed for a non-source-code domain.
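
The smart-HTTP protocol that Cloudflare Artifacts speaks frames every message as a pkt-line: four hex digits giving the total length (including those four digits), then the payload. `0000` is a flush packet, and protocol v2 adds `0001` (delimiter) and `0002` (response end). The sketch below covers only that framing, not negotiation, capabilities or sideband multiplexing; the function names are illustrative.

```python
MAX_PKT_PAYLOAD = 65516  # 65520-byte pkt-len limit minus the 4-byte length prefix

FLUSH_PKT = b"0000"
DELIM_PKT = b"0001"         # protocol v2 only
RESPONSE_END_PKT = b"0002"  # protocol v2 only


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame `payload` as a pkt-line: 4 hex digits of total length, then the data."""
    if len(payload) > MAX_PKT_PAYLOAD:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def decode_pkt_lines(stream: bytes) -> list[tuple[str, bytes]]:
    """Split a stream into (kind, payload) pairs; kind is data, flush, delim or end."""
    special = {0: "flush", 1: "delim", 2: "end"}
    out, pos = [], 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError(f"truncated pkt-len at offset {pos}")
        length = int(stream[pos:pos + 4], 16)
        if length in special:
            out.append((special[length], b""))
            pos += 4
            continue
        if length < 4 or pos + length > len(stream):
            raise ValueError(f"bad pkt-len {length} at offset {pos}")
        out.append(("data", stream[pos + 4:pos + length]))
        pos += length
    return out
```

Because each packet announces its own length, a server (such as a Wasm implementation inside a Durable Object) can stream requests and responses without a separate framing layer.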
