CONCEPT Cited by 2 sources
Git pack file¶
A Git pack file is Git's on-disk compressed
bundle of objects. Rather than keep every blob / tree / commit as a
loose file under .git/objects/, Git periodically packs many
objects together into .pack (data) + .idx (index) pairs under
.git/objects/pack/. Pack files are where Git's storage efficiency
actually lives.
What a pack file contains¶
- Object payloads, each stored either as a full compressed
object or as a delta against a nearby base object (the base
may itself be a delta — up to
--depthchained deltas deep). - Index (
.idx) mapping object SHA → byte offset in the pack, for fast lookup without scanning. - Optional bitmaps — precomputed reachability info GitHub / some servers use to accelerate transfer-pack generation; see systems/github.
- Optional delta islands — server-side constraints on which objects may delta against which, preserving fast pack construction for specific ref patterns.
How objects become deltas¶
Git's git repack / git gc / git push --receive-pack path
considers candidate objects and decides whether each can be
represented as a small delta against an already-stored object.
Two knobs dominate compression ratio:
--window=N— how many nearby candidate objects (in Git's sort order) to consider as possible delta bases for a given object. Larger window → better compression, more CPU.--depth=N— maximum delta chain length. Larger depth → better compression, but slower object materialization (each chain step is a patch apply).
Git's default pairing step sorts candidates by a heuristic based on the last 16 characters of the file path, then feeds that window to delta-compression. In most codebases this works well because similar names usually mean similar content. When the distinguishing part of the path falls outside the 16-char window, Git pairs unrelated files and produces pathologically large deltas — see concepts/git-delta-compression and the Dropbox i18n canonical instance.
Why local repack doesn't push¶
Pack files themselves can be replicated via push, but a server that runs Git-on-top-of-SaaS (e.g. GitHub) rebuilds the transfer pack dynamically per client request based on what the client is missing. Any local repack optimisation is re-derived on the server from its own packing configuration. For a permanent pack-size improvement on such a platform, the repack has to run on the server using supported flags and parameters — see patterns/server-side-git-repack.
Seen in¶
- sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity
— Dropbox repacks its 87 GB monorepo with tuned
--window=250 --depth=250on GitHub's servers; size drops to 20 GB. - sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — Artifacts stores pack-file snapshots in R2 for durability while keeping hot Git objects in per-repo [[systems/ cloudflare-durable-objects|DO]] SQLite (chunked across 2 MB rows). Unusual storage-for-CPU/memory trade: raw deltas + base hashes are persisted alongside the resolved object — "On fetch, if the requesting client already has the base object, Zig emits the delta instead of the full object, which saves bandwidth and memory." Contrasts with vanilla Git which stores only the delta form in pack files and reconstructs the resolved object on demand — Cloudflare trades storage footprint for lower peak-memory + bandwidth within the ~128 MB DO envelope. Pack parsing is one of the primitives reimplemented from scratch in the ~100 KB pure-Zig Wasm Git server.
Related¶
- systems/git — the VCS pack files belong to.
- concepts/git-delta-compression — how delta candidates are chosen and where it goes wrong.
- patterns/server-side-git-repack — the pattern that makes aggressive repack production-safe on a managed platform.
- concepts/monorepo — typical consumer where pack-file compression ratio matters most.
- systems/cloudflare-artifacts — alternative pack-file consumer (durable snapshots in R2 + delta-alongside-resolved in DO SQLite).
- concepts/wasm-git-server — reimplements pack parsing from scratch in Zig→Wasm.
- patterns/do-backed-git-server — the substrate pattern where this alternative pack-storage strategy applies.