

Git pack file

A Git pack file is Git's on-disk compressed bundle of objects. Rather than keep every blob / tree / commit as a loose file under .git/objects/, Git periodically packs many objects together into .pack (data) + .idx (index) pairs under .git/objects/pack/. Pack files are where Git's storage efficiency actually lives.

What a pack file contains

  • Object payloads, each stored either as a full compressed object or as a delta against a nearby base object (the base may itself be a delta — up to --depth chained deltas deep).
  • Index (.idx) mapping object SHA → byte offset in the pack, for fast lookup without scanning.
  • Optional bitmaps — precomputed reachability info GitHub / some servers use to accelerate transfer-pack generation; see systems/github.
  • Optional delta islands — server-side constraints on which objects may delta against which, preserving fast pack construction for specific ref patterns.
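The payload-plus-index layout above can be inspected directly. A minimal sketch (throwaway repo, illustrative identity): `git verify-pack -v` prints one line per object with its SHA, type, size, packed size, and byte offset, and for deltified objects the chain depth and base SHA.

```shell
# Build a tiny repo with two versions of one file, pack it, and list
# the pack's contents.
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
seq 1 200 > data.txt && git add data.txt && git commit -qm one
seq 1 210 > data.txt && git commit -qam two
git repack -adq            # consolidate all loose objects into one pack
# One line per object; deltified entries end with "<depth> <base-sha>".
git verify-pack -v .git/objects/pack/pack-*.idx
```

With only a handful of objects the second blob will usually be stored as a short delta against the first; the trailing summary line reports how many objects sit at each chain depth.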

How objects become deltas

Git's delta-compression step, run by git repack, git gc, and git pack-objects (which git push and the serving side of fetch invoke under the hood), considers each candidate object and decides whether it can be represented as a small delta against an already-stored object.

Two knobs dominate compression ratio:

  • --window=N — how many nearby candidate objects (in Git's sort order) to consider as possible delta bases for a given object. Larger window → better compression, more CPU.
  • --depth=N — maximum delta chain length. Larger depth → better compression, but slower object materialization (each chain step is a patch apply).
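Both knobs can be passed per invocation or pinned in config. A sketch with illustrative values (not a recommendation):

```shell
# -a repacks everything, -d drops redundant packs, -f recomputes
# existing deltas instead of reusing them; window/depth values here
# are only for illustration.
cd "$(mktemp -d)"
git init -q tune && cd tune
git config user.email you@example.com
git config user.name you
seq 1 500 > f.txt && git add f.txt && git commit -qm base
seq 1 510 > f.txt && git commit -qam next
git repack -a -d -f -q --window=250 --depth=50
# The same knobs persist via config and apply to future repack/gc runs:
git config pack.window 250
git config pack.depth 50
```

Note that -f matters when retuning: without it, repack reuses deltas from the existing pack, so a larger window has nothing new to consider.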

Git's default pairing step sorts candidates by a name-hash heuristic built from the last 16 characters of the file path, then slides the --window over that sorted list to pick delta bases. In most codebases this works well because similar names usually mean similar content. When the distinguishing part of the path falls outside that 16-character suffix, Git pairs unrelated files and produces pathologically large deltas; see concepts/git-delta-compression and the Dropbox i18n canonical instance.
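A loose sketch of that sort key (Git's real name hash in pack-objects also skips whitespace and weights trailing characters more heavily; assume a plain last-16-character key here):

```shell
# Sort paths by their final 16 characters. The de/ vs fr/ language
# code in the i18n paths falls outside that suffix, so the two files
# land adjacent in the delta window even though their translated
# contents are unrelated.
printf '%s\n' \
  'i18n/de/strings/checkout.json' \
  'i18n/fr/strings/checkout.json' \
  'src/app/checkout.json' |
awk '{ n = length($0); print substr($0, n > 16 ? n - 15 : 1) "\t" $0 }' |
sort | cut -f2
```

Both i18n paths reduce to the key "gs/checkout.json", which is exactly the failure mode: identical suffixes, dissimilar content, large deltas.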

Why local repack doesn't push

A push does transfer objects in pack format, but a hosted Git service (e.g. GitHub) rebuilds the transfer pack dynamically per client request, based on what that client is missing. Whatever a local repack optimised is discarded; the server re-derives packing from its own configuration. For a permanent pack-size improvement on such a platform, the repack has to run on the server using supported flags and parameters; see patterns/server-side-git-repack.
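This can be demonstrated locally with a bare repo standing in for the hosted remote (a sketch; identity and sizes are illustrative):

```shell
cd "$(mktemp -d)"
git init -q --bare server.git          # stand-in for a hosted remote
git clone -q server.git client && cd client
git config user.email you@example.com
git config user.name you
seq 1 300 > f.txt && git add f.txt && git commit -qm one
git push -q origin HEAD:refs/heads/main
# Aggressively repack *this* clone...
git repack -a -d -f -q --window=250 --depth=50
# ...but a fresh clone still receives a pack the server generates on
# the fly from its own objects and pack.* config (--no-local forces
# the real transport path instead of a filesystem copy).
cd .. && git clone -q --no-local server.git fresh
ls fresh/.git/objects/pack/
```

The pack under fresh/.git/objects/pack/ was built by the server's pack-objects, so the client's tuned window/depth had no influence on it.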

Seen in

  • sources/2026-03-25-dropbox-reducing-monorepo-size-developer-velocity — Dropbox repacks its 87 GB monorepo with tuned --window=250 --depth=250 on GitHub's servers; size drops to 20 GB.
  • sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — Artifacts stores pack-file snapshots in R2 for durability while keeping hot Git objects in per-repo [[systems/cloudflare-durable-objects|DO]] SQLite (chunked across 2 MB rows). Unusual storage-for-CPU/memory trade: raw deltas and base hashes are persisted alongside the resolved object — "On fetch, if the requesting client already has the base object, Zig emits the delta instead of the full object, which saves bandwidth and memory." This contrasts with vanilla Git, which stores only the delta form in pack files and reconstructs the resolved object on demand; Cloudflare trades storage footprint for lower peak memory and bandwidth within the ~128 MB DO envelope. Pack parsing is one of the primitives reimplemented from scratch in the ~100 KB pure-Zig Wasm Git server.