
PATTERN

Server-side Git repack

Server-side Git repack is the pattern of durably fixing a Git repo's pack-file size on a managed Git host (e.g. GitHub) by running an aggressive, host-supported git repack on the host's servers rather than on a local mirror.

Problem

A Git repo's on-disk and on-wire size is determined by how its pack files are constructed. On a managed SaaS, the host rebuilds the transfer pack dynamically per request (see systems/github) from its own packing configuration, so pack improvements made with a local git repack do not persist: objects pushed back up are immediately re-packed under the server's own rules.

Consequence: if the default pack heuristic has gone pathological on your repo (canonical case: Git's 16-char path-pairing heuristic mismatching your directory layout), you cannot fix it from the client side. The repack has to run where the authoritative pack files live.

Solution shape

  1. Reproduce locally to confirm the root cause. git clone --mirror the repo, run aggressive repack variants (git repack -adf --depth=250 --window=250 with or without experimental flags like --path-walk) to show that a repack can produce the target size. Dropbox's local --path-walk experiment hit 84 GB → 20 GB in ~9h, proving the hypothesis.
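The local reproduction step can be sketched end-to-end with a throwaway repository standing in for the real mirror (the repo contents, file counts, and sizes below are illustrative assumptions; on the real repo you would git clone --mirror first, and --path-walk additionally requires a Git build that ships that experimental flag):

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-in for `git clone --mirror <repo-url>`: a repo with highly
# redundant history, so the aggressive repack has something to compress.
git init -q demo && cd demo
git config user.email probe@example.com
git config user.name probe
i=1
while [ "$i" -le 20 ]; do
  seq 1 $((1980 + i)) > "file_$i.txt"   # near-identical blobs, good delta targets
  git add . && git commit -qm "add file_$i"
  i=$((i + 1))
done

before_kb=$(du -sk .git/objects | cut -f1)
# The aggressive variant from the text; append --path-walk on a Git build
# that supports it to test the experimental heuristic as well.
git repack -q -adf --depth=250 --window=250
after_kb=$(du -sk .git/objects | cut -f1)
echo "objects dir: ${before_kb} KB -> ${after_kb} KB"
```

The before/after sizes of .git/objects are the evidence the pattern asks for: if an aggressive local repack reaches the target size, the root cause is pack construction, not repo content.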

  2. Work with the platform provider to pick a server-compatible repack shape. Aggressive flags that bypass the default pairing heuristic (e.g. Git's --path-walk) may be incompatible with server-side optimizations the platform uses to keep clone/fetch fast — GitHub specifically relies on bitmaps and delta islands that don't compose with --path-walk. The platform-supported alternative is usually to keep the default pairing heuristic but raise --window and --depth (Dropbox + GitHub landed on --window=250 --depth=250).
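When iterating on a local mirror, the server-compatible shape can also be pinned through Git's standard pack.window / pack.depth settings instead of per-run flags. A sketch; these values are a local-experiment convenience only, since the authoritative equivalents live in the platform's server-side configuration:

```ini
# Local mirror experimentation only; the host's own config is authoritative.
[pack]
	window = 250
	depth = 250
```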

  3. Mirror-first validation (see patterns/mirror-first-repack-validation). Have the platform repack a test mirror; measure the fetch-latency distribution, push success rate, and API latency; accept modest tail-latency movement as a fair trade against the compression win.
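The client-visible half of that measurement can be sketched as a crude clone-latency probe. A local bare repo stands in for the host's test mirror, and GNU date is assumed for nanosecond timestamps; a real validation would sample fetches against the platform's mirror and compare the distributions before and after the repack:

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-in for the host-side repacked test mirror, seeded with one commit.
git init -q --bare mirror.git
git init -q seed && cd seed
git config user.email probe@example.com
git config user.name probe
seq 1 5000 > data.txt
git add . && git commit -qm seed
git push -q ../mirror.git HEAD:refs/heads/main
cd ..
git -C mirror.git symbolic-ref HEAD refs/heads/main

# Sample clone wallclock a few times to get a (tiny) latency distribution.
: > samples_ms.txt
for n in 1 2 3; do
  start=$(date +%s%N)                  # GNU date; nanosecond resolution assumed
  git clone -q --no-checkout mirror.git "probe_$n"
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 )) >> samples_ms.txt
done
echo "clone wallclock samples (ms):"
sort -n samples_ms.txt                 # last line approximates the tail
```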

  4. Gradual production rollout. The platform performs the repack one replica per day over a week, read-write replicas first, with buffer time at the end for rollback. A repack changes the physical layout behind every interaction with the repo's billions of objects, so the ops discipline matches any other massive in-place infrastructure change.

  5. Structural follow-up. Fix the root cause (directory layout, file naming) so the default heuristic works correctly going forward; stand up patterns/repo-health-monitoring so the next regression is visible early.
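A minimal sketch of the health-probe side of that follow-up, using git count-objects against a throwaway repo (the 50 GiB limit and the OK/ALERT shape are illustrative assumptions, not part of the pattern; real monitoring would trend these numbers over time):

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git config user.email probe@example.com
git config user.name probe
seq 1 1000 > data.txt
git add . && git commit -qm seed
git gc -q                                  # move objects into a pack so size-pack is meaningful

# size-pack is reported in KiB by `git count-objects -v`.
pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
limit_kib=$((50 * 1024 * 1024))            # hypothetical 50 GiB alert threshold
echo "size-pack: ${pack_kib} KiB"
if [ "$pack_kib" -gt "$limit_kib" ]; then
  echo "ALERT: pack size over threshold"
else
  echo "OK"
fi
```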

Why it matters

Generalises across any structural VCS / package-format issue where the authoritative store lives on a managed SaaS:

  • The diagnosis can be done locally against a mirror.
  • The fix has to run where the authoritative artefact is constructed.
  • The rollout has to treat the artefact as production infrastructure (billions of clients/reads in flight).

Canonical instance: Dropbox's 87 GB → 20 GB GHEC repack. Dropbox confirmed the root cause with a local --path-walk experiment, couldn't ship that flag, had GitHub Support run the tuned --window=250 --depth=250 repack on GitHub's servers, rolled it out replica-by-replica over a week, and landed a 77% size reduction plus clone times dropping from over an hour to under 15 minutes, with no fetch/push/API-latency regressions.

Operational details (2026-03-25 Dropbox instance)

  • Production repack on GitHub's servers: tuned --window=250 --depth=250 (default heuristic retained for server-side compatibility).
  • Local experimental measurement for root-cause confirmation: git repack -adf --depth=250 --window=250 on a git clone --mirror copy of the repo.
  • Rollout cadence: one replica per day, ~1 week total.
  • Test-mirror repack wallclock (local Dropbox measurement): ~9 hours.

Caveats

  • Aggressive --window / --depth trade repack time for compression ratio — plan for multi-hour server-side work, not a minute-scale operation.
  • Server-side bitmaps / delta islands (GitHub) foreclose some repack flags; your fix space is bounded by what the platform supports.
  • A repack does not alter the content of your code — just its physical pack layout — so semantic correctness is not at risk, but every client's clone/fetch/push path touches the new layout.
  • Structural follow-up (directory layout reshape) is what prevents the issue from recurring; without it, the next round of translation updates (or whatever the domain-specific analogue is) starts re-inflating the packs.
