PATTERN

Mirror-first repack validation

Mirror-first repack validation is the pre-production step for a structural Git-repository rewrite (such as a server-side repack) on a managed platform: run the exact target operation on a mirror of the repository first, measure production-relevant metrics, accept or reject the tradeoff, and only then schedule the live rollout.

Why

A Git repack changes how billions of objects are physically organised on disk: the code content is unchanged, but every clone, fetch, and push interacts with the new layout. On a managed SaaS like GitHub, the platform also relies on server-side structures (reachability bitmaps, delta islands) that the repack flags interact with. Running the repack directly in production without validation risks:

  • Fetch/clone performance regressions at the tail (new layout might happen to penalise a specific access pattern).
  • Push regressions (receive-pack rebuilding deltas differently against the new pack).
  • API-latency regressions on the platform's code APIs.
  • Edge cases where specific repos / refs interact badly with the chosen --window / --depth values.

Mirror-first validation turns all of these into data before any engineer or CI job is affected.
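
The server-side structures named above map onto stock-git knobs. A hedged sketch of that interaction follows; a managed platform wires these internally, and the throwaway repo here exists only so the sketch runs as-is:

```shell
# How bitmaps and delta islands appear in stock git; a managed platform
# configures them server-side. Uses a throwaway repo so this runs anywhere.
set -eu
REPO=$(mktemp -d)
git init -q "$REPO"
git -C "$REPO" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "seed commit"

# Reachability bitmaps: precomputed object-graph bitmaps that speed up
# object counting during clone/fetch.
git -C "$REPO" config repack.writeBitmaps true

# Delta islands: confine delta chains to ref groups so serving one group
# never requires deltas based in another.
git -C "$REPO" config repack.useDeltaIslands true
git -C "$REPO" config pack.island 'refs/heads/'

# A full repack rebuilds both structures against the new object layout,
# which is why repack flags and these structures interact at all.
git -C "$REPO" repack -q -a -d -f --window=250 --depth=250
ls "$REPO"/.git/objects/pack
```

Both knobs are rebuilt from scratch by `repack -a -d -f`, so any change to `--window`/`--depth` changes what the serving structures are computed against.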

Solution shape

  1. Create a mirror. Clone the target repo with git clone --mirror into a parallel location on the platform (not in production serving).
  2. Run the target repack on the mirror with the exact flags and parameters planned for production (Dropbox + GitHub: --window=250 --depth=250).
  3. Measure production-shaped metrics against the repacked mirror:
     • Fetch duration distribution (p50/p90/p99, tail movement).
     • Push success rate.
     • Platform API latency.
     • Clone time from a representative client.
  4. Decide on the tradeoff. The compression ratio is known up front (Dropbox's mirror: 78 GB → 18 GB, ~4× reduction); accept "minor movement at the tail of fetch latency" if that tradeoff buys a 4× size win, reject otherwise.
  5. Schedule the production rollout gradually (platform-side, one replica per day is GitHub's standard cadence; see patterns/server-side-git-repack).

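The mirror and repack steps can be sketched with stock git. To stay runnable, the sketch fabricates a tiny throwaway source repo; in practice SRC would be the production monorepo and MIRROR_DIR a path on a spare host:

```shell
# Sketch of steps 1-2 with stock git. SRC and MIRROR_DIR are placeholders,
# not values from the case study.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"                  # stand-in for the production repo
MIRROR_DIR="$WORK/mirror.git"    # parallel location, not serving traffic

git init -q "$SRC"
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "seed commit"

# Step 1: full mirror clone (every ref, an exact copy of the ref namespace).
git clone -q --mirror "$SRC" "$MIRROR_DIR"

# Step 2: the exact production-candidate repack (flags from the case study).
# -a: repack all objects; -d: drop redundant packs; -f: recompute all deltas.
git -C "$MIRROR_DIR" repack -q -a -d -f --window=250 --depth=250

# On-disk size feeds the step-4 tradeoff decision.
du -sh "$MIRROR_DIR"/objects/pack
```

Running the repack on the mirror rather than a shallow copy matters: delta recomputation with -f is sensitive to the full object population, so a partial clone would not predict production pack geometry.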
What it looks like in practice (Dropbox 2026-03-25)

  • Mirror: full git clone --mirror of the server monorepo.
  • Repack run by GitHub on the mirror with the chosen --window=250 --depth=250 configuration.
  • Result: 78 GB → 18 GB; minor movement at the fetch-latency tail (explicitly deemed acceptable for the 4× size reduction); push success and API latency held.
  • Production rollout followed: one replica per day, ~1 week, with read-write replicas first and rollback buffer at the end.
  • Final result: 87 GB → 20 GB, clone time >1h → <15 min, no regressions.

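The "fetch duration distribution" metric from step 3 can be gathered with a loop of timed fetches plus a nearest-rank percentile helper. A minimal sketch, which builds its own throwaway mirror and client so it runs as-is (in practice MIRROR points at the repacked mirror, RUNS is much larger, and GNU date is assumed for %N):

```shell
# Time repeated fetches against a mirror and report p50/p90/p99.
set -eu
WORK=$(mktemp -d)
MIRROR="$WORK/mirror.git"
CLIENT="$WORK/client"
RUNS=5
SAMPLES="$WORK/fetch_times.txt"

git init -q "$WORK/src"
git -C "$WORK/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "seed commit"
git clone -q --mirror "$WORK/src" "$MIRROR"
git clone -q "$MIRROR" "$CLIENT"

for _ in $(seq "$RUNS"); do
  start=$(date +%s.%N)                 # GNU date: seconds.nanoseconds
  git -C "$CLIENT" fetch -q origin
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.4f\n", e - s }' >> "$SAMPLES"
done

# Nearest-rank percentile over a numeric sample file: pctl <p> <file>
pctl() {
  sort -n "$2" | awk -v p="$1" '
    { a[NR] = $1 }
    END { i = int(NR * p / 100); if (i < NR * p / 100) i++; print a[i] }'
}

echo "p50=$(pctl 50 "$SAMPLES") p90=$(pctl 90 "$SAMPLES") p99=$(pctl 99 "$SAMPLES")"
```

Comparing these percentiles before and after the repack is what surfaces the "minor movement at the tail" that the decision step then judges.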
Caveats

  • Cost: running the repack twice (mirror + production) doubles compute time; the mirror-side cost is cheap insurance, not free.
  • Mirror ≠ production traffic. Fetch-latency measurements against a mirror capture the geometry of the new pack files but not the full live traffic mix (concurrent pushes / receive-pack contention / etc.). That's why the subsequent production rollout still needs replica-by-replica ramping.
  • Tradeoff decisions can't be automated — Dropbox / GitHub explicitly judged the tail movement as acceptable. There is no universal "good enough" threshold; it's a call against the compression win and the alternative (hit a 100 GB repo limit with no fix).

Seen in

  • Dropbox, 2026-03-25 (GitHub-run repack of the server monorepo; detailed above).