SYSTEM Cited by 1 source

Dropbox server monorepo

Dropbox server monorepo is the single Git repository that holds the majority of Dropbox's backend services and libraries. In Dropbox's own framing: "almost every product change flows through a single place: our server monorepo … the repository sits at the center of nearly everything we build."

The canonical day-in-the-life for this repo: pull latest, build, test, review, merge, ship. CI jobs repeatedly clone fresh. Both flows — engineer setup and CI — incur the repo's clone cost.

Scale (2026-03-25 post)

  • Pre-fix size: ~87 GB.
  • Pre-fix initial clone time: >1 hour.
  • Pre-fix growth rate: 20–60 MB/day typical, spikes >150 MB/day.
  • GHEC hard repo-size limit: 100 GB (the post frames Dropbox as months away from hitting it).
  • Post-fix size: ~20 GB (77% reduction).
  • Post-fix initial clone time: <15 min in many cases.
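
The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not anything from the post: it assumes decimal units (1 GB = 1000 MB) and constant linear growth, and the function names are hypothetical.

```python
# Figures quoted in the Scale list above.
LIMIT_GB = 100      # GHEC hard repo-size limit
PRE_FIX_GB = 87     # size before the repack
POST_FIX_GB = 20    # size after the repack


def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Percentage size reduction from before_gb to after_gb."""
    return 100 * (before_gb - after_gb) / before_gb


def days_to_limit(size_gb: float, growth_mb_per_day: float,
                  limit_gb: float = LIMIT_GB) -> float:
    """Days until the limit is reached, assuming constant growth.

    Assumption: 1 GB = 1000 MB.
    """
    return (limit_gb - size_gb) * 1000 / growth_mb_per_day


print(f"reduction: {reduction_pct(PRE_FIX_GB, POST_FIX_GB):.0f}%")
for rate in (20, 60, 150):  # MB/day: typical low, typical high, spike
    print(f"{rate} MB/day -> {days_to_limit(PRE_FIX_GB, rate):.0f} days of headroom")
```

Under these assumptions the 13 GB of pre-fix headroom lasts roughly 87 days at a sustained 150 MB/day and roughly 217 days at 60 MB/day, which is consistent with the post's "months" framing.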

Language / content mix

Not fully disclosed. What is named in the 2026-03-25 post:

  • Backend services + libraries are the bulk — the monorepo architecture is framed as easing cross-service change.
  • AI features (ranking systems, retrieval pipelines, evaluation logic, UI surfaces): small changes that cross multiple backend services; a prime workload for the Dash engineering loop.
  • Internationalization (i18n) files under i18n/metaserver/[language]/LC_MESSAGES/[filename].po: not volumetrically large, but the pathological case for Git's delta-compression heuristic. The language code falls before the 16-char trailing-path window, so Git pairs .po files across languages instead of within one language, and routine translation updates produce oversized pack contributions.
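
To see why the language code drops out of view, the sketch below groups paths by their last 16 characters. This is a deliberate simplification: Git derives a numeric hash from the trailing characters of the path rather than comparing literal suffixes, and the file names and language codes used here (example.po, another.po, de, fr) are hypothetical stand-ins for the [language] and [filename] placeholders.

```python
from collections import defaultdict

WINDOW = 16  # trailing-path window size named in the post


def trailing_window(path: str, width: int = WINDOW) -> str:
    """Return the last `width` characters of a path."""
    return path[-width:]


def group_by_window(paths):
    """Group paths that look identical through the trailing window."""
    groups = defaultdict(list)
    for p in paths:
        groups[trailing_window(p)].append(p)
    return dict(groups)


# Hypothetical paths following the layout described above.
paths = [
    "i18n/metaserver/de/LC_MESSAGES/example.po",
    "i18n/metaserver/fr/LC_MESSAGES/example.po",
    "i18n/metaserver/de/LC_MESSAGES/another.po",
]

for window, members in group_by_window(paths).items():
    print(repr(window), members)
```

The German and French copies of example.po share the window "SAGES/example.po", so a suffix-based heuristic treats them as natural delta partners, while the two German files land in different groups.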

2026 repo-size incident / fix

The root cause was identified as the 16-char trailing-path delta heuristic interacting badly with the i18n layout (concepts/git-delta-compression), not the volume of committed content.

Fix path:

  1. Local reproduction: running git repack -adf --depth=250 --window=250 --path-walk on a git clone --mirror copy shrank it from 84 GB to 20 GB in ~9h, confirming the hypothesis.
  2. GitHub Support reports --path-walk incompatible with server-side bitmaps / delta islands; local fixes also don't survive GitHub's dynamic server-side transfer-pack construction (systems/github).
  3. GitHub recommends server-side repack with tuned --window=250 --depth=250 (no --path-walk) — patterns/server-side-git-repack.
  4. Mirror-first validation: 78 GB → 18 GB on a test mirror; fetch latency distribution / push success / API latency held within tradeoff tolerance (patterns/mirror-first-repack-validation).
  5. Production: gradual rollout by GitHub over a week, one replica per day, read-write replicas first, rollback buffer at the end.
  6. Result: 87 GB → 20 GB (77% reduction), clone time >1h → <15 min, no regressions.
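
The local reproduction in step 1 can be scripted along the following lines. This is a rough sketch under assumptions, not Dropbox's tooling: the helper names are hypothetical, the directory passed in is assumed to be a bare mirror created with git clone --mirror, and --path-walk is only available in recent Git releases. Step 3's server-side variant corresponds to the same flags without --path-walk.

```python
import subprocess
from pathlib import Path


def repack_command(window: int = 250, depth: int = 250,
                   path_walk: bool = False) -> list[str]:
    """Build the git repack invocation quoted in the fix path."""
    cmd = ["git", "repack", "-adf", f"--depth={depth}", f"--window={window}"]
    if path_walk:
        cmd.append("--path-walk")  # requires a recent Git release
    return cmd


def pack_bytes(git_dir: str) -> int:
    """Total size of *.pack files in a bare repository, in bytes."""
    pack_dir = Path(git_dir) / "objects" / "pack"
    return sum(p.stat().st_size for p in pack_dir.glob("*.pack"))


def repack_and_measure(git_dir: str, **kwargs) -> tuple[int, int]:
    """Run the repack inside git_dir and return (bytes_before, bytes_after).

    On a repository of this size the repack can run for hours.
    """
    before = pack_bytes(git_dir)
    subprocess.run(repack_command(**kwargs), cwd=git_dir, check=True)
    return before, pack_bytes(git_dir)
```

Calling repack_and_measure("server.git", path_walk=True) on a mirror (the directory name is a placeholder) would mirror step 1; dropping path_walk gives the shape GitHub recommended in step 3.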

Two follow-ups:

  • i18n layout restructured so delta pairing falls inside the 16-char window (prevent regression at the source).
  • Recurring repo-stats dashboard tracking overall repo size, growth rate, fresh clone time, and per-subtree storage distribution — patterns/repo-health-monitoring canonical instance.
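
One way to approximate the per-subtree storage distribution is to aggregate on-disk object sizes by leading path component. The sketch below is an assumption about how such a dashboard could be fed, not a description of Dropbox's implementation. It expects lines produced by git rev-list --objects --all piped into git cat-file --batch-check='%(objectname) %(objectsize:disk) %(rest)'; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def subtree_sizes(batch_check_lines, depth: int = 1) -> Counter:
    """Sum on-disk object sizes per subtree.

    Each input line is expected to look like "<sha> <disk-size> <path>".
    Objects without a path (commits, tags, root trees) are skipped.
    """
    totals = Counter()
    for line in batch_check_lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) < 3 or not parts[2]:
            continue
        _sha, size, path = parts
        key = "/".join(path.split("/")[:depth])
        totals[key] += int(size)
    return totals


# Hypothetical sample rows in the expected format.
sample = [
    "aaa1 100 i18n/metaserver/de/LC_MESSAGES/example.po",
    "aaa2 200 i18n/metaserver/fr/LC_MESSAGES/example.po",
    "bbb1 50 services/api/main.py",
    "ccc1 999 ",  # a commit: no path, ignored
]

for subtree, size in subtree_sizes(sample).most_common():
    print(f"{subtree}: {size} bytes")
```

Because the on-disk size of a deltified object reflects only its delta, these totals depend on how Git happened to pair objects. That makes the numbers useful for spotting subtrees whose pack contribution grows unusually fast, rather than as exact accounting.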

Why this matters

This is a canonical wiki instance of structural, not behavioural, repo growth: no rogue binaries, no leaked deps, no generated files — the interaction between directory layout and Git's internal heuristic was the entire cause. The remediation sits at the boundary between Dropbox's code and GitHub's managed platform and required collaborative debugging + a mutually-supported fix shape. Both are load-bearing lessons for any large Git monorepo on a managed SaaS.
