
CONCEPT Cited by 6 sources

Monorepo

A monorepo is a single shared version-control repository holding many services, libraries, tools, and other artefacts that would otherwise live in separate repos. The opposite layout is a polyrepo (many repositories, one per service or library).

Why pick one

  • Atomic cross-service change — one commit can edit a library and every consumer, so invariants stay enforced without version-bump-and-wait dances.
  • Shared tooling + code — one build config, one code-search surface, one dependency tree, one set of lint/format/test configurations.
  • Consolidated ownership signals — easier to see who touches what, reason about blast radius, coordinate migrations, enforce policy (see also patterns/bisect-driven-regression-hunt, patterns/centralized-forward-declarations).

What you pay

Operational discipline

Monorepos are production infrastructure, not passive storage. The 2026-03-25 Dropbox post makes this explicit: "Repositories can feel like passive storage, something that simply grows over time. At scale, they are not passive. They are critical infrastructure that directly affects developer velocity and CI reliability." Concrete operational moves:

  • Track repo size, growth rate, fresh clone time, per-subtree storage distribution continuously on an internal dashboard — patterns/repo-health-monitoring.
  • Treat structural repo rewrites (e.g. server-side repack) like any production-infra change: test on a mirror first (patterns/mirror-first-repack-validation), roll out gradually with a rollback path.
  • Align directory layout with the tooling's internal assumptions — in the Dropbox i18n case, reshape paths so similar-content files share the tail-16-char window the default Git heuristic uses.
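
The tail-window heuristic in the last bullet can be made concrete with a small sketch. The function below is a Python transcription of the path-name hash as recalled from Git's pack-objects code (each non-whitespace character shifts the running hash right by two bits, so only roughly the last 16 characters survive); treat it as an illustration rather than a reference implementation. The example paths are hypothetical and are not Dropbox's actual layout.

```python
WHITESPACE = b" \t\n\r\x0b\x0c"


def pack_name_hash(path: str) -> int:
    """Illustrative port of Git's name hash; assumes ASCII paths."""
    h = 0
    for byte in path.encode("ascii"):
        if byte in WHITESPACE:  # whitespace is skipped, as in the C routine
            continue
        # Older characters fade out two bits at a time; the newest one
        # lands in the top byte, so the path's tail dominates the value.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


def tail_window(path: str, width: int = 16) -> str:
    """The last `width` non-whitespace characters of a path."""
    return "".join(path.split())[-width:]


# Hypothetical layout A: locale in the middle of the path, identical tails.
shared_tail = ["i18n/en_US/LC_MESSAGES/app.po", "i18n/fr_FR/LC_MESSAGES/app.po"]
# Hypothetical layout B: locale inside the tail window, different tails.
distinct_tail = ["i18n/app/messages.en_US.po", "i18n/app/messages.fr_FR.po"]

for layout in (shared_tail, distinct_tail):
    for p in layout:
        print(f"{tail_window(p)!r:20} {pack_name_hash(p):#010x}  {p}")
```

Paths with the same tail window end up with hashes that differ by at most one, so they sort next to each other when Git looks for delta candidates; paths whose tails differ are spread apart. Whether that adjacency helps or hurts compression depends on whether the neighbouring files really have similar content.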

Canonical wiki instances

  • systems/dropbox-server-monorepo: 87 GB Git monorepo on GHEC, 77% size reduction via server-side repack after identifying the structural mismatch between the Git heuristic and the i18n layout.
  • Meta's internal monorepo: the extreme upper bound, with "tens of millions of files, tens of millions of commits, and tens of millions of branches", explicitly positioned in the 2022-11-15 Sapling announcement as a regime "public source control systems were not, and still are not, capable of handling." Meta chose to build a dedicated VCS (Sapling, 10-year development, Mercurial lineage) rather than fit its monorepo onto a managed Git host; this is the scale where Dropbox-style server-side Git repack stops being a viable answer. Key scaling primitives: Segmented Changelog for O(log n) history queries on megabyte-scale graph-shape indexes, lazy history download, a virtual file system for working-copy scale, and organization-owned sparse profiles for thousands of engineers working on shifting subsets.
  • Figma monorepo: context for the Figma device-trust commit signing workflow (every commit merging to a release branch must be S/MIME-signed with a device-trust cert).
  • Canva monorepo: context for Canva's CI build speedups, which lean on hermetic, content-addressed, remote-executed Bazel builds to keep a large shared repo's CI fast.

Task-graph construction as a first-order monorepo tax

Monorepos using build systems that construct a task DAG on every invocation (Turborepo, Nx, Bazel with query-time build graph load, Rush, Moonrepo, Lerna) pay a per-invocation construction cost that scales with repo size and is paid before the first task runs. This cost is invisible to the build itself (tasks still run as fast as they would in a polyrepo) but is levied on every developer invocation of turbo run, nx run, or rush build.
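
To see why this cost exists at all, here is a minimal sketch of task-graph construction, assuming a hypothetical four-package workspace and a single "build" task per package. It is not how any of the tools above implement it; it only shows that every package has to be visited and ordered before anything can run.

```python
from graphlib import TopologicalSorter

# Hypothetical workspace: package name -> internal packages it depends on.
WORKSPACE = {
    "utils": [],
    "ui": ["utils"],
    "api": ["utils"],
    "web": ["ui", "api"],
}


def build_task_graph(workspace, task="build"):
    """Map each '<package>#<task>' node to the nodes it must wait for."""
    return {
        f"{pkg}#{task}": {f"{dep}#{task}" for dep in deps}
        for pkg, deps in workspace.items()
    }


def execution_order(graph):
    """One valid order in which the tasks could run (dependencies first)."""
    return list(TopologicalSorter(graph).static_order())


graph = build_task_graph(WORKSPACE)
order = execution_order(graph)
print(order)
```

In a real repo the input is not an in-memory dict: it has to be derived from manifests, lockfiles, and file hashes, which is where the per-invocation time goes.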

Canonical disclosure: Vercel's 1,000-package internal monorepo paid 8.1 s to construct Turborepo's task graph before the first task ran in v2.8.0 (2026-04-21 Turborepo post). The performance campaign described in that post drove this to 716 ms (a 91% reduction, roughly 11× faster) via parallelisation (git index, glob walk, lockfile parsing, and package.json loading run in parallel), allocation elimination (stack-allocated git OIDs, shared HTTP client, reference-based hashing), and syscall reduction (gix-index replacing libgit2, which had itself replaced per-package git subprocess calls).
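
The parallelisation idea can be sketched for one of the named inputs, package.json loading. This is an illustration only: Turborepo is not written in Python, and the helper names and the generated demo workspace below are hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Read and parse a single package.json file."""
    return json.loads(path.read_text(encoding="utf-8"))


def load_manifests_parallel(paths, max_workers=8):
    """Load independent manifests concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_manifest, paths))


def make_demo_workspace(root: Path, count: int):
    """Create `count` tiny packages under `root` (demo data only)."""
    paths = []
    for i in range(count):
        pkg_dir = root / f"pkg-{i}"
        pkg_dir.mkdir()
        manifest = pkg_dir / "package.json"
        manifest.write_text(json.dumps({"name": f"pkg-{i}"}), encoding="utf-8")
        paths.append(manifest)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    manifests = load_manifests_parallel(make_demo_workspace(Path(tmp), 20))

names = [m["name"] for m in manifests]
print(names[:3])
```

The reads are independent of one another, which is what makes them safe to overlap; the same reasoning applies to running the git index read, glob walk, and lockfile parse side by side.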

The v2.8.0 → v2.9.0 comparison table across three repo sizes is instructive about how task-graph construction scales:

Repo size          v2.8.0    v2.9.0
~1,000 packages    8.1 s     716 ms
132 packages       1.9 s     361 ms
6 packages         676 ms    132 ms

In v2.8.0 the tax grows more slowly than package count, not faster: a 7.6× package-count increase from 132 to ~1,000 produces a 4.3× time increase, and a 167× increase from 6 to ~1,000 produces only a 12× time increase. Even a 6-package repo paid 676 ms, which points to a large fixed per-invocation cost on top of the per-package work; lockfile parsing and shared-tooling overhead are plausible contributors. v2.9.0 is also sub-linear (5.4× the time for 167× the packages) and lowers the floor to 132 ms, so the absolute tax still rises with repo size but stays under a second at ~1,000 packages.
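
The scaling behaviour can be checked directly from the table. The sketch below computes growth ratios and a log-log scaling exponent between the smallest and largest repos; it treats "~1,000 packages" as exactly 1,000, which is an assumption.

```python
import math

# Seconds to construct the task graph, taken from the table above.
TIMINGS_S = {
    "v2.8.0": {6: 0.676, 132: 1.9, 1000: 8.1},
    "v2.9.0": {6: 0.132, 132: 0.361, 1000: 0.716},
}


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Exponent k in t ~ n**k between two measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


exponents = {}
for version, t in TIMINGS_S.items():
    exponents[version] = scaling_exponent(6, t[6], 1000, t[1000])
    print(
        f"{version}: packages x{1000 / 6:.0f}, "
        f"time x{t[1000] / t[6]:.1f}, "
        f"exponent {exponents[version]:.2f}, "
        f"ms/package at 6 = {1000 * t[6] / 6:.0f}, "
        f"at 1000 = {1000 * t[1000] / 1000:.1f}"
    )
```

An exponent below 1 means time grew more slowly than package count; an exponent above 1 means it grew faster. The per-package figures show how much of the small-repo time is fixed overhead rather than per-package work.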

Implication for monorepo operational discipline: task-graph-construction time should be on the same dashboard as clone-time, fresh-build-time, and fresh-CI-time as a first-class repo-health metric. See patterns/repo-health-monitoring.
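
A minimal sketch of what tracking this as a repo-health metric could look like, assuming a hypothetical sample record and a simple "more than 10% slower than baseline" rule. The field names, threshold, and sample values are illustrative, not taken from any of the cited posts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RepoHealthSample:
    """One dashboard data point; all values in seconds (hypothetical schema)."""
    clone_seconds: float
    fresh_build_seconds: float
    fresh_ci_seconds: float
    task_graph_seconds: float


def regressions(baseline, current, tolerance=0.10):
    """Names of metrics that are more than `tolerance` slower than baseline."""
    flagged = []
    for field in fields(RepoHealthSample):
        before = getattr(baseline, field.name)
        after = getattr(current, field.name)
        if after > before * (1 + tolerance):
            flagged.append(field.name)
    return flagged


# Made-up numbers: only task-graph construction has regressed.
baseline = RepoHealthSample(300.0, 900.0, 1500.0, 0.716)
current = RepoHealthSample(310.0, 905.0, 1490.0, 1.9)

flagged = regressions(baseline, current)
print(flagged)
```

Keeping task-graph time in the same record as clone and build time means a regression in it is caught by the same alerting path instead of being noticed anecdotally.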

Monorepo as tooling substrate

Beyond VCS operational cost, a monorepo's structural affordance — structured ownership metadata, a unified runtime code graph, time-ordered change history across all services — enables tooling that multi-repo worlds would need to build a cross-repo coordination layer for first.

Canonical example: Meta's web-monorepo AI-assisted RCA system (sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response, 2024-08-23) uses code + directory ownership and runtime code-graph exploration of impacted systems as its stage-1 heuristic retriever — narrowing "thousands of changes to a few hundred" before an LLM ranker operates on the narrowed set. The retrieval rules are tractable precisely because the monorepo makes ownership and the runtime graph queryable at the repo level; a multi-repo equivalent would require a cross-repo ownership service + a cross-repo dependency graph that most organisations don't maintain. The monorepo surfaces as infrastructure for RCA tooling, not merely storage.
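
The stage-1 narrowing step can be sketched as follows. This is not Meta's implementation: the data shapes, directory names, and matching rule are all hypothetical, and the LLM ranking stage is omitted. It only shows why ownership metadata plus a code graph in one repo make the filter a few lines of code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_id: str
    touched_dirs: tuple


def impacted_dirs(code_graph, start):
    """Directories reachable from the failing system via dependency edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(code_graph.get(node, ()))
    return seen


def heuristic_retrieve(changes, owners, code_graph, failing_dir):
    """Keep changes that touch impacted code or code owned by impacted teams."""
    scope = impacted_dirs(code_graph, failing_dir)
    teams = {owners[d] for d in scope if d in owners}
    return [
        c for c in changes
        if any(d in scope or owners.get(d) in teams for d in c.touched_dirs)
    ]


# Hypothetical monorepo metadata.
code_graph = {"web/checkout": ["lib/payments"], "lib/payments": ["lib/currency"]}
owners = {
    "web/checkout": "checkout-team",
    "lib/payments": "payments-team",
    "lib/currency": "payments-team",
    "lib/fx-rates": "payments-team",
    "web/search": "search-team",
}
changes = [
    Change("D1", ("web/search",)),
    Change("D2", ("lib/currency",)),
    Change("D3", ("lib/fx-rates",)),
    Change("D4", ("docs",)),
]

candidates = heuristic_retrieve(changes, owners, code_graph, "web/checkout")
print([c.change_id for c in candidates])
```

In a multi-repo setup, both `owners` and `code_graph` would first have to be assembled from separate repositories before a filter like this could run.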

The second canonical example is Meta's Glean + Glass (sources/2025-01-01-meta-indexing-code-at-scale-with-glean, 2024-12-19): a centralized ahead-of-time code-indexing service against the web monorepo. The post names the monorepo explicitly as the architectural driver — "code navigation at scale, on large monorepos containing millions of lines in diverse programming languages, is a challenging problem" — and the centralized-service design choice is the direct consequence of IDE-local indexing breaking down past a certain repo size (especially for C++). Glean's downstream uses — code navigation, auto-generated docs, dead-code detection, API-migration tracking, test selection, diff-sketch-driven code review — are all examples of monorepo as tooling substrate: each would need an inter-repo coordination layer in a polyrepo world.

Monorepo's missing piece: long-lived branches against upstream OSS

A monorepo optimized for trunk-based development can lack the branching surface needed to track many long-lived patches against an upstream OSS project — one branch per internal patch per upstream release. Dropping this into the monorepo either overloads the monorepo's branch infrastructure or forces stored-patch-file schemes that lose Git history and parallelism.

Canonical example: Meta's WebRTC retrofit (sources/2026-04-09-meta-escaping-the-fork-webrtc-modernization, 2026-04-09). Meta explicitly calls out the monorepo constraint: "Since we use a monorepo without widespread support for branches, we sought a way to track patches over time that would be continuously rebased on top of upstream." The resolution is an external Git repo based directly on libwebrtc's own Git repo, organized with tag-anchored feature branches (base/7499, debug-tools/7499, r7559). Four named benefits: parallelizable rebases, preserved Git history, LLM-friendliness for future automated conflict resolution, and patches that are ready to submit upstream. The monorepo's branch-limited shape is the structural property that forces this patch-tracking repo to sit outside the monorepo.

Last updated · 542 distilled / 1,571 read