CONCEPT

Fork-upstream sync

Definition

Fork-upstream sync is the operational problem of keeping a privately-modified fork of an open-source project continuously aligned with the upstream project while preserving local modifications. The problem is not one-shot — OSS keeps merging PRs, cutting release branches, and shipping point releases; the fork keeps adding private changes; and both have to stay mergeable over years of parallel activity.

It is distinct from vendoring (freezing at a known-good version and upgrading on demand) and from contributing back (the upstream-the-fix pattern, patterns/upstream-the-fix). Sync is the long-lived state between those two, where both sides ship on their own cadence.

Why it gets expensive

For a small diff, periodic cherry-picks work fine. The cost grows along three axes:

Diff size — the larger the private divergence, the more private commits have to be replayed each cycle and the more surface area for conflicts.
Number of upstream branches tracked — if the fork follows both OSS main and each OSS release branch, every sync has to happen N times, and the same conflict tends to appear on every branch.
OSS velocity — fast-moving upstreams make the batch-sync backlog grow faster than humans can clear it.
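
The three axes can be put into a toy cost model. This is only a sketch: the `ForkLoad` class, its field names, and the numbers are illustrative assumptions, not figures from any real fork. It assumes each private commit is replayed once per tracked branch and that the backlog grows by the weekly gap between upstream merges and what the team can sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForkLoad:
    """Toy inputs for one fork; every field is an illustrative assumption."""
    private_commits: int        # size of the private diff, in commits
    tracked_branches: int       # upstream branches the fork follows (N)
    upstream_prs_per_week: int  # OSS velocity
    prs_synced_per_week: int    # what the team can sync by hand

    def replays_per_cycle(self) -> int:
        # Each private commit is replayed once per tracked branch.
        return self.private_commits * self.tracked_branches

    def backlog_after(self, weeks: int) -> int:
        # The backlog only grows when upstream outpaces the team.
        weekly_gap = self.upstream_prs_per_week - self.prs_synced_per_week
        return max(0, weekly_gap) * weeks


load = ForkLoad(private_commits=40, tracked_branches=3,
                upstream_prs_per_week=50, prs_synced_per_week=35)
print(load.replays_per_cycle())  # 120
print(load.backlog_after(4))     # 60
```

Under these assumptions, doubling either the diff size or the number of tracked branches doubles the replay work, while velocity determines whether the backlog ever drains.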

Branch-pair mirror topology

A clean way to model fork-sync is as a set of branch-pair mirrors: for each upstream branch the fork cares about, there's a private counterpart that holds OSS's content plus the private diff. PlanetScale's Vitess fork maps it like this:

OSS branch	Private equivalent
`main`	`upstream`
`release-x.0`	`latest-x.0`

When OSS cuts a new release-x.0 from main, a matching private latest-x.0 is cut from upstream. Every OSS-side PR flows to its private counterpart; every private-side change stays on the private counterpart. (Source: )
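
The mapping in the table can be written as a small lookup. This is a minimal sketch: the function name `private_branch_for` and the decision to return `None` for untracked branches are assumptions made for illustration; only the `main` → `upstream` and `release-x.0` → `latest-x.0` pairs come from the table above.

```python
import re
from typing import Optional

# Matches OSS release branches of the form release-x.0 (x is a number).
_RELEASE_BRANCH = re.compile(r"^release-(\d+\.0)$")


def private_branch_for(oss_branch: str) -> Optional[str]:
    """Return the private mirror of an OSS branch, or None if it is not tracked."""
    if oss_branch == "main":
        return "upstream"
    match = _RELEASE_BRANCH.match(oss_branch)
    if match:
        return f"latest-{match.group(1)}"
    return None  # assumption: anything else is simply not mirrored


print(private_branch_for("main"))          # upstream
print(private_branch_for("release-19.0"))  # latest-19.0
print(private_branch_for("some-feature"))  # None
```

Keeping the mapping as a pure function means a newly cut release branch needs no configuration change, only a matching private branch.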

The three-stage evolution pattern

Teams running long-lived forks tend to progress through stages as scale breaks each prior design:

Manual cherry-picks: works for small diffs and few branches.
Batch sync tool (e.g. git-replay): replays a sequence of private commits with conflict-resolution memoisation. Still human-triggered, still batch.
Continuous bot (e.g. the Vitess cherry-pick bot): runs on a cron schedule and picks up upstream PRs soon after they merge, opens draft PRs on conflicts, uses labels to gate backports, and runs out-of-band reconciliation.

Each transition is driven by the prior mechanism falling behind: its sync backlog grows faster than the team can clear it.
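
The continuous-bot stage can be sketched as a planning step that decides, for each merged upstream PR, which private branches it should land on and whether a human is needed. This is not the actual Vitess bot: the types, the `backport:<branch>` label format, and the `applies_cleanly` callback (a stand-in for attempting a real cherry-pick) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class UpstreamPR:
    number: int
    labels: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class SyncAction:
    pr: int
    target: str  # private branch that receives the change
    kind: str    # "pr" if the pick applies cleanly, "draft_pr" on conflict


def plan_sync(prs: List[UpstreamPR],
              private_release_branches: List[str],
              applies_cleanly: Callable[[int, str], bool]) -> List[SyncAction]:
    """Plan one bot run over the upstream PRs merged since the last run."""
    actions: List[SyncAction] = []
    for pr in prs:
        # Every merged PR flows to the private mirror of main.
        targets = ["upstream"]
        # Backports are gated by a label (hypothetical label format).
        targets += [b for b in private_release_branches
                    if f"backport:{b}" in pr.labels]
        for target in targets:
            kind = "pr" if applies_cleanly(pr.number, target) else "draft_pr"
            actions.append(SyncAction(pr.number, target, kind))
    return actions


merged = [UpstreamPR(101), UpstreamPR(102, frozenset({"backport:latest-19.0"}))]
# Pretend PR 102 conflicts with the private diff on latest-19.0.
clean = lambda number, target: not (number == 102 and target == "latest-19.0")
for action in plan_sync(merged, ["latest-19.0"], clean):
    print(action.pr, action.target, action.kind)
```

Separating the plan from its execution keeps the conflict path explicit: a `draft_pr` action is the point where the pipeline waits for a human.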

Benefits when it works

The fork stays mergeable — no multi-day "catch-up" before each release that piles up deferred conflicts.
The private diff doesn't need explicit maintenance ("we no longer needed to explicitly maintain the private diffset"): the OSS flow arrives continuously, and private changes continue to land against upstream naturally.
New release branches are cheap — cut a private latest-x.0 and the bot handles the backports.

Failure modes

Conflict storm when the diff grows enough that most cherry-picks collide; stalls the pipeline on human resolution.
Silent omission when backport is label-triggered and a label is missed; requires an out-of-band reconciliation sweep to catch.
State-store dependency when the bot uses an external DB for incremental-pull state; DB outage == sync outage.
Conflict-resolution retention: memoised resolutions in the git-replay era versus fresh draft PRs in the bot era trade automation debt against conflict-resolution latency.
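
The out-of-band reconciliation sweep that guards against silent omission can be sketched as a set difference per branch pair. This is a minimal sketch under assumptions: the function name `find_missing_syncs` and the idea of identifying changes by upstream PR number are illustrative, and how those sets are collected (for example from commit messages or a hosting API) is left out.

```python
from typing import Dict, List, Set


def find_missing_syncs(merged_upstream: Dict[str, Set[int]],
                       synced_private: Dict[str, Set[int]],
                       branch_pairs: Dict[str, str]) -> Dict[str, List[int]]:
    """Return, per private branch, the upstream PR numbers that never arrived."""
    missing: Dict[str, List[int]] = {}
    for oss_branch, private_branch in branch_pairs.items():
        expected = merged_upstream.get(oss_branch, set())
        present = synced_private.get(private_branch, set())
        gap = expected - present
        if gap:
            missing[private_branch] = sorted(gap)
    return missing


pairs = {"main": "upstream", "release-19.0": "latest-19.0"}
merged = {"main": {101, 102, 103}, "release-19.0": {102}}
synced = {"upstream": {101, 102, 103}, "latest-19.0": set()}
print(find_missing_syncs(merged, synced, pairs))  # {'latest-19.0': [102]}
```

Because the sweep compares end state rather than replaying the bot's decisions, it catches a missed label regardless of why the fast path skipped the change.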

Seen in

— PlanetScale's Vitess fork is the canonical worked example: manual → git-replay → continuous bot, with the branch-pair mirror topology formalised and reconciliation scans backing the fast path.