
CONCEPT Cited by 1 source

Fork-upstream sync

Definition

Fork-upstream sync is the operational problem of keeping a privately-modified fork of an open-source project continuously aligned with the upstream project while preserving local modifications. The problem is not one-shot — OSS keeps merging PRs, cutting release branches, and shipping point releases; the fork keeps adding private changes; and both have to stay mergeable over years of parallel activity.

It is distinct from vendoring (freeze at a known-good version and upgrade on demand) and from contributing back (the upstream-the-fix pattern, patterns/upstream-the-fix). Sync is the long-lived state between those two, where both sides ship on their own cadence.

Why it gets expensive

For a small diff, periodic cherry-picks work fine. The cost grows along three axes:

  1. Diff size — the larger the private divergence, the more private commits have to be replayed each cycle and the more surface area for conflicts.
  2. Number of upstream branches tracked — if the fork follows both OSS main and each OSS release branch, every sync has to happen N times, and the same conflict tends to appear on every branch.
  3. OSS velocity — fast-moving upstreams make the batch-sync backlog grow faster than humans can clear it.
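
As a rough illustration of how these axes compound, the toy model below multiplies diff size by the number of tracked branches and simulates a backlog that grows whenever upstream merges faster than the team clears. The function names and the numbers in the usage note are hypothetical, and the linear model is an assumption made for illustration, not a measurement from any source.

```python
def replays_per_cycle(private_commits: int, tracked_branches: int) -> int:
    """Toy estimate: every private commit is replayed once per tracked branch."""
    return private_commits * tracked_branches


def backlog_after(cycles: int, merged_per_cycle: int,
                  cleared_per_cycle: int, start: int = 0) -> int:
    """Simulate the unsynced-PR backlog over a number of sync cycles.

    Assumes constant upstream merge and human clearance rates (a simplification).
    """
    backlog = start
    for _ in range(cycles):
        # The backlog can never go below zero, however fast the team clears it.
        backlog = max(0, backlog + merged_per_cycle - cleared_per_cycle)
    return backlog
```

Under these assumptions, 200 private commits across 4 tracked branches means 800 replays per cycle, and an upstream that merges 30 PRs per cycle against a clearance rate of 25 accumulates a backlog that never drains.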

Branch-pair mirror topology

A clean way to model fork-sync is as a set of branch-pair mirrors: for each upstream branch the fork cares about, there's a private counterpart that holds OSS's content plus the private diff. PlanetScale's Vitess fork maps it like this:

OSS branch    | Private equivalent
main          | upstream
release-x.0   | latest-x.0

When OSS cuts a new release-x.0 from main, a matching private latest-x.0 is cut from upstream. Every OSS-side PR flows to its private counterpart; every private-side change stays on the private counterpart. (Source: sources/2026-04-21-planetscale-automating-cherry-picks-between-oss-and-private-forks)
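
The naming rule in this topology is mechanical enough to sketch in a few lines. The helper names `private_counterpart` and `cut_plan` are hypothetical, and the version number used in the usage note is only an example; the mapping itself follows the branch pairs listed above.

```python
def private_counterpart(oss_branch: str) -> str:
    """Map an OSS branch name to its private mirror (main -> upstream,
    release-x.0 -> latest-x.0)."""
    if oss_branch == "main":
        return "upstream"
    prefix = "release-"
    if oss_branch.startswith(prefix):
        return "latest-" + oss_branch[len(prefix):]
    raise ValueError(f"no private mirror defined for {oss_branch!r}")


def cut_plan(new_oss_branch: str, oss_base: str = "main") -> tuple[str, str]:
    """Return (private branch to create, private branch to cut it from)
    when OSS cuts `new_oss_branch` from `oss_base`."""
    return private_counterpart(new_oss_branch), private_counterpart(oss_base)
```

For example, `cut_plan("release-21.0")` yields `("latest-21.0", "upstream")`: the private release branch is cut from the private mirror of main, not from OSS main itself.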

The three-stage evolution pattern

Teams running long-lived forks tend to progress through stages as scale breaks each prior design:

  1. Manual cherry-picks — works for small diffs and few branches.
  2. Batch sync tool — e.g. git-replay; replays a sequence of private commits with conflict-resolution memoisation. Still human-triggered, still batch.
  3. Continuous bot — e.g. Vitess cherry-pick bot; auto-syncs on cron as soon as upstream merges a PR, opens draft PRs on conflicts, uses labels to gate backports, and runs out-of-band reconciliation.

Each transition is triggered when the prior mechanism can no longer clear the sync backlog as fast as it accumulates.
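
To make the third stage concrete, here is a minimal sketch of the planning step such a bot might run for each merged upstream PR. It is not the Vitess bot's actual code: the `backport-to-` label prefix, the data classes, and the `applies_cleanly` callback are all hypothetical stand-ins. A real bot would attempt the cherry-pick with git and open the PRs through its code host's API.

```python
from dataclasses import dataclass
from typing import Callable

BACKPORT_PREFIX = "backport-to-"  # hypothetical label convention


@dataclass(frozen=True)
class UpstreamPR:
    number: int
    base: str                          # OSS branch the PR merged into
    labels: frozenset = frozenset()


@dataclass(frozen=True)
class SyncAction:
    pr: int
    target: str                        # private branch receiving the cherry-pick
    draft: bool                        # True when a human must resolve conflicts


def mirror(oss_branch: str) -> str:
    """main -> upstream, release-x.0 -> latest-x.0."""
    if oss_branch == "main":
        return "upstream"
    return oss_branch.replace("release-", "latest-", 1)


def plan_sync(pr: UpstreamPR,
              applies_cleanly: Callable[[int, str], bool]) -> list[SyncAction]:
    """Decide which private branches get this PR and whether each PR is a draft."""
    targets = [mirror(pr.base)]
    # Labels gate backports: only explicitly labelled branches are added.
    for label in sorted(pr.labels):
        if label.startswith(BACKPORT_PREFIX):
            targets.append(label[len(BACKPORT_PREFIX):])
    return [SyncAction(pr.number, t, draft=not applies_cleanly(pr.number, t))
            for t in targets]
```

Keeping the decision logic as a pure function separate from git and API calls makes it easy to test; the conflict check is injected so a scheduled run can swap in the real cherry-pick attempt.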

Benefits when it works

  • The fork stays mergeable — no multi-day "catch-up" before each release that piles up deferred conflicts.
  • The private diff doesn't need explicit maintenance: "we no longer needed to explicitly maintain the private diffset". The OSS flow arrives continuously, and private changes continue to land against upstream naturally.
  • New release branches are cheap — cut a private latest-x.0 and the bot handles the backports.

Failure modes

  • Conflict storm when the diff grows enough that most cherry-picks collide; stalls the pipeline on human resolution.
  • Silent omission when backport is label-triggered and a label is missed; requires an out-of-band reconciliation sweep to catch.
  • State-store dependency when the bot uses an external DB for incremental-pull state; DB outage == sync outage.
  • Conflict-resolution retention — memoised resolutions in the systems/git-replay era vs fresh draft-PRs in the bot era trade off automation debt against conflict-resolution latency.
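
The silent-omission case lends itself to a small reconciliation sketch. It assumes the sweep can obtain, for each private branch, the set of upstream PR numbers that should have landed (for example, derived from the contents of the matching OSS branch rather than from labels) and the set that actually did. How those sets are collected is left out, and the function name is hypothetical.

```python
def missing_backports(expected: dict[str, set[int]],
                      landed: dict[str, set[int]]) -> dict[str, list[int]]:
    """For each private branch, list upstream PRs that should be present but are not."""
    gaps: dict[str, list[int]] = {}
    for branch, wanted in expected.items():
        # A branch absent from `landed` is treated as having received nothing.
        absent = sorted(wanted - landed.get(branch, set()))
        if absent:
            gaps[branch] = absent
    return gaps
```

Because the expected set comes from a source other than the labels, a PR whose backport label was forgotten still shows up as a gap.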

Seen in
