
SYSTEM Cited by 1 source

Vitess cherry-pick bot

What it is

PlanetScale's internal bot that continuously mirrors OSS Vitess PRs into their private fork and manages per-release-branch backports. It replaced a prior weekly-whole-diff cherry-pick workflow and the git-replay tool once the private diffset and number of release branches made both approaches unsustainable.

The bot's central design choice is a GitHub Actions cron job with its state held in an external database, rather than a dedicated server. In the post's words: "we opted for a solution that runs periodically on a cron schedule using GitHub Actions, with its state stored in a PlanetScale database instance." (Source: sources/2026-04-21-planetscale-automating-cherry-picks-between-oss-and-private-forks)

Branch topology it maintains

OSS branch      Private equivalent
----------      ------------------
main            upstream
release-22.0    latest-22.0
release-x.0     latest-x.0

When a new OSS release branch is cut from main, a matching private latest-x.0 branch is created from upstream.
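
The naming rule in the table can be sketched as a small mapping function. This is a minimal illustration only: the function name `private_branch` and the regular expression are hypothetical and do not come from the post, which describes the branch names but not the bot's code.

```python
import re

# Hypothetical helper: the branch names come from the table above,
# the function name and the regex are illustrative assumptions.
_RELEASE = re.compile(r"^release-(\d+)\.0$")


def private_branch(oss_branch: str) -> str:
    """Map an OSS Vitess branch name to its private-fork equivalent."""
    if oss_branch == "main":
        return "upstream"
    match = _RELEASE.match(oss_branch)
    if match is None:
        # Anything outside the documented topology is rejected rather than guessed.
        raise ValueError(f"no private equivalent known for {oss_branch!r}")
    return f"latest-{match.group(1)}.0"
```

For example, `private_branch("release-22.0")` yields `latest-22.0`, and an unrecognised branch name raises `ValueError` instead of being mapped silently.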

Cherry-pick flow (main → upstream)

Hourly cron. Steps:

  1. Fetch recently closed PRs from OSS vitess/vitess via the go-github library.
  2. Stop the fetch when encountering a PR that predates any PR already in the database — incremental pull by timestamp.
  3. Filter out PRs closed without being merged.
  4. Insert the remaining PRs into the database for processing.
  5. Check out vitess-private and attempt git cherry-pick against upstream.
  6. Create a PR against upstream for each cherry-pick, regardless of conflict status.
  7. On conflict: the PR is opened as a draft with the labels do not merge and Conflict, a comment posts the git status output, and the original PR author is tagged (or the merger, if the author is a non-PlanetScale contributor). The PR inherits the title and description of the original PR.
  8. On success: the PR inherits the title, description, and assignee of the original PR and waits for normal review.
  9. Mark the PR as cherry-picked in the database.
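
Steps 2 to 4 above can be sketched as a pure function over an already-fetched list. This is a sketch under stated assumptions, not the bot's code: it assumes closed PRs arrive newest first, and it reads "predates any PR already in the database" as "closed before the newest close time already stored". The `ClosedPR` type and `select_new_merged` name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ClosedPR:
    # Hypothetical record; the field names are illustrative.
    number: int
    closed_at: datetime
    merged: bool


def select_new_merged(
    closed_newest_first: Iterable[ClosedPR],
    newest_stored: Optional[datetime],
) -> List[ClosedPR]:
    """Return merged PRs that are not older than what the database already holds."""
    selected: List[ClosedPR] = []
    for pr in closed_newest_first:
        # Stop at the first PR closed before the newest stored close time
        # (assumed interpretation of the stopping criterion).
        if newest_stored is not None and pr.closed_at < newest_stored:
            break
        # Drop PRs that were closed without being merged.
        if pr.merged:
            selected.append(pr)
    return selected
```

Because the comparison is strict, a PR closed at exactly the stored timestamp is selected again, so the insert step in this sketch would need to tolerate duplicates.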

See patterns/automated-upstream-cherry-pick-bot for the general shape, patterns/draft-pr-for-conflicts for the conflict handling.
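
The two outcomes in steps 7 and 8 can be sketched as a function that decides what the new PR should look like before anything is sent to GitHub. This is an illustration only: `CherryPickPR`, `plan_cherry_pick_pr`, and the dictionary keys are hypothetical, and the sketch leaves the assignee unset on conflict because the post lists only title and description for that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CherryPickPR:
    # Hypothetical description of the PR to open against the private fork.
    title: str
    body: str
    base: str = "upstream"
    draft: bool = False
    labels: List[str] = field(default_factory=list)
    comment: Optional[str] = None
    mention: Optional[str] = None
    assignee: Optional[str] = None


def plan_cherry_pick_pr(original: dict, conflict: bool, git_status: str = "") -> CherryPickPR:
    """Build the PR description for a cherry-pick attempt, clean or conflicted."""
    pr = CherryPickPR(title=original["title"], body=original["body"])
    if conflict:
        pr.draft = True
        pr.labels = ["do not merge", "Conflict"]
        pr.comment = git_status  # output of `git status`, posted as a comment
        # Tag the author, or the merger when the author is outside PlanetScale.
        if original["author_is_planetscale"]:
            pr.mention = original["author"]
        else:
            pr.mention = original["merged_by"]
    else:
        pr.assignee = original.get("assignee")
    return pr
```

Keeping this decision separate from the API calls means the conflict rules can be checked without network access.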

Backport flow (upstream → latest-x.0)

Label-triggered, not automatic. Flow:

  1. A PR in vitess-private gets the label Backport to: latest-x.0.
  2. The bot sees the label on its next run and initiates a backport workflow analogous to cherry-pick.
  3. Conflict handling is the same: draft PR + labels + author tag.

See patterns/label-triggered-backport. The label gate leaves the per-PR backport decision with the author instead of blanket-backporting everything, which matches the reality that not every change belongs on every release branch.
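
The label gate can be sketched as a parser that turns a PR's labels into target branches. This is a minimal sketch: the label text follows the "Backport to: latest-x.0" form given above, while the function name `backport_targets` and the exact matching rules (whitespace trimming, deduplication) are assumptions.

```python
import re
from typing import Iterable, List

# Label format taken from the flow above; the regex itself is an assumption.
_BACKPORT = re.compile(r"^Backport to: (latest-\d+\.0)$")


def backport_targets(labels: Iterable[str]) -> List[str]:
    """Return the latest-x.0 branches requested by a PR's labels, in label order."""
    targets: List[str] = []
    for label in labels:
        match = _BACKPORT.match(label.strip())
        if match and match.group(1) not in targets:
            targets.append(match.group(1))
    return targets
```

A PR with no matching label yields an empty list, so nothing is backported unless someone asked for it.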

Weekly reconciliation

Two out-of-band audit checks run weekly and post a summary to a dedicated GitHub issue. See patterns/weekly-reconciliation-check and concepts/weekly-integrity-reconciliation.

Upstream-in-sync-with-OSS check flags:

  • Open cherry-pick PRs against upstream (never merged).
  • OSS main PRs that were not cherry-picked into upstream.
  • PRs merged directly into a latest-x.0 branch instead of being backported from upstream.

Latest-branches-consistent check flags:

  • Open backport PRs against latest-x.0 branches.
  • PRs merged into latest-x.0 that are not backports.
  • PRs backported to latest-x.0 but not to higher-numbered latest branches (where applicable).

These reconciliation checks exist because the hourly fast path has failure modes the bot itself can't see in real time: a PR missing a label, a direct-merge slipping past review, a cherry-pick PR going stale and open forever. The weekly sweep surfaces all of them.
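
One of the flags above, a PR backported to a latest-x.0 branch but not to higher-numbered ones, can be sketched as a set comparison. This is an illustration under assumptions: the input shape (PR number mapped to the set of branches it reached) and the function names are hypothetical, and the sketch ignores the "where applicable" qualifier by treating every higher branch as required.

```python
from typing import Dict, List, Set


def branch_major(branch: str) -> int:
    """Extract 22 from 'latest-22.0'."""
    return int(branch[len("latest-"):].split(".")[0])


def missing_forward_backports(
    backports: Dict[int, Set[str]],
    latest_branches: List[str],
) -> Dict[int, List[str]]:
    """For each PR, list higher-numbered latest branches it has not reached."""
    findings: Dict[int, List[str]] = {}
    for pr_number, done in backports.items():
        if not done:
            continue
        lowest = min(branch_major(b) for b in done)
        missing = sorted(
            (b for b in latest_branches if branch_major(b) > lowest and b not in done),
            key=branch_major,
        )
        if missing:
            findings[pr_number] = missing
    return findings
```

The result is a report, not an action: in keeping with the weekly sweep described above, it only surfaces gaps for a human to review.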

State model

Primary state: a PlanetScale database instance with a table of recently-closed OSS PRs and their cherry-pick status. The database drives three things:

  1. The incremental-pull stopping criterion ("stops once it encounters a PR that predates any PR already present in the database").
  2. The PR → cherry-pick PR mapping used by reconciliation.
  3. The bot's ability to be idempotent across cron runs — re-running doesn't re-cherry-pick already-processed PRs.

The stateful design was an explicit choice over a stateless PR-metadata-only approach: "Should it be stateless, storing all information in PRs, or stateful with a dedicated data store? After extensive deliberation, we opted for [stateful]." See patterns/stateful-github-actions-cron.
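
The idempotency property can be sketched with a single table keyed by PR number. This is a sketch under loud assumptions: SQLite from the Python standard library stands in for the PlanetScale database, and the table name, columns, and helper functions are all hypothetical, since the post does not publish a schema.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema; SQLite is only a stand-in for the real database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS oss_prs (
    number        INTEGER PRIMARY KEY,
    closed_at     TEXT    NOT NULL,
    cherry_picked INTEGER NOT NULL DEFAULT 0
)
"""


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_pr(conn: sqlite3.Connection, number: int, closed_at: str) -> None:
    # INSERT OR IGNORE keeps an existing row, including its cherry_picked flag.
    conn.execute(
        "INSERT OR IGNORE INTO oss_prs (number, closed_at) VALUES (?, ?)",
        (number, closed_at),
    )
    conn.commit()


def pending(conn: sqlite3.Connection) -> List[int]:
    rows = conn.execute(
        "SELECT number FROM oss_prs WHERE cherry_picked = 0 ORDER BY closed_at, number"
    )
    return [row[0] for row in rows]


def mark_cherry_picked(conn: sqlite3.Connection, number: int) -> None:
    conn.execute("UPDATE oss_prs SET cherry_picked = 1 WHERE number = ?", (number,))
    conn.commit()


def newest_closed_at(conn: sqlite3.Connection) -> Optional[str]:
    # Drives the incremental-pull stopping criterion; None when the table is empty.
    return conn.execute("SELECT MAX(closed_at) FROM oss_prs").fetchone()[0]
```

Re-recording a PR that was already marked leaves it marked, so a second cron run over the same PRs finds nothing pending. Timestamps are stored as ISO 8601 strings in one fixed format so that `MAX` and `ORDER BY` compare them chronologically.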

Authentication

A GitHub token is used to:

  • Check out the vitess-private repository.
  • Create PRs against vitess-private.
  • Read closed PRs from vitess/vitess via go-github.

Outcome

"Over a year and six months later, the results have been remarkable. The bot has saved countless hours of engineering time, allowing our team to focus on building innovative features for our users rather than manually cherry-picking PRs!" The source is a retrospective post, not a launch announcement.

Caveats

  • Bootstrapping: seeding the DB with already-processed PR history is non-trivial on first deploy of this kind of system.
  • Label discipline: backport completeness depends on authors applying the right labels; the weekly reconciliation is the only backstop.
  • State-store dependency: the bot cannot run when the database is unavailable. The post does not describe this as a reliability constraint, but it is a real operational dependency.
  • No conflict-memoisation story in the new bot. The predecessor systems/git-replay memoised conflict resolutions; the post does not describe whether the new bot reuses that mechanism.
  • Scale numbers not disclosed. PR throughput, conflict rate, database size, and API-call budget aren't published.
