PLANETSCALE 2025-01-14 Tier 3

PlanetScale — Automating cherry-picks between OSS and private forks

Summary

PlanetScale maintains a private fork of the open-source Vitess project with internal modifications on top of upstream. As their private diffset grew and they began tracking stable OSS release branches rather than main, manual cherry-picking became unsustainable. Manan Gupta describes a three-step evolution: (1) a weekly GitHub Action that cherry-picked all private commits onto the latest main; (2) git-replay, a tool that memoises conflict-resolution outcomes and replays them on later cherry-pick passes; and finally (3) the Vitess cherry-pick bot, a stateful bot scheduled by GitHub Actions with a PlanetScale database for bookkeeping. The bot continuously mirrors OSS main PRs into private upstream, supports label-triggered backports to latest-x.0 release branches, opens draft PRs when a cherry-pick conflicts and tags an owner to resolve them, and runs weekly reconciliation checks to catch PRs that went missing between branches.

Key takeaways

  1. Fork-sync topology mapped as branch-pair mirrors. OSS main ↔ private upstream; OSS release-x.0 ↔ private latest-x.0. Whenever a new OSS release branch is cut from main, a matching private latest branch is created from upstream. "We also set up private equivalents of Vitess release branches. These private branches would also include our private diff." (concepts/fork-upstream-sync)

  2. Evolution had three distinct stages, each driven by scale breaking the previous design. Weekly whole-diff cherry-pick → git-replay (conflict-resolution memoisation) → continuous PR-level cherry-pick bot with DB state. Each move was triggered by a specific failure mode: growing diff size made manual conflict resolution the dominant cost; extending to multiple release branches made the same conflicts recur identically on each branch; and periodic mass cherry-picks were too coarse, letting the private flow fall out of step with OSS between passes. "As our private diffset continued to grow, we realized the need for a more continuous and efficient process." (systems/git-replay, systems/vitess-cherry-pick-bot)

  3. The bot is a GitHub Action on a cron schedule + a PlanetScale database. Architecture decision: "we opted for a solution that runs periodically on a cron schedule using GitHub Actions, with its state stored in a PlanetScale database instance. The bot operates on an hourly cron schedule in GitHub Actions and performs two core tasks: Cherry-Picking and Backporting." The DB stores recently closed PRs and their cherry-pick status; the cron job reconciles the DB with GitHub via the go-github library. Keeping state lets each run look only at PRs closed since the previous run, which beats a stateless job that re-inspects every PR ever closed on both run time and GitHub API rate-limit budget. (patterns/stateful-github-actions-cron)

  4. Conflicts don't block: the bot opens a draft PR anyway. "The workflow creates a PR even if conflicts arise during cherry-picking, ensuring no PRs are missed." Conflict PRs are drafts carrying the do not merge and Conflict labels; the original PR's title and body are carried over; the original author is assigned (or the person who merged the PR, if the author is not a PlanetScale contributor); and the bot posts a comment with the git status output, listing the conflicting files and tagging the author. This turns conflicts from a pipeline blocker into an async task queue. (patterns/draft-pr-for-conflicts)

  5. Backport is label-triggered, not automatic. "Backports are not automatically triggered. Instead, they rely on labels applied to PRs in the vitess-private repository. Labels like Backport to: latest-x.0 signal the bot to initiate the backport." Cherry-pick (main → upstream) is full-auto; backport (upstream → release branch) is opt-in per PR. This matches the real decision boundary: everything OSS-merged should reach private mainline, but not every change belongs on every release branch. (patterns/label-triggered-backport)

  6. Weekly reconciliation catches what the primary flow missed. Two dedicated checks run every week: upstream-in-sync-with-OSS flags open cherry-pick PRs against upstream, OSS main PRs not yet cherry-picked, and PRs merged directly into latest that should have been backported from upstream; latest-branches-consistent flags open backport PRs, direct merges into latest, and PRs backported to latest-x.0 but not to higher-numbered latest branches. The results are posted as a comment on a dedicated GitHub issue — humans triage from there. "The bot posts a summary of these checks to a dedicated GitHub issue every week, providing visibility into any issues that may require manual inspection or action." (concepts/weekly-integrity-reconciliation, patterns/weekly-reconciliation-check)

  7. Outcome: after "a year and six months" in production, the bot is judged a decisive success. "Over a year and six months later, the results have been remarkable. The bot has saved countless hours of engineering time, allowing our team to focus on building innovative features for our users rather than manually cherry-picking PRs!" The post is a retrospective rather than a launch announcement, so it reports on an actual production track record.
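
Takeaways 3 and 4 together describe one hourly pass of the bot. The sketch below is a minimal illustration of that pass, not the bot's code: it assumes closed PRs arrive sorted newest-closed-first, that the git and GitHub operations are injected as plain callables (cherry_pick, open_pr), and that state is a PR-number-to-status map plus a close-time high-water mark. ClosedPR, StateDB, discover_new_prs, run_pass and the status strings are all hypothetical; carrying over the PR body and the author-versus-merger assignment rule are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class ClosedPR:
    """Hypothetical view of a closed OSS main PR (only the fields this sketch needs)."""
    number: int
    closed_at: datetime
    merged: bool
    title: str
    author: str


@dataclass
class StateDB:
    """In-memory stand-in for the bot's bookkeeping database (assumed schema)."""
    status: dict[int, str] = field(default_factory=dict)  # PR number -> outcome
    newest_seen: datetime | None = None  # high-water mark of processed close times


def discover_new_prs(closed_newest_first: list[ClosedPR], db: StateDB) -> list[ClosedPR]:
    """Walk closed PRs newest-first and stop at the first one the DB already covers."""
    new = []
    for pr in closed_newest_first:
        if db.newest_seen is not None and pr.closed_at <= db.newest_seen:
            break  # this PR and everything older were seen by an earlier run
        new.append(pr)
    return new


def run_pass(
    closed_newest_first: list[ClosedPR],
    db: StateDB,
    cherry_pick: Callable[[ClosedPR], list[str]],  # returns conflicting paths, [] if clean
    open_pr: Callable[..., None],                  # opens a PR against private upstream
) -> list[ClosedPR]:
    """One hourly pass: mirror newly merged OSS main PRs into private upstream."""
    new_prs = discover_new_prs(closed_newest_first, db)
    for pr in reversed(new_prs):  # oldest first, so picks apply in merge order
        if pr.number in db.status:
            continue  # already handled by a run that stopped before saving the mark
        if not pr.merged:
            db.status[pr.number] = "skipped"  # closed without merging: nothing to pick
            continue
        conflicts = cherry_pick(pr)
        if conflicts:
            # Don't block the pipeline: open a draft PR anyway and hand it to a human.
            open_pr(
                title=pr.title,
                draft=True,
                labels=["do not merge", "Conflict"],
                comment=f"@{pr.author} conflicting files: {', '.join(conflicts)}",
            )
            db.status[pr.number] = "conflict"
        else:
            open_pr(title=pr.title, draft=False, labels=[], comment=None)
            db.status[pr.number] = "picked"
    if new_prs:
        db.newest_seen = max(pr.closed_at for pr in new_prs)
    return new_prs
```

Checking the per-PR status before acting, and not only the high-water mark, keeps a retried pass from opening duplicate PRs if an earlier run stopped before saving the mark.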

Systems

  • systems/vitess-cherry-pick-bot — new page. The scheduled GitHub-Actions-hosted bot with PlanetScale DB state that drives the full workflow.
  • systems/git-replay — new page. The predecessor tool that memoised conflict-resolution outcomes and replayed them on subsequent cherry-picks across release branches.
  • systems/vitess — extended. Vitess is the project being forked; this source adds the private-fork management story to its page.
  • systems/git — extended. The cherry-pick / conflict-resolution operations are baseline Git primitives whose coordination at fork-sync scale drove the tool chain.
  • systems/github-actions — extended. Used as a scheduled-cron compute substrate with external state.
  • systems/github — referenced as the PR + label + issue surface.
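
git-replay's core idea is to remember how a conflict was resolved and reuse that resolution when the identical conflict reappears on another branch. It can be sketched as a content-addressed memo. This is an illustration under assumptions: the post does not describe git-replay's data model, and ResolutionCache, record and replay are hypothetical names. Keying on the conflicting hunks' content rather than on commit IDs is what would let a resolution made on one release branch hit again on the next.

```python
from __future__ import annotations

import hashlib


class ResolutionCache:
    """Hypothetical memo of conflict resolutions, keyed by the conflicting content."""

    def __init__(self) -> None:
        self._memo: dict[str, str] = {}

    @staticmethod
    def _key(path: str, ours: str, theirs: str) -> str:
        # Content-based key: the same hunk conflicting on another branch maps to the same key.
        h = hashlib.sha256()
        for part in (path, ours, theirs):
            h.update(part.encode())
            h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
        return h.hexdigest()

    def record(self, path: str, ours: str, theirs: str, resolution: str) -> None:
        """Remember how a human resolved this conflict."""
        self._memo[self._key(path, ours, theirs)] = resolution

    def replay(self, path: str, ours: str, theirs: str) -> str | None:
        """Return the remembered resolution, or None if a human still has to resolve it."""
        return self._memo.get(self._key(path, ours, theirs))
```

Git ships a built-in mechanism on the same principle, `git rerere` ("reuse recorded resolution"), which stores resolutions under `.git/rr-cache`; the post does not say whether git-replay builds on it.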

Concepts

Patterns

Operational numbers

  • Bot cron interval: hourly (GitHub Actions schedule).
  • Reconciliation cron interval: weekly.
  • Branch topology:
    • OSS main ↔ private upstream
    • OSS release-22.0 ↔ private latest-22.0
    • OSS release-x.0 ↔ private latest-x.0
  • PR discovery stopping criterion: the bot fetches closed PRs, newest first, until it reaches one that is not newer than the PRs already recorded in the database (a monotonic, timestamp-based incremental pull).
  • Conflict PR shape: draft + do not merge label + Conflict label + git status output posted as comment + original author (or the engineer who merged the PR) tagged.
  • Track record: "a year and six months" in production at time of writing (2025-01-14, so ~mid-2023 rollout).
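
The branch topology above is a fixed naming rule, and takeaway 5 adds a label convention for backports. A small sketch of both, assuming names follow exactly the `release-N.0`, `latest-N.0` and `Backport to: latest-N.0` patterns quoted in the post; private_branch_for and backport_targets are hypothetical helpers, not part of the bot's published interface.

```python
from __future__ import annotations

import re

# Patterns assumed from the names quoted in the post (release-22.0, latest-22.0,
# "Backport to: latest-x.0").
OSS_RELEASE = re.compile(r"release-(\d+)\.0")
BACKPORT_LABEL = re.compile(r"Backport to: (latest-(\d+)\.0)")


def private_branch_for(oss_branch: str) -> str:
    """Map an OSS Vitess branch to its private mirror: main -> upstream, release-N.0 -> latest-N.0."""
    if oss_branch == "main":
        return "upstream"
    m = OSS_RELEASE.fullmatch(oss_branch)
    if m:
        return f"latest-{m.group(1)}.0"
    raise ValueError(f"no private mirror defined for {oss_branch!r}")


def backport_targets(labels: list[str]) -> list[str]:
    """Private latest branches that an upstream PR's labels ask the bot to backport to.

    No matching label means no backport, which is the opt-in behaviour the post describes.
    Targets are returned oldest release first.
    """
    found = []
    for label in labels:
        m = BACKPORT_LABEL.fullmatch(label)
        if m:
            found.append((int(m.group(2)), m.group(1)))
    return [branch for _, branch in sorted(found)]
```

Raising on an unknown branch, rather than guessing a mirror, keeps a new or misnamed OSS branch from being silently synced to the wrong place.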

Caveats

  • Stateful design is non-trivial to bootstrap. Spinning this up from scratch on a private fork requires seeding the DB with already-cherry-picked PRs; otherwise the first run either duplicates work or misses history.
  • Label-triggered backport puts the decision on the author. If the author doesn't apply Backport to: latest-x.0, the fix doesn't reach release branches — failure mode is silent omission, caught only by the weekly reconciliation. Teams without a disciplined labelling culture will drift.
  • Draft-PR-on-conflict cost scales with conflict rate. If the private diff drifts far enough from OSS main that most cherry-picks conflict, the backlog of conflict PRs becomes its own bottleneck. git-replay-style conflict memoisation was the previous generation's answer to that problem.
  • PlanetScale uses its own product to store the bot's state. That makes a clean dogfooding case study, but the post does not discuss availability requirements for the state store: an outage of the state DB (whichever vendor hosts it) would stall the bot until recovery.
  • Weekly reconciliation reports but does not auto-remediate. The system posts discrepancies to a GitHub issue, and humans have to read the issue and act. Scaling out the set of reconciliation invariants would need either deeper auto-remediation or a dashboard / alerting integration, neither of which the post describes.
  • No conflict-resolution reuse story in the new bot. The post describes git-replay as the predecessor and says its limitations drove the rewrite, but does not describe whether the new bot inherits any of the memoised-resolutions mechanism or discards it. Conflict PRs still require human resolution.
  • No stated scale numbers. The post doesn't quote how many PRs flow through per week, conflict rate, database size, or API-call budget — the "remarkable" outcome is narrative rather than measured.
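
The silent-omission caveat about backport labels is what the weekly latest-branches-consistent check is meant to surface. A minimal sketch of its "backported to latest-x.0 but not to higher-numbered latest branches" rule, assuming the bot already knows which upstream PR numbers each latest branch contains (how it learns that is not described). The function names and the comment format are hypothetical, and the check's other rules (open backport PRs, direct merges into latest) are not modelled.

```python
from __future__ import annotations


def _major(branch: str) -> int:
    # "latest-22.0" -> 22 (assumes the latest-N.0 naming quoted in the post)
    return int(branch.split("-", 1)[1].split(".", 1)[0])


def missing_forward_backports(on_branch: dict[str, set[int]]) -> dict[int, list[str]]:
    """Flag PRs that reached some latest-N.0 branch but not a higher-numbered one.

    `on_branch` maps each private latest branch to the upstream PR numbers it contains
    (an assumed input). Returns PR number -> higher latest branches it is missing from.
    """
    branches = sorted(on_branch, key=_major)
    findings: dict[int, list[str]] = {}
    for i, lower in enumerate(branches):
        for pr in on_branch[lower]:
            for higher in branches[i + 1:]:
                if pr not in on_branch[higher]:
                    missing = findings.setdefault(pr, [])
                    if higher not in missing:
                        missing.append(higher)
    return findings


def weekly_summary(findings: dict[int, list[str]]) -> str:
    """Render findings as the text of a comment on the tracking issue (format is made up)."""
    if not findings:
        return "latest-branches-consistent: no discrepancies"
    lines = ["latest-branches-consistent: PRs missing from higher latest branches"]
    for pr, branches in sorted(findings.items()):
        lines.append(f"- #{pr}: missing from {', '.join(branches)}")
    return "\n".join(lines)
```

As in the post, the output is a report for humans to triage; nothing here opens the missing backports automatically.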

Source
