PATTERN Cited by 1 source
Version-specific images per Git branch¶
Intent¶
During a major version migration of a fleet-wide system, ship both old and new major versions as parallel, independently deployable artifacts by publishing version-specific images from dedicated Git branches, and select which image each instance runs at bootstrap via an environment variable.
This lets the migration proceed cluster-by-cluster, pause indefinitely at any point, and roll clusters forward or back independently of client code or of each other.
Problem¶
A major-version bump of a core system (datastore, service framework, runtime) creates a hard block if there is only one current image:
- Every cluster must be upgraded before any new feature needs new-version support.
- Rollback reverts the whole fleet.
- Client teams are dragged into the migration window because their dependency version floats with the platform.
Adding a dependency on client teams "increases project complexity, potentially leading to a long tail of migrations" (Yelp's phrasing, Source: sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade).
Solution¶
- Maintain a Git branch per major version (e.g.
cassandra-3.11,cassandra-4.1). - Apply fixes to the appropriate branch(es); during the upgrade window, any 3.11 hotfix must also be ported to 4.1.
- Publish per-branch Docker images (or equivalent artifact).
- Select the image per cluster (or per instance) at bootstrap via a version-specific environment variable read by the deploy tooling.
The migration now proceeds per cluster:
- Flip the env var on cluster N to point at the new version.
- Deploy restarts cluster N's nodes one at a time with the new image.
- Cluster N+1 stays on old version until its turn.
Rollback is symmetric: flip the env var back, restart.
Structure¶
Git repo
├── branch: cassandra-3.11
│ └── Dockerfile builds cassandra-3.11:latest image
├── branch: cassandra-4.1
│ └── Dockerfile builds cassandra-4.1:latest image
Cluster configuration
└── environment variable CASSANDRA_VERSION_TAG=cassandra-3.11 | cassandra-4.1
Deploy tooling
└── pulls $CASSANDRA_VERSION_TAG at bootstrap time
Trade-offs¶
- Critical-fix duplication: during the upgrade window, any fix to the old branch must also land on the new branch. Yelp's rationale: "critical fixes for Cassandra 3.11 [were] expected to be rare" during the window, so the overhead was acceptable.
- Artifact sprawl: more images to build, test, and store. Mitigate by keeping the window bounded.
- Testing matrix doubles across client combinations for the duration of the upgrade.
Seen in¶
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — canonical wiki Seen-in. Yelp ran both Cassandra 3.11 and 4.1 images side by side across their > 1,000-node fleet during the upgrade window. Direct quote: "we achieved this by publishing version-specific Cassandra images from dedicated Git branches. The appropriate Cassandra image was selected at bootstrap time via version-specific environment variables." Avoiding "hard-blocking ourselves during the upgrade" was an explicit core principle.
Related¶
- concepts/rolling-upgrade — the upgrade idiom this pattern supports.
- concepts/mixed-version-cluster — the operational state this pattern makes safe.
- patterns/feature-flagged-dual-implementation — the application-tier analogue (flag-gated old + new implementations). This pattern is the platform-tier analogue at the image/artifact level.
- patterns/pre-flight-flight-post-flight-upgrade-stages — uses this pattern as substrate.