CONCEPT Cited by 1 source
Major-version upgrade discipline¶
Definition¶
Major-version upgrade discipline is the set of practices a team applies when upgrading a stateful datastore across a major-version boundary, where the upgrade is not a node-replacement-only operation — the application, client libraries, cluster topology, and query semantics may all shift together. Unlike minor / patch upgrades, a major upgrade can break queries that worked previously, change response shapes, deprecate or re-default settings, and introduce new index / storage formats that are not downgradeable.
Components¶
- Upgrade-strategy decision — choose between in-place rolling and reindex-based Blue/Green based on datastore properties, fleet cost tolerance, streaming time, and rollback requirements. See concepts/in-place-vs-new-dc-upgrade.
- Client-cluster decoupling — use a compatibility primitive (e.g. ES compat header, versioned REST API, feature flag) so the cluster and client can move independently. The 443k-LOC application rewrite should not be a blocker for a security-motivated cluster upgrade.
- Parity-test matrix against mixed versions — use patterns/multi-version-testcontainers-parity-suite to run the application's minimum usage surface against (old client × new cluster) and (new client × old cluster) combinations, beyond the (old, old) baseline which is production and the (new, new) target. Catches the majority of undocumented-behavior drift before cutover.
- Per-endpoint A/B dashboards during the shadow window — latency, error rate, index sizes / row counts. Cutover is data-driven, not schedule-driven.
- Almost-instantaneous rollback path — routing flip (Blue/ Green) or reverse rolling (rolling). Tested before cutover, not discovered during incident response.
- Small-cluster validation first — if the fleet is per-tenant or per-region, migrate a low-stakes cluster first, let the procedure mature, then roll to larger ones.
- Write-side convergence verification before cutover — index sizes on old and new converge monotonically over the shadow window. Zalando dashboards delta index sizes (Source: sources/2023-11-19-zalando-migrating-from-elasticsearch-7-to-8).
Seen in¶
- sources/2023-11-19-zalando-migrating-from-elasticsearch-7-to-8 — Canonical wiki instance at Elasticsearch 7.17→8.x for Zalando's Search & Browse department (28 per-country-language clusters, multi- terabyte each). All seven components explicitly present: (a) strategy chosen reindex Blue/Green over rolling; (b) compat header decouples Origami HLRC→Java-client migration from cluster upgrade; (c) three-class Testcontainers matrix catches four classes of undocumented-behavior drift; (d) Lightstep + Grafana side-by-side dashboards; (e) rollback via routing flip described as "almost instantaneous"; (f) less-mission-critical clusters migrated first; (g) index-size delta as the primary convergence SLI.
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — in-place rolling counter-example for Cassandra 3.11→4.1, with its own pre-flight / flight / post-flight discipline (see patterns/pre-flight-flight-post-flight-upgrade-stages) and patterns/dual-run-version-specific-proxies.
- concepts/mysql-version-upgrade — cumulative guidance around MySQL major upgrades.
Anti-patterns¶
- Cluster upgrade blocked on full application rewrite — fix: compatibility-mode handoff.
- Cutover on schedule, not on data-convergence signal — fix: dashboard-gated cutover.
- Rollback plan not tested until needed — fix: run rollback drill on a non-customer-facing cluster before the real one.
- Tests derived from 7.x response shapes checked against 8.x cluster without updating — fix: regenerate golden-response fixtures against the new version first, then audit diffs.