
CONCEPT Cited by 1 source

Major-version upgrade discipline

Definition

Major-version upgrade discipline is the set of practices a team applies when upgrading a stateful datastore across a major-version boundary. Such an upgrade is more than a node-replacement operation: the application, client libraries, cluster topology, and query semantics may all shift together. Unlike minor or patch upgrades, a major upgrade can break queries that worked previously, change response shapes, deprecate or re-default settings, and introduce new index or storage formats that cannot be downgraded.

Components

  1. Upgrade-strategy decision — choose between in-place rolling and reindex-based Blue/Green based on datastore properties, fleet cost tolerance, streaming time, and rollback requirements. See concepts/in-place-vs-new-dc-upgrade.
  2. Client-cluster decoupling — use a compatibility primitive (e.g. ES compat header, versioned REST API, feature flag) so the cluster and client can move independently. The 443k-LOC application rewrite should not be a blocker for a security-motivated cluster upgrade.
  3. Parity-test matrix against mixed versions: use patterns/multi-version-testcontainers-parity-suite to run the application's minimum usage surface against the two mixed combinations, (old client × new cluster) and (new client × old cluster), in addition to the (old, old) baseline that runs in production and the (new, new) target. This catches the majority of undocumented-behavior drift before cutover.
  4. Per-endpoint A/B dashboards during the shadow window — latency, error rate, index sizes / row counts. Cutover is data-driven, not schedule-driven.
  5. Almost-instantaneous rollback path: a routing flip (for Blue/Green) or a reverse rolling upgrade (for in-place rolling). The path is tested before cutover, not discovered during incident response.
  6. Small-cluster validation first — if the fleet is per-tenant or per-region, migrate a low-stakes cluster first, let the procedure mature, then roll to larger ones.
  7. Write-side convergence verification before cutover: index sizes on the old and new clusters should converge monotonically over the shadow window. Zalando tracks the delta between index sizes on a dashboard (Source: sources/2023-11-19-zalando-migrating-from-elasticsearch-7-to-8).
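
The convergence check in item 7 can be sketched as a small gate function. This is a minimal illustration, not the procedure from the cited source: it assumes "converge monotonically" means the absolute gap between old and new index sizes never grows from one sample to the next, and that cutover additionally requires the final gap to fall under a tolerance. The names `size_gaps`, `is_converging`, and `tolerance_bytes` are hypothetical, and the sample values are made up.

```python
from typing import List, Sequence


def size_gaps(old_sizes: Sequence[int], new_sizes: Sequence[int]) -> List[int]:
    """Absolute index-size gap between old and new cluster at each sample."""
    return [abs(old - new) for old, new in zip(old_sizes, new_sizes)]


def is_converging(
    old_sizes: Sequence[int],
    new_sizes: Sequence[int],
    tolerance_bytes: int,  # hypothetical threshold; the source gives none
) -> bool:
    """Return True when the gap never grows and ends within the tolerance.

    Both sequences are samples taken at the same instants during the
    shadow window, oldest first.
    """
    if not old_sizes or len(old_sizes) != len(new_sizes):
        raise ValueError("need two non-empty series of equal length")
    gaps = size_gaps(old_sizes, new_sizes)
    # Monotone convergence: each gap is no larger than the one before it.
    never_grows = all(later <= earlier for earlier, later in zip(gaps, gaps[1:]))
    return never_grows and gaps[-1] <= tolerance_bytes


# Made-up samples: the new cluster catches up while the old one keeps growing.
ready = is_converging([100, 110, 120], [40, 90, 119], tolerance_bytes=5)
```

A gate like this keeps the cutover decision tied to the data rather than the calendar: a gap that widens at any point, or one that is still outside the tolerance at the end of the window, blocks the cutover.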

Seen in

Anti-patterns

  • Cluster upgrade blocked on full application rewrite — fix: compatibility-mode handoff.
  • Cutover on schedule, not on data-convergence signal — fix: dashboard-gated cutover.
  • Rollback plan not tested until needed — fix: run rollback drill on a non-customer-facing cluster before the real one.
  • Tests derived from 7.x response shapes checked against 8.x cluster without updating — fix: regenerate golden-response fixtures against the new version first, then audit diffs.
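
The fixture audit in the last bullet can be sketched as a structural diff between an old golden response and one regenerated against the new version. This is an illustrative sketch only: `diff_paths` is a hypothetical helper, the fixtures are invented and do not reproduce any real 7.x or 8.x response, and it assumes fixtures are JSON-like values with string keys.

```python
from typing import Any, List


def diff_paths(old: Any, new: Any, prefix: str = "") -> List[str]:
    """List dotted paths where two JSON-like values differ.

    "+ path" marks a key only in the new fixture, "- path" a key only in
    the old one, and "~ path" a value that changed.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        found: List[str] = []
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in old:
                found.append(f"+ {path}")
            elif key not in new:
                found.append(f"- {path}")
            else:
                # Recurse so nested shape changes are reported at the leaf.
                found.extend(diff_paths(old[key], new[key], path))
        return found
    return [] if old == new else [f"~ {prefix}"]


# Invented fixtures: one field changes shape, one field is new.
old_fixture = {"status": "ok", "meta": {"count": 3}}
new_fixture = {"status": "ok", "meta": {"count": {"value": 3}}, "warnings": []}

changes = diff_paths(old_fixture, new_fixture)
# changes == ["~ meta.count", "+ warnings"]
```

Each reported path is then reviewed by hand: an expected shape change is accepted into the new fixture, while an unexpected one points to behavior drift worth investigating before cutover.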