Skip to content

CONCEPT Cited by 1 source

Mixed-version cluster

Definition

A mixed-version cluster is the steady-state a cluster is in during a rolling upgrade: some nodes run the old version, some run the new version, and the cluster as a whole continues to serve production traffic. This is not a fault state — it's the entire point of rolling upgrade — but it is a real, observable, operationally-distinct state that must be explicitly designed for.

Why it matters

A mixed-version cluster has semantics neither the pre-upgrade nor post-upgrade cluster has:

  1. Inter-version protocol negotiation — nodes on different versions communicate over whatever lowest-common subset of the replication / gossip / schema-change / repair protocols both understand. Protocol-level incompatibilities (e.g. schema-fetch mechanism changes — see Stargate / Cassandra 4.1 MigrationCoordinator) become load-bearing.
  2. Behavioural asymmetry — queries landing on new-version nodes may take advantage of new-version optimisations; queries on old-version nodes still look like the old cluster. Observed latency mixes the two distributions.
  3. Transient regressions are normal. Performance can degrade compared to either pure state and resolve on its own once all nodes flip. The regression signature is the cluster state, not a bug to fix forward.
  4. Mid-upgrade rollback is harder than it looks because the same mixed state operated the other direction.

Canonical wiki instance

Yelp's fleet-wide Cassandra 3.11 → 4.1 upgrade (Source: sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade) is the canonical wiki instance.

  • Elevated latency while mixed, resolved when homogenous: "In some cases, we also observed elevated latency while the Cassandra cluster contained a mix of 3.11 and 4.1 nodes. This was transient and resolved once all nodes were upgraded." See concepts/performance-regression-from-mid-upgrade-state.
  • Schema-fetch protocol asymmetry — 3.11 Stargate can't pull schema from a 4.1 Cassandra node, forcing two Stargate fleets run in parallel during the upgrade window (see patterns/dual-run-version-specific-proxies).
  • Forced ordering: the last 3.11 Cassandra node is kept on 3.11 until the 3.11-compatible proxy pool is drained — otherwise the 3.11 Stargate can't start. The mixed-version state is deliberately held longer than strictly required to enable a clean rollback and proxy drain.
  • CDC commit-log semantics diverge between 3.11 (write on flush) and 4.x (write on mutation — CASSANDRA-12148), forcing the downstream Cassandra Source Connector to be backward-compatible with both versions simultaneously for the duration of the mixed-version state.

Disciplines this implies

  • Observability instrumentation must be version-aware — per-version metric partitions so mid-upgrade signal can be read against the right baseline.
  • Test coverage must exercise the mixed-version topology, not just the two homogeneous endpoints. Yelp expanded acceptance-test coverage across both Stargate fleets during the Cassandra upgrade specifically for this reason.
  • Rollback paths must be verified under mixed-version load, not just under the pre-upgrade state.
Last updated · 476 distilled / 1,218 read