CONCEPT Cited by 1 source
Mixed-version cluster
Definition
A mixed-version cluster is the state a cluster is in for the duration of a rolling upgrade: some nodes run the old version, some run the new version, and the cluster as a whole continues to serve production traffic. This is not a fault state; it is the entire point of a rolling upgrade. It is nonetheless a real, observable, operationally distinct state that must be explicitly designed for.
Why it matters
A mixed-version cluster has semantics neither the pre-upgrade nor post-upgrade cluster has:
- Inter-version protocol negotiation: nodes on different versions communicate over whatever lowest-common subset of the replication / gossip / schema-change / repair protocols both understand. Protocol-level incompatibilities (e.g. schema-fetch mechanism changes; see Stargate / Cassandra 4.1 MigrationCoordinator) become load-bearing.
- Behavioural asymmetry: queries landing on new-version nodes may take advantage of new-version optimisations; queries on old-version nodes still look like the old cluster. Observed latency mixes the two distributions.
- Transient regressions are normal. Performance can degrade compared to either pure state and resolve on its own once all nodes flip. The regression signature is the cluster state, not a bug to fix forward.
- Mid-upgrade rollback is harder than it looks because it is the same mixed state operated in the other direction.
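The "lowest-common subset" idea in the first bullet can be sketched as a set intersection over what each node understands. This is an illustration only, not how Cassandra actually negotiates: the `Node` type, the helper functions, and the feature names such as `schema_pull_v1` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    version: str
    protocols: frozenset  # hypothetical feature names this node understands


def negotiated_protocols(nodes):
    """Return the protocol features that every node in the list understands."""
    if not nodes:
        return frozenset()
    common = set(nodes[0].protocols)
    for node in nodes[1:]:
        common &= node.protocols
    return frozenset(common)


def is_mixed(nodes):
    """True when the nodes report more than one distinct version."""
    return len({node.version for node in nodes}) > 1


# Hypothetical two-node cluster part-way through an upgrade.
old = Node("a", "3.11", frozenset({"gossip", "repair", "schema_pull_v1"}))
new = Node("b", "4.1", frozenset({"gossip", "repair", "schema_pull_v2"}))
shared = negotiated_protocols([old, new])
```

In this sketch neither schema-pull variant survives the intersection, which is the shape of the incompatibility the bullet describes: a feature both pure states have in some form can be absent from the mixed state.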
Canonical wiki instance
Yelp's fleet-wide Cassandra 3.11 → 4.1 upgrade (Source: sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade) is the canonical wiki instance.
- Elevated latency while mixed, resolved when homogeneous: "In some cases, we also observed elevated latency while the Cassandra cluster contained a mix of 3.11 and 4.1 nodes. This was transient and resolved once all nodes were upgraded." See concepts/performance-regression-from-mid-upgrade-state.
- Schema-fetch protocol asymmetry: 3.11 Stargate can't pull schema from a 4.1 Cassandra node, forcing two Stargate fleets to run in parallel during the upgrade window (see patterns/dual-run-version-specific-proxies).
- Forced ordering: the last 3.11 Cassandra node is kept on 3.11 until the 3.11-compatible proxy pool is drained — otherwise the 3.11 Stargate can't start. The mixed-version state is deliberately held longer than strictly required to enable a clean rollback and proxy drain.
- CDC commit-log semantics diverge between 3.11 (write on flush) and 4.x (write on mutation, CASSANDRA-12148), forcing the downstream Cassandra Source Connector to be backward-compatible with both versions simultaneously for the duration of the mixed-version state.
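The forced-ordering constraint above (hold the last 3.11 node until the 3.11-compatible proxy pool is drained) can be expressed as a small gate in upgrade tooling. This is a minimal sketch under assumed inputs, not Yelp's tooling: `may_upgrade`, the `node_versions` mapping, and the `old_proxy_active` count are hypothetical names.

```python
def may_upgrade(node_versions, target_node, old_proxy_active, old="3.11"):
    """Decide whether target_node may be upgraded right now.

    node_versions: dict of node name -> version string (hypothetical input).
    old_proxy_active: number of old-version proxies still running.
    """
    remaining_old = [name for name, version in node_versions.items() if version == old]
    if target_node not in remaining_old:
        return False  # unknown node, or already on the new version
    is_last_old = remaining_old == [target_node]
    # The last old-version node is held back while any old-version proxy still needs it.
    return not (is_last_old and old_proxy_active > 0)


mid_upgrade = {"n1": "4.1", "n2": "3.11", "n3": "3.11"}
last_old_left = {"n1": "4.1", "n2": "4.1", "n3": "3.11"}
```

With `last_old_left`, the gate refuses to upgrade `n3` while `old_proxy_active` is non-zero and allows it once the count reaches zero.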
Disciplines this implies
- Observability instrumentation must be version-aware — per-version metric partitions so mid-upgrade signal can be read against the right baseline.
- Test coverage must exercise the mixed-version topology, not just the two homogeneous endpoints. Yelp expanded acceptance-test coverage across both Stargate fleets during the Cassandra upgrade specifically for this reason.
- Rollback paths must be verified under mixed-version load, not just under the pre-upgrade state.
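As a rough illustration of the version-aware observability point above, latency samples can be partitioned by the version of the node that served them before being summarised, so each partition is compared against its own baseline. The function name and the sample values are made up for the example and are not measurements from the source.

```python
from collections import defaultdict
from statistics import median


def latency_by_version(samples):
    """Group (node_version, latency_ms) pairs and return the median latency per version."""
    buckets = defaultdict(list)
    for version, latency_ms in samples:
        buckets[version].append(latency_ms)
    return {version: median(values) for version, values in buckets.items()}


# Synthetic samples: (version of the serving node, latency in milliseconds).
samples = [
    ("3.11", 10.0),
    ("3.11", 12.0),
    ("4.1", 7.0),
    ("4.1", 9.0),
    ("4.1", 8.0),
]
per_version = latency_by_version(samples)
```

A single cluster-wide median over the same samples would blend the two distributions, which is the effect the per-version partition is meant to avoid.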
Related
- concepts/rolling-upgrade — the upgrade idiom that produces this state.
- concepts/performance-regression-from-mid-upgrade-state — the canonical observable of a mixed-version cluster.
- concepts/schema-disagreement — a failure mode that can crystallise during or immediately after the mixed-version window.
- patterns/pre-flight-flight-post-flight-upgrade-stages — the disciplined shape of the upgrade that passes through this state.
- patterns/dual-run-version-specific-proxies — the coping pattern for proxies that can't span both versions.