PATTERN Cited by 1 source
Dual-run version-specific proxies¶
Intent¶
During a major-version upgrade of a datastore whose data proxy is pinned to a specific major version of the datastore, run two parallel proxy fleets, one per major version, registered under the same service-mesh alias so clients see a single endpoint. Drain the old proxy fleet only after all but one of the datastore nodes have moved to the new version.
Problem¶
Some data proxies can't span a major version boundary because they fetch schema from the datastore at startup (or continuously), and the schema-fetch mechanism changed between versions. The canonical example is Stargate: the 3.11 Stargate cannot pull schema from a 4.1 Cassandra node because Cassandra 4.1's MigrationCoordinator behaves differently.
If there's only one proxy fleet, you're forced into one of:
- Upgrade proxy first, datastore second — but then the proxy can't talk to any datastore nodes until the datastore is fully upgraded.
- Upgrade datastore first, proxy second — but then the proxy can't talk to any datastore nodes once the first node flips.
Either direction breaks production traffic.
Solution¶
Run two proxy fleets simultaneously during the upgrade window:
- Old-version proxy fleet — pinned to the old datastore major version; seed list points at an old-version node.
- New-version proxy fleet — pinned to the new datastore major version; seed list points at a new-version node.
- A single service-mesh alias fronts both fleets so clients see one endpoint; the mesh can route a request to either fleet.
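The seed-list rule above lends itself to an automated check. The following is a minimal sketch, not Yelp's tooling: the `ProxyFleet` type, the fleet names, the node names, and the `node_versions` map are all hypothetical, and it assumes you can obtain each datastore node's current major version from your own inventory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyFleet:
    """Hypothetical description of one version-pinned proxy fleet."""
    name: str                # illustrative fleet name
    major_version: str       # datastore major version the proxy is pinned to
    seeds: tuple[str, ...]   # datastore nodes the fleet uses as seeds


def mismatched_seeds(fleet: ProxyFleet, node_versions: dict[str, str]) -> list[str]:
    """Return seeds whose node is unknown or not on the fleet's pinned version."""
    return [
        seed for seed in fleet.seeds
        if node_versions.get(seed) != fleet.major_version
    ]


# Hypothetical cluster state midway through the upgrade.
node_versions = {"cass-1": "4.1", "cass-2": "3.11", "cass-3": "3.11"}

old_fleet = ProxyFleet("stargate-v3.11", "3.11", ("cass-2",))
new_fleet = ProxyFleet("stargate-v4.1", "4.1", ("cass-1",))
# A misconfigured fleet: pinned to 3.11 but seeded from an upgraded node.
bad_fleet = ProxyFleet("stargate-v3.11-bad", "3.11", ("cass-1",))

for fleet in (old_fleet, new_fleet, bad_fleet):
    print(fleet.name, mismatched_seeds(fleet, node_versions))
```

Running a check like this before each node upgrade is one way to notice a seed that is about to flip to the other major version.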
As the datastore upgrade progresses through the flight stage:
- Upgrade one datastore node to the new version.
- Spin up the new-version proxy fleet — now both proxy fleets are running.
- Monitor per-fleet, per-keyspace p99 latency and error rate to catch regressions early.
- Upgrade the remaining datastore nodes except the last one. The last node is deliberately held on the old version so the old-version proxy fleet still has a matching-version seed to fetch schema from when its instances start.
- Drain the old-version proxy fleet. No more old-version proxies.
- Upgrade the last datastore node.
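The ordering of these steps can be sketched as a small plan generator. This is an illustrative sketch only: `flight_stage_plan` and the action strings are hypothetical names, and it assumes the caller supplies the nodes in upgrade order with the held-back node last.

```python
def flight_stage_plan(nodes: list[str]) -> list[str]:
    """Return the ordered flight-stage actions for the given datastore nodes.

    `nodes` is in upgrade order; the final entry is the node deliberately
    held on the old version until the old-version proxy fleet is drained.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes so one can be held back")

    first, *middle, last = nodes
    plan = [
        f"upgrade {first}",
        "start new-version proxy fleet",
        "monitor per-fleet, per-keyspace p99 latency and error rate",
    ]
    plan += [f"upgrade {node}" for node in middle]
    plan += ["drain old-version proxy fleet", f"upgrade {last}"]
    return plan


for step in flight_stage_plan(["cass-1", "cass-2", "cass-3", "cass-4", "cass-5"]):
    print(step)
```

Generating the plan from the node list, rather than hand-writing it, keeps the drain step pinned between the second-to-last and last node upgrades regardless of cluster size.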
Structure¶
Clients
│
▼
Service-mesh alias: "cassandra-gateway"
│
├────────► Stargate fleet v3.11 ──► seed = Cassandra-3.11 node
└────────► Stargate fleet v4.1 ──► seed = Cassandra-4.1 node
Cassandra cluster (rolling through flight stage):
[3.11, 3.11, 3.11, 3.11, 3.11] ← start
[4.1, 3.11, 3.11, 3.11, 3.11] ← one node flipped, new proxy spins up
[4.1, 4.1, 4.1, 4.1, 3.11] ← deliberately hold one 3.11 node; drain v3.11 proxy fleet now
[4.1, 4.1, 4.1, 4.1, 4.1] ← all flipped
Test-coverage implication¶
Running two proxy fleets simultaneously means clients can hit either path depending on mesh routing. Acceptance-test coverage must exercise both paths to catch API-surface deltas across the major versions. Yelp explicitly "expanded our acceptance test coverage across all services" during this upgrade for exactly this reason.
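One way to exercise both paths is to send the same request through each fleet and fail on any divergence. The sketch below assumes the test harness can pin a request to a specific fleet; the `Path` callables are stand-in stubs, not a real Stargate client, and all names are hypothetical.

```python
from typing import Callable

# A "path" sends one request through a specific proxy fleet and returns the
# response. Real implementations would pin mesh routing; these are stubs.
Path = Callable[[str], dict]


def check_both_paths(query: str, paths: dict[str, Path]) -> dict[str, dict]:
    """Run the same query through every fleet; raise if responses differ."""
    results = {name: send(query) for name, send in paths.items()}
    baseline = next(iter(results.values()))
    if any(response != baseline for response in results.values()):
        raise AssertionError(f"fleets disagree for {query!r}: {results}")
    return results


def via_old_fleet(query: str) -> dict:
    return {"status": "ok", "rows": 1}


def via_new_fleet(query: str) -> dict:
    return {"status": "ok", "rows": 1}


def via_drifted_fleet(query: str) -> dict:
    # Simulates an API-surface delta on one path.
    return {"status": "OK", "rows": 1}


print(check_both_paths("get user 42", {"v3.11": via_old_fleet, "v4.1": via_new_fleet}))
```

Comparing responses across fleets catches deltas that a test run against the shared alias alone might miss, since the alias may route every test request to the same fleet.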
Trade-offs¶
- Double proxy footprint during the upgrade window.
- Mesh routing must be stable — clients must not see spurious identity differences between the two fleets.
- Seed-list discipline is load-bearing — each fleet's seed pool must point at the matching datastore version.
- The last-node-on-old-version gate is non-obvious and easy to miss when automating the upgrade.
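Because the last-node gate is easy to miss, automation can encode it as an explicit guard. The following is a sketch under stated assumptions: `may_upgrade` is a hypothetical helper, and it assumes the automation knows each node's current version and how many old-version proxy instances are still running.

```python
def may_upgrade(
    node: str,
    node_versions: dict[str, str],
    old_version: str,
    old_fleet_instances: int,
) -> bool:
    """Return True if `node` may be upgraded now.

    Refuses to upgrade the last old-version node while any old-version
    proxy instance is still running, since that fleet would lose its
    only matching-version seed.
    """
    old_nodes = [n for n, v in node_versions.items() if v == old_version]
    if node not in old_nodes:
        return False  # unknown node, or already on the new version
    is_last_old_node = len(old_nodes) == 1
    return not (is_last_old_node and old_fleet_instances > 0)


# Hypothetical state: one 3.11 node left, old proxy fleet not yet drained.
versions = {"cass-1": "4.1", "cass-2": "4.1", "cass-3": "3.11"}
print(may_upgrade("cass-3", versions, "3.11", old_fleet_instances=3))
print(may_upgrade("cass-3", versions, "3.11", old_fleet_instances=0))
```

Making the gate a precondition of the upgrade action, rather than a step in a runbook, means a reordered or retried automation run cannot skip it.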
Seen in¶
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade: the canonical source for this pattern. Yelp's Cassandra 3.11 → 4.1 upgrade across more than 1,000 nodes. Direct quote: "Ultimately, we opted for version-specific Stargate instances, each relying on the corresponding version of the Cassandra persistence layer. During this process, we ensured that the seed list of the proxy always pointed to a Cassandra node running the matching major version." The gate that keeps the last node on the old version is named explicitly in the flight-stage sequence diagram.
Related¶
- systems/stargate-cassandra-proxy — canonical proxy.
- systems/apache-cassandra — canonical datastore.
- concepts/mixed-version-cluster — the cluster state this pattern extends across the proxy tier.
- concepts/rolling-upgrade — the upgrade idiom.
- patterns/pre-flight-flight-post-flight-upgrade-stages — the flight stage this pattern is inside.