PATTERN
Benchmark in your own environment before upgrading¶
Intent¶
Before committing to a major-version upgrade of a stateful system, reproduce the benchmarks in your own environment with your own workload profile — don't trust published third-party numbers on their own.
Problem¶
Upstream benchmarks (vendor whitepapers, project release notes) are run on specific:
- Data models
- Query mixes
- Cluster resources (instance type, disk, network)
- Versions of the client / proxy / connector ecosystem
None of these are necessarily yours. A number like "34% faster streaming, 21–60% lower p99" (Cassandra 4.1 public claim) comes from a vendor-representative benchmark — your workload sits somewhere in that range, and possibly outside it.
Worse, even the sign of the delta isn't guaranteed. Cassandra 4.x improves many paths, but Yelp found that Stargate 2.x regressed on range / multi-partition queries for their workload — a regression invisible in Cassandra-only benchmarks.
Solution¶
Spin up your own benchmark rig before committing to the upgrade:
- Clusters with identical resources on both the old and new versions (same instance types, same disk class, same network).
- Production-shaped workload — traffic profiles, query mix, data model that match what you actually run.
- Measure the quantities that drive your go/no-go decision: p99 and mean latency, throughput, resource utilization, and the tail behaviour of specific query classes.
- Compare against the published numbers to confirm direction of travel, but base the decision on your own measurements.
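The comparison step above can be sketched as follows. This is a minimal illustration, not from the source: the metric names, the harness that produces the summaries (e.g. cassandra-stress or nosqlbench output), and the example numbers are all assumptions, though the example deltas are shaped like Yelp's own-environment result (4% p99, 11% mean latency, 11% throughput).

```python
# Hypothetical sketch: compare like-for-like benchmark summaries from the
# old- and new-version clusters and report per-metric improvements.
# Metric names and numbers are illustrative, not from the source.

def pct_change(old, new):
    """Signed percentage change from old to new."""
    return (new - old) / old * 100.0

def compare_runs(old_run, new_run, lower_is_better=("p99_ms", "mean_ms")):
    """Return {metric: improvement_pct}; positive means the new version improved."""
    report = {}
    for metric, old_val in old_run.items():
        delta = pct_change(old_val, new_run[metric])
        # For latency metrics a drop is an improvement; for throughput a rise is.
        report[metric] = -delta if metric in lower_is_better else delta
    return report

# Example summaries from identical-resource clusters on the two versions.
old = {"p99_ms": 100.0, "mean_ms": 45.0, "ops_per_s": 9000.0}
new = {"p99_ms": 96.0, "mean_ms": 40.05, "ops_per_s": 9990.0}
report = compare_runs(old, new)
for metric, imp in sorted(report.items()):
    print(f"{metric}: {imp:+.1f}% improvement")
```

The point of the sketch is the final decision input: a per-metric table from your own clusters that you can hold up against the published numbers to confirm direction of travel.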
Seen in¶
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — canonical wiki Seen-in. Yelp: "Performance benchmarks are highly dependent on the environment in which they are carried out; such as the data model, queries, cluster resources and more. So, it was essential to validate the expected performance improvements in our own operating environment. We spun up Cassandra 3.11 and 4.1 clusters with identical resources, configured traffic profiles to resemble our production workloads, and benchmarked the Cassandra clusters." Measured: 4% p99 improvement + 11% mean-latency improvement + 11% throughput improvement. This aligned with the DataStax whitepaper in direction of travel, which built confidence in the benchmark methodology. The delta measured in production was much larger (up to 58% p99 reduction on key clusters) — the opposite surprise: own-env benchmarks under-predicted the real-world win. Either direction of surprise is why you benchmark in your own environment.
Why this isn't "just" due diligence¶
- Identifies ecosystem regressions. Cassandra 4.1 benchmarks don't include Stargate 2.x — Yelp found the Stargate regression in production observability, not the upfront benchmark, but the upfront benchmark set the expectation bar that made the regression visible when it broke.
- Calibrates rollback thresholds. "Roll back if p99 regresses > X%" requires X to be an environment-specific number, not a vendor-suggested one.
- Builds internal confidence. Numbers your team ran against your own workload are arguable within the team in a way that vendor whitepaper numbers are not.
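One way the rollback-threshold calibration could work, sketched under assumptions not in the source: repeat the baseline benchmark several times on the *same* version and cluster, measure the run-to-run noise in p99, and set X just outside that noise band. The 3-sigma rule and all numbers below are illustrative.

```python
# Hypothetical sketch: derive an environment-specific rollback threshold X
# ("roll back if p99 regresses by more than X%") from the run-to-run noise
# of repeated baseline benchmarks, rather than from a vendor suggestion.
from statistics import mean, stdev

def rollback_threshold_pct(baseline_p99_runs, sigmas=3.0):
    """Smallest regression (%) distinguishable from benchmark noise.

    baseline_p99_runs: repeated p99 measurements (ms) of the same version
    on the same cluster, so the spread reflects pure environmental noise.
    """
    mu = mean(baseline_p99_runs)
    noise_pct = stdev(baseline_p99_runs) / mu * 100.0
    # Anything within `sigmas` standard deviations of the baseline is
    # indistinguishable from noise; only flag regressions beyond that band.
    return sigmas * noise_pct

# Five identical baseline runs whose p99 wobbles by a few percent.
runs = [100.0, 103.0, 98.0, 101.0, 99.0]
x = rollback_threshold_pct(runs)
print(f"roll back if p99 regresses more than {x:.1f}%")
```

A threshold tighter than your own benchmark noise will trigger spurious rollbacks; one derived from a vendor's (quieter or noisier) environment will be wrong in one direction or the other.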
Related¶
- concepts/benchmark-representativeness — the underlying discipline.
- concepts/benchmark-methodology-bias — the failure mode.
- concepts/rolling-upgrade — canonical upgrade idiom where this pattern applies.
- patterns/production-qualification-criteria-upfront — own-benchmark numbers feed into the go/no-go criteria.