CONCEPT Cited by 1 source
Conversion-rate uplift¶
Definition¶
Conversion-rate uplift (CVR uplift) is the standard e-commerce / search-ranking A/B testing metric: the absolute or relative difference in conversion rate between the treatment cohort and the control cohort.
CVR = conversions / impressions
CVR uplift (absolute) = CVR_treatment − CVR_control
CVR uplift (relative) = (CVR_treatment − CVR_control) / CVR_control
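The two definitions above can be sketched in a few lines of Python. The `Arm` container and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Aggregated counts for one experiment arm (hypothetical container)."""
    conversions: int
    impressions: int

    @property
    def cvr(self) -> float:
        if self.impressions <= 0:
            raise ValueError("impressions must be positive")
        return self.conversions / self.impressions


def absolute_uplift(treatment: Arm, control: Arm) -> float:
    """Difference in conversion rate, as a fraction (not percentage points)."""
    return treatment.cvr - control.cvr


def relative_uplift(treatment: Arm, control: Arm) -> float:
    """Absolute uplift expressed as a fraction of the control CVR."""
    if control.cvr == 0:
        raise ValueError("relative uplift is undefined when control CVR is zero")
    return absolute_uplift(treatment, control) / control.cvr


# Made-up counts: control converts at 3.0 %, treatment at 3.3 %.
control = Arm(conversions=300, impressions=10_000)
treatment = Arm(conversions=330, impressions=10_000)
print(f"absolute: {absolute_uplift(treatment, control):.4f}")
print(f"relative: {relative_uplift(treatment, control):.2%}")
```

In this toy case the same change reads as 0.3 percentage points in absolute terms and 10 % in relative terms, so a report should always say which of the two it quotes.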
"Conversion" here is usually the business-closest event — a booking, purchase, subscription, sign-up — whichever is the product's revenue action.
Why it's the default launch metric¶
- Directly interpretable by product and business stakeholders: "this change increases bookings by 2.3 %."
- Tied to revenue. A positive CVR uplift translates (modulo average order value effects) directly to revenue.
- Comparable across experiments on the same product surface.
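- Well-understood statistics: a two-proportion z-test (or, near-equivalently at A/B sample sizes, a two-sample t-test) or a chi-squared test on proportions applies, and standard significance-testing toolkits support all of them.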
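As an illustration of the last point, a pooled two-proportion z-test fits in a few lines of standard-library Python. This is a minimal sketch with made-up counts, not a description of any particular experimentation platform; the function name and its parameters are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_c, n_c, conv_t, n_t):
    """Return (z, two-sided p-value) for H0: CVR_treatment == CVR_control."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Pooled proportion under the null hypothesis of equal conversion rates.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    if se == 0:
        raise ValueError("standard error is zero; no variation in the data")
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 3.0 % vs 3.3 % CVR with 10,000 impressions per arm.
z, p = two_proportion_z_test(conv_c=300, n_c=10_000, conv_t=330, n_t=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these counts the observed 10 % relative uplift gives a z-score of roughly 1.2, which does not reach significance at the 5 % level.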
Why it's low-sensitivity for subtle ranking changes¶
A/B CVR uplift compares two independent cohorts of users. Every user's individual engagement variance (heavy-clicker vs light-clicker, repeat-customer vs first-time, high-intent vs browser) feeds into the standard error of the difference, which is the denominator of the test statistic, so small ranking differences drown in noise.
Expedia's 2026-02-17 post gives a worked case: a treatment that pins a random property to a slot between positions 5 and 10 — a real but small ranking regression — is undetectable by A/B CVR uplift even at full sample size. The same treatment is detected by interleaving within a few days, because interleaving eliminates between-user variance via paired within-user comparison.
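The variance argument can be illustrated with a toy simulation. This is not Expedia's method and not an interleaving implementation: it assumes a continuous per-user engagement score, a large spread in user baselines (`USER_SD`), small per-observation noise (`NOISE_SD`), and a small additive treatment effect (`EFFECT`), all of which are hypothetical values. It compares the standard error of an unpaired cohort comparison with that of a paired within-user difference.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(0)   # fixed seed so the sketch is reproducible
N_USERS = 2_000
EFFECT = 0.05            # hypothetical small treatment effect
USER_SD = 1.0            # assumed spread of per-user baseline engagement
NOISE_SD = 0.2           # assumed per-observation noise


def observe(baseline, effect):
    """One noisy engagement observation for a user with the given baseline."""
    return baseline + effect + rng.gauss(0, NOISE_SD)


# Unpaired (A/B style): two independent cohorts, each user sees one variant,
# so between-user baseline variance stays in the comparison.
control = [observe(rng.gauss(0, USER_SD), 0.0) for _ in range(N_USERS)]
treated = [observe(rng.gauss(0, USER_SD), EFFECT) for _ in range(N_USERS)]
unpaired_se = sqrt(stdev(control) ** 2 / N_USERS + stdev(treated) ** 2 / N_USERS)

# Paired (interleaving style): the same user is exposed to both variants,
# so the per-user baseline cancels in the difference.
diffs = []
for _ in range(N_USERS):
    baseline = rng.gauss(0, USER_SD)
    diffs.append(observe(baseline, EFFECT) - observe(baseline, 0.0))
paired_se = stdev(diffs) / sqrt(N_USERS)

print(f"unpaired SE: {unpaired_se:.4f}")
print(f"paired SE:   {paired_se:.4f}")
```

Under these assumptions the theoretical standard errors are about 0.032 (unpaired) and 0.006 (paired), so the same 0.05 effect sits well inside the noise in the first design and several standard errors away from zero in the second.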
The complementary pair with interleaving¶
| | CVR uplift (A/B) | Lift (interleaving) |
|---|---|---|
| Measures | Absolute change in CVR | Direction of preference |
| Assumes | Independent user cohorts | Interleavable rankings |
| Sensitivity | Low on subtle changes | High |
| Magnitude | Yes | No |
| Revenue proximity | Direct | Indirect |
| Role | Launch-decision metric | Screening metric |
Expedia's practice is to use interleaving to screen candidate ranking changes (fast, sensitive, direction-only) and A/B CVR uplift to validate launch (slow, magnitude-bearing). Interleaving doesn't replace CVR uplift; it sits upstream of it.
Measurement nuances¶
- Numerator definition: bookings? completed bookings? post-refund bookings? The choice shifts results and interacts with fraud and cancellation rates.
- Denominator definition: all impressions? unique users? unique sessions? Session-level and user-level CVR move differently.
- Attribution window: is a booking made one hour, one day, or one week after the search counted as a conversion? Longer windows raise CVR but add noise.
- Novelty effects: new-ranker CVR can temporarily spike or dip from user curiosity, so run the test long enough for the effect to settle before reading the result.
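The denominator point is easy to see on a toy log. The sketch below assumes a hypothetical list of `(user_id, converted)` rows, one per session, and computes CVR both per session and per user.

```python
from collections import defaultdict

# Hypothetical session log: one row per session, flagged if it converted.
sessions = [
    ("u1", False), ("u1", False), ("u1", False), ("u1", True),
    ("u2", True),
    ("u3", False),
]


def session_level_cvr(rows):
    """Converting sessions divided by all sessions."""
    return sum(converted for _, converted in rows) / len(rows)


def user_level_cvr(rows):
    """Users with at least one converting session divided by all users."""
    converted_by_user = defaultdict(bool)
    for user_id, converted in rows:
        converted_by_user[user_id] |= converted
    return sum(converted_by_user.values()) / len(converted_by_user)


print(f"session-level CVR: {session_level_cvr(sessions):.3f}")
print(f"user-level CVR:    {user_level_cvr(sessions):.3f}")
```

Here the session-level rate is 2/6 and the user-level rate is 2/3: the heavy user `u1` contributes four sessions to one denominator and a single user to the other, which is why the two metrics can respond differently to the same treatment.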
Caveats¶
- CVR uplift is noisy at low booking rates. For products with a conversion rate of a few percent, tens of thousands to hundreds of thousands of exposures per arm are needed to detect single-digit relative uplifts at conventional significance and power.
- CVR uplift can be dominated by the wrong segment. Power users with many sessions dilute new-user CVR effects. Stratified analysis or user-level metrics help.
- A launch-positive CVR uplift can still be a user-experience regression if it's caused by deceptive ranking boosts (e.g., pushing expensive properties that convert but anger users long-term). Interleaving catches some of these by measuring preference, not just conversion.
- CVR uplift ≠ revenue uplift. Changes to average basket size, average price, or cancellation rate can flip the sign.
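To put rough numbers on the first caveat, the standard normal-approximation formula for comparing two proportions gives the exposures needed per arm. The sketch below assumes a two-sided test at 5 % significance and 80 % power; the function name, defaults, and baseline CVR are illustrative assumptions, not values from the Expedia post.

```python
from math import ceil
from statistics import NormalDist


def samples_per_arm(baseline_cvr, relative_uplift, alpha=0.05, power=0.8):
    """Approximate exposures per arm for a two-sided two-proportion z-test."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    # Sum of the Bernoulli variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 3 % baseline CVR with 10 % and 5 % relative uplifts.
n_10pct = samples_per_arm(0.03, 0.10)
n_5pct = samples_per_arm(0.03, 0.05)
print(f"10 % relative uplift: {n_10pct:,} exposures per arm")
print(f" 5 % relative uplift: {n_5pct:,} exposures per arm")
```

At a 3 % baseline this comes out on the order of 50,000 exposures per arm for a 10 % relative uplift and on the order of 200,000 for a 5 % relative uplift. Halving the detectable effect roughly quadruples the required sample, because the requirement scales with the inverse square of the absolute difference.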
Seen in¶
- sources/2026-02-17-expedia-interleaving-for-accelerated-testing — Expedia's "A/B testing on CVR uplift" as the baseline against which interleaving's sensitivity is compared; fails to detect random-property-pinning regression even at full sample size.
Related¶
- concepts/interleaving-testing — the sensitivity-trading alternative.
- concepts/lift-metric — the direction-only counterpart.
- concepts/test-sensitivity — the axis on which CVR uplift is weak.
- patterns/ab-test-rollout — the pattern in which CVR uplift is the default launch metric.
- concepts/customer-driven-metrics — the broader question of which metric to optimise.