CONCEPT
Lift metric¶
Definition¶
In an interleaving test of ranking A vs ranking B, the lift metric aggregates per-search (or per-user) preference into a single scalar.
- `wins_A` = number of searches (or users) where ranking A accumulated more attributed events (e.g., clicks or bookings).
- `wins_B` = the mirror count for ranking B.
- `ties` = searches / users with equal attribution to A and B.
- `α ∈ [0, 1]` is the tie weight; different conventions exist for how to normalise for ties. Expedia's post notes that "the results do not strongly depend on the normalization method."
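The note does not pin down a single formula. One common convention, an assumption here rather than Expedia's exact normalisation, is the tie-weighted win share for A centred at zero:

```python
def lift(wins_a, wins_b, ties, alpha=0.5):
    """Directional lift of ranking A over B.

    One common convention (an assumption, not necessarily Expedia's exact
    normalisation): tie-weighted share of wins for A, centred at zero so
    that lift = 0 means no preference.
    """
    total = wins_a + wins_b + ties
    if total == 0:
        return 0.0  # no attributed events: no preference either way
    return (wins_a + alpha * ties) / total - 0.5

# lift(60, 40, 0)  -> +0.1  (users prefer A)
# lift(50, 50, 20) ->  0.0  (ties cancel out when alpha = 0.5)
```

With `alpha = 0.5` the result is symmetric in A and B, which is one reason the choice of tie convention shifts magnitude but rarely direction.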
Properties:
- `lift = 0` ⇒ no user preference between A and B (the null hypothesis being tested for significance).
- `lift > 0` ⇒ users prefer A.
- `lift < 0` ⇒ users prefer B.
- The metric captures direction, not magnitude, of user preference; not to be confused with CVR uplift, which measures the absolute change in conversion rate.
Aggregation levels¶
Expedia reports at two levels:
- Per-search: each individual search produces a winning variant; aggregate across searches.
- Per-user (Expedia's default): bucket searches by user and let each user cast one vote; users with mixed wins or no preference count as ties. This reduces the risk that a handful of heavy searchers dominate the metric.
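The per-user bucketing can be sketched as follows; the input shape and helper name are illustrative assumptions, not Expedia's code:

```python
from collections import defaultdict

def per_user_votes(search_results):
    """Collapse per-search outcomes into one vote per user.

    search_results: iterable of (user_id, winner) pairs, where winner is
    'A', 'B', or 'tie' for that search (an assumed input shape).
    A user votes for A or B only if every decisive search agrees;
    mixed wins or no decisive search count as a tie, per Expedia's rule.
    """
    by_user = defaultdict(set)
    for user_id, winner in search_results:
        by_user[user_id].add(winner)

    wins_a = wins_b = ties = 0
    for outcomes in by_user.values():
        decisive = outcomes - {"tie"}
        if decisive == {"A"}:
            wins_a += 1
        elif decisive == {"B"}:
            wins_b += 1
        else:  # mixed wins, or only ties
            ties += 1
    return wins_a, wins_b, ties
```

For example, a user who won twice for A votes A once; a user with one A win and one B win counts as a tie, the same as a user with only tied searches.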
Per-event-type split¶
Expedia tracks two lift metrics independently:
- Click lift — based on property-detail-page views (click-through). Denser, higher-frequency; detects faster.
- Booking lift — based on completed booking transactions. Rarer; closer to revenue; detects slower.
Reporting both "improves our understanding of the impact of rankings to both conversion and click-through rates."
Significance testing¶
lift = 0 is the null hypothesis. To decide whether an observed lift is
distinguishable from zero:
- Bootstrap percentile method — non-parametric, slow.
- t-test on winning indicators — parametric, fast; gives "virtually the same results" as the bootstrap at production scale.
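The fast path can be sketched as a one-sample t-test on per-search winning indicators; the ±1/0 encoding and the large-sample normal approximation are assumptions here, since the note does not spell out Expedia's exact indicator:

```python
import math

def t_test_lift(indicators):
    """One-sample t-test of per-search winning indicators against zero.

    indicators: e.g. +1 for an A win, -1 for a B win, 0 for a tie
    (an assumed encoding). Returns (t_stat, two-sided p-value), with the
    p-value from the standard normal tail — a good approximation to the
    t distribution at production-scale sample sizes.
    """
    n = len(indicators)
    mean = sum(indicators) / n
    var = sum((x - mean) ** 2 for x in indicators) / (n - 1)
    se = math.sqrt(var / n)
    t = mean / se
    # Two-sided tail probability under N(0, 1), using Phi(z) = (1 + erf(z/sqrt(2))) / 2.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p
```

A mean indicator significantly above zero rejects `lift = 0` in favour of A; the bootstrap percentile method answers the same question non-parametrically at much higher compute cost.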
Caveats¶
- Normalisation choice is a hyperparameter. Different conventions for `α` (tie weight) can shift the magnitude, but Expedia reports that direction and significance are robust.
- User-level reporting requires user attribution. Logged-out traffic with unstable identifiers pollutes the user-bucketing step.
- Lift is not comparable across experiments with different baseline ranking quality or different query mixes; it's a within-experiment directional signal.
- Lift is not CVR uplift. A lift of +0.1 does not mean a 10 % higher conversion rate; launch decisions need A/B rollouts for the absolute number.
Seen in¶
- sources/2026-02-17-expedia-interleaving-for-accelerated-testing — Expedia Group's lodging-search interleaving framework; user-level reporting is the default; clicks and bookings tracked with independent lift metrics.
Related¶
- concepts/interleaving-testing — the technique whose output is the lift metric.
- concepts/winning-indicator-t-test — the fast significance test.
- concepts/bootstrap-percentile-method — the slow non-parametric baseline.
- concepts/conversion-rate-uplift — the magnitude metric A/B testing reports; lift is its direction-only counterpart.
- patterns/interleaved-ranking-evaluation — the end-to-end pattern that produces the lift metric.