
Lift metric

Definition

In an interleaving test of ranking A vs ranking B, the lift metric aggregates per-search (or per-user) preference into a single scalar.

lift = (wins_A − wins_B) / (wins_A + wins_B + α · ties)
  • wins_A = number of searches (or users) where ranking A accumulated more attributed events (e.g., clicks or bookings).
  • wins_B = the mirror count: searches (or users) where ranking B accumulated more attributed events.
  • ties = searches / users with equal attribution to A and B.
  • α ∈ [0, 1] is the tie weight — different conventions exist for how to normalise for ties. Expedia's post notes that "the results do not strongly depend on the normalization method."
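The formula translates directly into code. As a minimal sketch (α = 0.5 is an illustrative default here, since the source notes the normalisation convention varies):

```python
def lift(wins_a: int, wins_b: int, ties: int, alpha: float = 0.5) -> float:
    """Directional preference for ranking A over B in an interleaving test.

    alpha is the tie weight; alpha=0.5 is one common convention, but the
    exact choice is a hyperparameter (see Caveats below).
    """
    denom = wins_a + wins_b + alpha * ties
    if denom == 0:
        raise ValueError("no attributed events to compare")
    return (wins_a - wins_b) / denom

# Example: A wins 60 searches, B wins 40, 20 ties
# lift = (60 - 40) / (60 + 40 + 0.5 * 20) = 20 / 110 ≈ 0.18
```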

Properties:

  • lift = 0 ⇒ no user preference between A and B (the null hypothesis being tested for significance).
  • lift > 0 ⇒ users prefer A.
  • lift < 0 ⇒ users prefer B.
  • The metric captures direction, not magnitude, of user preference — not to be confused with CVR uplift which measures absolute change in conversion rate.

Aggregation levels

Expedia reports at two levels:

  • Per-search: each individual search produces a winning variant; aggregate across searches.
  • Per-user (Expedia's default): bucket searches by user and let each user cast one vote — users with mixed wins or no preference count as ties. Reduces the risk that a handful of heavy-searcher users dominate the metric.
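The per-user convention above can be sketched as follows. The function names and the `(user_id, winner)` input shape are illustrative, not Expedia's; the voting rule follows the text (a user with any mixed wins, or only ties, counts as a tie):

```python
from collections import Counter

def per_user_votes(search_results: list[tuple[str, str]]) -> tuple[int, int, int]:
    """Collapse per-search winners into one vote per user.

    search_results: (user_id, winner) pairs, winner in {"A", "B", "tie"}.
    Returns (wins_a, wins_b, ties) at the user level.
    """
    by_user: dict[str, Counter] = {}
    for user, winner in search_results:
        by_user.setdefault(user, Counter())[winner] += 1

    wins_a = wins_b = ties = 0
    for counts in by_user.values():
        a, b = counts["A"], counts["B"]
        if a > 0 and b == 0:       # user consistently preferred A
            wins_a += 1
        elif b > 0 and a == 0:     # user consistently preferred B
            wins_b += 1
        else:                      # mixed wins or only ties -> tie
            ties += 1
    return wins_a, wins_b, ties
```

Because each user casts exactly one vote, a heavy searcher with 500 sessions counts the same as a one-session user, which is the point of this aggregation level.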

Per-event-type split

Expedia tracks two lift metrics independently:

  • Click lift — based on property-detail-page views (click-through). Denser, higher-frequency; detects faster.
  • Booking lift — based on completed booking transactions. Rarer; closer to revenue; detects slower.

Reporting both "improves our understanding of the impact of rankings to both conversion and click-through rates."

Significance testing

lift = 0 is the null hypothesis: an observed lift is only actionable once it has been shown to be statistically distinguishable from zero.
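The source does not specify which test is used; an exact two-sided sign test on the win counts is one common choice for this null hypothesis, sketched here under that assumption:

```python
from math import comb

def sign_test_pvalue(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test: under H0 (lift = 0), each non-tied
    search/user is equally likely to prefer A or B (p = 0.5).
    Ties are dropped, a standard convention for sign tests.

    NOTE: the choice of test is an assumption; the source only states
    that lift = 0 is the null hypothesis.
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for two-sidedness
    p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * p_one_sided)

# 60 wins vs 40 wins: p ≈ 0.057 — borderline at the 0.05 level
```

This also illustrates why booking lift "detects slower": with far fewer attributed events, n is smaller and the same lift magnitude yields a much larger p-value.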

Caveats

  • Normalisation choice is a hyperparameter. Different conventions for α (the tie weight) can shift the lift's magnitude, but Expedia reports that direction and significance are robust to the choice.
  • User-level reporting requires user attribution. Logged-out traffic with unstable identifiers pollutes the user-bucketing step.
  • Lift is not comparable across experiments with different baseline ranking quality or different query mixes; it's a within-experiment directional signal.
  • Lift is not CVR uplift. A lift of +0.1 doesn't mean 10 % more CVR; launch decisions need A/B rollouts for the absolute number.
