
CONCEPT

Residualized regression

Definition

Residualized regression is the estimation strategy of regressing deviations of the outcome from a learned baseline on deviations of the explanatory variables from their own baselines, rather than on raw levels. The baseline captures predictable structure (time-of-day, day-of-week, seasonality, holidays, weather) so that the coefficient on the explanatory variable reflects its incremental effect on top of that structure.

Mechanically, residualization is a two-stage regression:

  1. Fit a baseline model ŷ = f(time, market_context, controls).
  2. Regress the residual y - ŷ on the residual x - x̂ (where x̂ is the baseline prediction of x).

Under standard assumptions, the stage-2 slope is identical to the coefficient on x in a single multiple regression of y on x and the baseline features (the Frisch-Waugh-Lovell theorem). The point of doing it explicitly is interpretation and defensibility: the analyst can verify that cyclic/seasonal patterns have been removed before the coefficient is read off as a causal effect.
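As a concrete sketch of the two stages and the Frisch-Waugh-Lovell equivalence, here is a minimal NumPy example on simulated data. All variable names and numbers are illustrative, not Lyft's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: z is a baseline feature (think time-of-day), x is the
# explanatory variable, y the outcome.  The true incremental effect of x is 2.0.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)            # x is correlated with the baseline
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # y depends on both

def ols(X, target):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta, X @ beta

# Stage 1: baseline models for y and for x, both as functions of z.
_, y_hat = ols(z.reshape(-1, 1), y)
_, x_hat = ols(z.reshape(-1, 1), x)

# Stage 2: regress the residual y - ŷ on the residual x - x̂.
beta_stage2, _ = ols((x - x_hat).reshape(-1, 1), y - y_hat)
slope_residualized = beta_stage2[1]

# Frisch-Waugh-Lovell: the joint regression of y on (x, z) yields the same
# coefficient on x as the residual-on-residual regression.
beta_joint, _ = ols(np.column_stack([x, z]), y)
slope_joint = beta_joint[1]

print(slope_residualized, slope_joint)  # the two slopes agree
```

Both slopes land near the true value of 2.0, and they match each other to numerical precision, which is exactly what the theorem guarantees.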

Why Lyft uses it for Step 1 of surrogacy

In Lyft's framework (surrogacy), Step 1 estimates how a policy decision (e.g. rider price level, driver incentive spend) affects short-term negative user experiences (wait time, surge, cancellations, driver earnings, idleness). These experiences are strongly cyclical:

  • Time-of-week effects (Friday night ≠ Tuesday morning)
  • Holidays (NYE surge ≠ random Tuesday)
  • Weather (rain boosts demand)
  • Shifting supply/demand (normal vs event-day)

A naive regression of wait_time on price_policy would pick up all of this cyclic/seasonal variation as if it were a policy effect. Residualizing on the market's own baseline, plus controlling for remaining supply/demand information, leaves the policy coefficient interpretable as an elasticity around everyday operating conditions: a "deviation from normal" reading. Lyft: "Because we measure everything as 'deviation from normal,' the effects read like elasticities around everyday operating conditions."
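A toy simulation makes the bias concrete. Here a weekly demand cycle drives both the policy and wait times, so the naive slope is badly biased while the residualized one recovers the true effect. The variables, the hour-of-week-means baseline, and every number are hypothetical, not Lyft's:

```python
import numpy as np

rng = np.random.default_rng(1)
hours = 24 * 7 * 8                      # eight weeks of hourly observations
hour_of_week = np.arange(hours) % (24 * 7)

# Hypothetical market: a weekly cycle drives both the pricing policy and
# wait times.  The true policy effect on wait time is -0.5 (invented).
cycle = np.sin(2 * np.pi * hour_of_week / (24 * 7))
price_policy = cycle + rng.normal(scale=0.3, size=hours)
wait_time = -0.5 * price_policy + 2.0 * cycle + rng.normal(scale=0.3, size=hours)

def slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope(price_policy, wait_time)  # absorbs the cycle as if it were policy

# Residualize both series on a time-of-week baseline (hour-of-week means).
def deseason(series):
    means = np.array([series[hour_of_week == h].mean() for h in range(24 * 7)])
    return series - means[hour_of_week]

adjusted = slope(deseason(price_policy), deseason(wait_time))
print(naive, adjusted)  # naive is far from -0.5; adjusted recovers it
```

The naive slope even gets the sign wrong here, because the cycle raises price and wait time together; after residualizing, only within-slot deviations remain and the policy coefficient is recovered.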

Shape of the output

Lyft's Step 1 residualized regression produces a calibrated response function: not just a point estimate of the mean effect, but a forecasted shift in the distribution of negative user experiences, with uncertainty. This matters because downstream Step 2 (surrogate → outcome) sees the full exposure distribution, not just the average; a policy that moves the mean wait time by a small amount but extends the tail substantially may have a very different long-term effect than one that shifts the whole distribution uniformly.
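A small simulation illustrates why the mean alone is not enough: the two hypothetical policies below have identical mean effects on wait time but very different tails. All numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Illustrative wait-time distributions, in minutes.
baseline = rng.exponential(scale=4.0, size=n)
uniform_shift = baseline + 0.5                 # everyone waits 0.5 min longer
hit = rng.random(n) < 0.05                     # 5% of rides hit a long tail
tail_shift = np.where(hit, baseline + 10.0, baseline)

print(uniform_shift.mean() - baseline.mean())  # mean effect: exactly 0.5
print(tail_shift.mean() - baseline.mean())     # mean effect: also ~0.5
print(np.quantile(uniform_shift, 0.95))        # 95th percentile barely moves
print(np.quantile(tail_shift, 0.95))           # 95th percentile moves a lot
```

A Step 2 model that only saw the mean shift would treat these two policies as equivalent, even though the second concentrates all of its harm in a small fraction of very bad experiences.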

Verification

Lyft verifies the residualized Step 1 model with switch-back experiments: the policy is alternated across comparable time slots, and the observed lift in negative user experience is compared to the model's predicted lift. When the experiment disagrees with the model, the model is iterated (adding controls, changing the residualization specification) until calibration holds.
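A minimal sketch of such a calibration check, assuming a simple alternating assignment and an invented metric; none of this is Lyft's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical switch-back: the policy alternates on/off across comparable
# time slots, and the observed lift in a negative-experience metric is
# compared to the Step-1 model's predicted lift (all numbers illustrative).
n_slots = 200
treated = np.tile([0, 1], n_slots // 2)    # alternating assignment
predicted_lift = 0.12                      # the model's predicted effect
metric = 1.0 + 0.12 * treated + rng.normal(scale=0.05, size=n_slots)

on, off = metric[treated == 1], metric[treated == 0]
observed_lift = on.mean() - off.mean()
se = np.sqrt(on.var(ddof=1) / len(on) + off.var(ddof=1) / len(off))

# Calibration check: the prediction should sit within ~2 SE of the experiment;
# if not, the Step-1 model is revised (extra controls, new residualization).
print(observed_lift, predicted_lift, abs(observed_lift - predicted_lift) < 2 * se)
```

The same comparison can be run per time slot or per market to localize where the residualization specification is missing structure.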

Seen in

  • Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance. Lyft's Foundational Models team uses residualized regression as the Step 1 estimator in its surrogacy framework, mapping policy decisions to distributions of short-term negative user experiences. Verified with switch-back experiments.