PATTERN Cited by 1 source

Surrogacy two-step LTE estimator¶

Problem¶

In a multi-sided marketplace, estimating the long-term effect (LTE) of a resource-allocation decision (pricing, incentive budget reallocation) is structurally hard:

Market mediation (concepts/market-mediated-long-term-effects) means that a user-split A/B cannot observe the full effect.
Region-split experiments (concepts/region-split-experiment) can observe the full effect but are low-power and slow.
Long-horizon experiments carry a high business opportunity cost, are exposed to drift and attrition, and cannot be run at decision-planning cadence.

The organisation needs a framework that (a) can be updated cheaply as the market evolves and (b) remains trustworthy enough for budget- and pricing-level decisions.

Pattern¶

Decompose the long-term effect into a surrogacy framework (concepts/surrogacy-causal-inference) — two sequential observational causal-inference steps, each independently verifiable by its own purpose-built experiment — and verify the composed forecast end-to-end with occasional region-split experiments.

Step 1: Intervention → short-term surrogate¶

Estimator: residualised regression on deviations from a learned baseline that removes cyclical / seasonal / contextual variation. Models the intervention's effect on the distribution (not just mean) of short-term surrogates — Lyft uses negative user experiences (wait time, surge, cancellations, driver earnings, idleness).
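A minimal sketch of the residualisation idea, not Lyft's implementation. It assumes the "learned baseline" can be approximated by a per-slot mean (for example hour of week) and estimates only a mean effect, whereas the source models the full surrogate distribution. The function name `residualised_effect`, the synthetic data, and the true effect of -0.8 are all hypothetical.

```python
import numpy as np

def residualised_effect(y, d, slot):
    """Slope of surrogate y on intervention d after removing a per-slot baseline.

    Assumption: the baseline is the mean within each slot (e.g. hour of week).
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    slot = np.asarray(slot)
    y_res = np.empty_like(y)
    d_res = np.empty_like(d)
    for s in np.unique(slot):
        mask = slot == s
        y_res[mask] = y[mask] - y[mask].mean()  # deviation from baseline
        d_res[mask] = d[mask] - d[mask].mean()
    # OLS slope of residual on residual (Frisch-Waugh-Lovell).
    return float(d_res @ y_res / (d_res @ d_res))

# Synthetic data: the intervention is correlated with a weekly cycle.
rng = np.random.default_rng(0)
n = 4000
slot = rng.integers(0, 168, size=n)            # hour of week
seasonal = 5.0 * np.sin(2 * np.pi * slot / 168)
d = 0.5 * seasonal + rng.normal(size=n)        # intervention intensity
y = 3.0 + seasonal - 0.8 * d + rng.normal(scale=0.5, size=n)

naive = float(np.polyfit(d, y, 1)[0])          # ignores the cycle
effect = residualised_effect(y, d, slot)       # removes the cycle first
print(f"naive slope: {naive:.2f}, residualised slope: {effect:.2f}")
```

In this toy setup the naive slope has the wrong sign because the intervention tracks the weekly cycle; subtracting the per-slot baseline from both variables brings the estimate back close to the simulated effect.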

Verifier: switch-back, which alternates the intervention across comparable time slots in a single market and compares the modelled lift in the surrogate with the observed lift. A mismatch triggers model iteration (additional controls, respecification).
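The switch-back comparison can be sketched as follows. This is an illustration under simplifying assumptions: slots are treated as independent (no carryover or autocorrelation adjustment), and the pass rule "modelled lift within 1.96 standard errors of observed lift" is a hypothetical threshold, not one stated in the source. The name `switchback_check` and the wait-time numbers are made up.

```python
import numpy as np

def switchback_check(surrogate, treated, modelled_lift, z=1.96):
    """Compare a modelled surrogate lift with the lift observed in a switch-back.

    Returns (observed_lift, standard_error, passes).
    """
    surrogate = np.asarray(surrogate, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    on, off = surrogate[treated], surrogate[~treated]
    observed = on.mean() - off.mean()
    # Welch-style standard error; assumes independent slots.
    se = np.sqrt(on.var(ddof=1) / on.size + off.var(ddof=1) / off.size)
    passes = abs(modelled_lift - observed) <= z * se
    return float(observed), float(se), bool(passes)

# Eight alternating slots of mean wait time (minutes); odd slots are treated.
wait = [4.0, 3.0, 4.2, 3.1, 3.9, 2.9, 4.1, 3.0]
treated = np.arange(8) % 2 == 1

observed, se, ok_good = switchback_check(wait, treated, modelled_lift=-1.0)
_, _, ok_bad = switchback_check(wait, treated, modelled_lift=-0.5)
print(observed, se, ok_good, ok_bad)
```

A failed check would be the signal to revisit the Step 1 model, as described above.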

Step 2: Surrogate → long-term outcome¶

Estimator: AIPW — doubly-robust observational causal inference with a propensity model for surrogate exposure and an outcome model for future behaviour. Produces treatment effects of each surrogate level on the long-term outcome, summarised into a surrogacy index that scales short-term exposure to long-term impact.
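A minimal AIPW sketch using NumPy only. It makes several assumptions not in the source: surrogate exposure is binary (for example "long wait" versus not), the propensity model is a logistic regression, and the outcome model is a separate linear regression per arm. The names `fit_logistic` and `aipw_ate`, the clipping threshold, and the synthetic data with a true effect of -1.5 are hypothetical.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression by Newton's method; X must include an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (t - p))
    return beta

def aipw_ate(X, t, y, clip=0.01):
    """Doubly-robust estimate of the average effect of exposure t on outcome y."""
    X1 = np.column_stack([np.ones(len(X)), X])
    # Propensity model: probability of exposure given covariates.
    e = 1.0 / (1.0 + np.exp(-X1 @ fit_logistic(X1, t)))
    e = np.clip(e, clip, 1.0 - clip)  # guard against extreme weights
    # Outcome models: one linear fit per exposure arm.
    b1 = np.linalg.lstsq(X1[t == 1], y[t == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X1[t == 0], y[t == 0], rcond=None)[0]
    m1, m0 = X1 @ b1, X1 @ b0
    # AIPW score: outcome-model contrast plus weighted residual corrections.
    psi = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return float(psi.mean()), float(psi.std(ddof=1) / np.sqrt(len(y)))

# Synthetic users: covariate x drives both exposure and the future outcome.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)
y = 2.0 * x - 1.5 * t + rng.normal(size=n)

naive = float(y[t == 1].mean() - y[t == 0].mean())  # confounded by x
ate, se = aipw_ate(x.reshape(-1, 1), t, y)
print(f"naive: {naive:.2f}, AIPW: {ate:.2f} (se {se:.2f})")
```

With multiple surrogate levels, one such contrast per level against a reference level would be one way to build the per-level effects that the surrogacy index summarises; that extension is an assumption rather than something the source spells out.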

Verifier: user-split, which randomises users to different levels of surrogate exposure (e.g. experimentally worsening wait time for a subset) and compares the predicted change in future outcomes (retention, future rides) with the observed change. The treatment here is each user's own experience, so market mediation within the randomised population is small and a user-split is the right shape even in a marketplace.

Composition¶

Combine the Step-1 calibrated response function with the Step-2 surrogacy index to produce a single long-term effect forecast for any policy shock. In Lyft's framing, add the direct long-term effect (estimated separately) to this market-mediated forecast via "a transparent formula grounded in market mechanics."
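The source does not disclose the actual formula, so the following is only an illustrative sketch under a strong assumption: that both steps are locally linear and the channels add up. Every name (`compose_lte`, `response_slopes`, `surrogacy_index`) and every number is hypothetical.

```python
def compose_lte(shock, response_slopes, surrogacy_index, direct_effect=0.0):
    """Forecast a long-term effect under an assumed linear, additive composition.

    shock            size of the policy change (e.g. incentive dollars per ride)
    response_slopes  Step 1: change in each surrogate per unit of shock
    surrogacy_index  Step 2: change in long-term outcome per unit of surrogate
    direct_effect    separately estimated effect that bypasses the market
    """
    market_mediated = sum(
        response_slopes[name] * shock * surrogacy_index[name]
        for name in response_slopes
    )
    return direct_effect + market_mediated

# Hypothetical inputs for a shock of size 2.0.
slopes = {"wait_time_min": -0.8, "cancel_rate": -0.02}
index = {"wait_time_min": -0.05, "cancel_rate": -1.2}
forecast = compose_lte(2.0, slopes, index, direct_effect=0.01)
print(round(forecast, 3))  # 0.138
```

A real composition would likely need non-linear response functions and uncertainty propagation from both steps; the sketch only shows how the two estimates chain together.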

End-to-end verifier¶

Periodically run a region-split experiment — apply the shock to treated regions, compare to control regions — to ground-truth the composed forecast on real market-mediated effects. Use a forward-selection algorithm to pick treated/control regions for pre-period fit and power.
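The source names forward selection but does not describe its details, so this is a generic greedy sketch. It assumes a fixed treated region, scores candidate control sets only by pre-period RMSE against the treated series, and leaves out the power criterion entirely. The function name, region labels, and series are hypothetical.

```python
import numpy as np

def forward_select_controls(treated, candidates, max_controls=3):
    """Greedily pick control regions whose average best tracks the treated
    region in the pre-period. Stops when adding a region no longer helps."""
    treated = np.asarray(treated, dtype=float)
    selected, best_rmse = [], np.inf
    remaining = list(candidates)
    while remaining and len(selected) < max_controls:
        scores = {}
        for name in remaining:
            pool = np.mean([candidates[r] for r in selected + [name]], axis=0)
            scores[name] = float(np.sqrt(np.mean((treated - pool) ** 2)))
        best = min(scores, key=scores.get)
        if scores[best] >= best_rmse:
            break  # no candidate improves the pre-period fit
        best_rmse = scores[best]
        selected.append(best)
        remaining.remove(best)
    return selected, best_rmse

# Pre-period weekly rides (thousands) for one treated and three candidate regions.
treated_series = [10, 12, 11, 13]
candidate_series = {
    "A": [9, 11, 10, 12],
    "B": [11, 13, 12, 14],
    "C": [20, 5, 18, 7],
}
controls, rmse = forward_select_controls(treated_series, candidate_series)
print(controls, rmse)  # ['A', 'B'] 0.0
```

Here regions A and B bracket the treated series, so their average matches it exactly and the poorly correlated region C is never added.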

Why this composition works¶

Cheap refresh cycle. Both observational steps run on recent data and can be re-estimated as the market evolves, without waiting for new experimental evidence.
Each estimator is verifiable. Steps 1 and 2 each have an experiment shape that catches their own failure mode: switch-back catches Step 1 misspecification, user-split catches Step 2 calibration drift, region-split catches composition / assumption violations.
No single experiment needs to do everything. The post is explicit: "there is no single form of experiment that can provide a perfect verification; therefore, we have to combine multiple imperfect signals." Each experiment class is matched to the horizon / interference profile it can credibly observe.
Region-split is reserved for high-value verifications. Its cost is amortised across many observational re-estimates running between region-split cycles.

Trade-offs and limits¶

Mediation assumption is load-bearing. The whole framework assumes the surrogate fully mediates the intervention's effect on the outcome. Channels routing around the surrogate (brand, trust, slow competitive response) are invisible to Steps 1+2 and only caught by region-split. Choose surrogates that plausibly capture the dominant mediation channels.
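Unobserved confounders bias Step 2. AIPW is doubly robust only to model misspecification: the estimate stays consistent if either the propensity model or the outcome model is correctly specified. It offers no protection against confounders missing from the covariate set, so rich covariate sets matter.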
Three experiment types to operate. Switch-back requires in-market policy volatility; user-split requires intentional experience degradation on a subset of users; region-split requires multi-market coordination. Each has its own operator cost, failure mode, and business-continuity risk.
No numerical outcome disclosed. Lyft's 2026-03-25 post is methodology-only; it does not report the framework's forecast error, calibration drift rate, or dollar accuracy on past decisions.

Seen in¶

Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance. Lyft's Foundational Models team uses this composed framework for budget- and pricing-level decisions ("How should we allocate budget between driver incentives and rider incentives? … How do we invest resources to achieve x% rides growth, and how much does it cost in terms of short term profit?").

concepts/surrogacy-causal-inference — the design stance.
concepts/market-mediated-long-term-effects — the phenomenon being measured.
concepts/residualized-regression — the Step 1 estimator.
concepts/augmented-inverse-probability-weighting — the Step 2 estimator.
concepts/switch-back-experiment / concepts/user-split-experiment / concepts/region-split-experiment — the three verifiers.
patterns/forward-selection-experiment-design — the treated/control-region design algorithm for the region-split verifier.