
Region-split experiment

Definition

A region-split experiment applies an intervention to a whole set of geographic markets (the treated regions) and compares their outcome trajectory to a set of untreated markets (the control regions). Unlike user-split A/B, which randomises users, region-split randomises — or carefully selects — markets, letting the full market response (supply/demand rebalancing, pricing dynamics, user acquisition/retention at market level) show up in the measurement.

Region-split is the experiment shape of choice when the quantity of interest is end-to-end long-term effect including market-mediated effects — effects that route through shared market state and are invisible to user-split because treated and control users share the same market.

Why it's the only shape that observes market mediation

In a two-sided marketplace, the treatment and control arms of a user-split live in the same rider/driver pool, face the same price surface, and draw on the same supply. Market-mediated channels (more drivers → less surge → rider retention ↑) therefore affect treatment and control equally; the estimator differences them out, producing an estimate of the full long-term effect that is biased toward zero. A region-split avoids this by making the market itself the randomisation unit: the treated market experiences the policy and all of the market's reactions to it, the control market experiences neither the policy nor its knock-on effects, and their difference is the full causal effect, direct plus mediated.
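This can be made concrete with a toy simulation. All numbers below (effect sizes, baseline, noise) are invented for illustration: treatment has a direct per-user lift, plus a market-level lift that scales with the fraction of the market treated. A 50/50 user-split gives both arms the same market lift and recovers only the direct effect; a region-split recovers the sum.

```python
import random

random.seed(0)

def simulate_market(frac_treated):
    """Toy market: treatment adds drivers, and the extra supply lowers
    surge for EVERYONE in the market (the market-mediated channel)."""
    n = 10_000
    direct_effect = 1.0        # assumed per-user direct lift
    mediated_effect = 0.5      # assumed market-level lift at full treatment
    # Shared market state depends on how much of the market is treated.
    market_lift = mediated_effect * frac_treated
    outcomes = []
    for _ in range(n):
        treated = random.random() < frac_treated
        y = random.gauss(10.0, 1.0) + market_lift
        if treated:
            y += direct_effect
        outcomes.append((treated, y))
    return outcomes

# User-split: 50/50 within one market, so both arms see the same market_lift.
obs = simulate_market(0.5)
t = [y for tr, y in obs if tr]
c = [y for tr, y in obs if not tr]
user_split_est = sum(t) / len(t) - sum(c) / len(c)    # ≈ direct only (1.0)

# Region-split: a fully treated market vs an untreated market.
treated_mkt = [y for _, y in simulate_market(1.0)]
control_mkt = [y for _, y in simulate_market(0.0)]
region_split_est = (sum(treated_mkt) / len(treated_mkt)
                    - sum(control_mkt) / len(control_mkt))  # ≈ 1.5

print(f"user-split:   {user_split_est:.2f}")
print(f"region-split: {region_split_est:.2f}")
```

The user-split estimate lands near the direct effect alone, while the region-split estimate lands near direct plus mediated, which is the bias the section describes.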

Why region-split is hard

Region-split experiments are widely known to be low-power and design-sensitive:

  • Region counts are small. A ridesharing platform has dozens or hundreds of markets, not millions. Sample size is bounded by geography, not by user volume.
  • Markets are heterogeneous. SF ≠ Phoenix ≠ Denver. Any treated/control split carries substantial pre-period imbalance, and the noisy standard errors that come with so few units can't easily overcome it.
  • Pre-intervention fit is critical. The most common region-split estimators (difference-in-differences, synthetic control) require the control-region trajectory to be a credible counterfactual for the treated-region trajectory. Poor pre-period parallel trends → biased estimates.
  • One-off treatments. Running one region-split per quarter caps the rate at which long-term causal evidence accumulates.

Lyft: "region-split experiments in general suffer from poor pre-intervention fit and low power."

Forward-selection design algorithm

Lyft's design-time response to the low-power problem is to borrow from the forward difference-in-differences (FDiD) approach (Li, 2024) and greedily select the treated and control regions:

  1. Start with a single treated region.
  2. At each step, add the candidate region whose inclusion most improves the pre-period fit of the treated-group average to the corresponding control-group average, and/or the expected power of the design.
  3. Iterate until the marginal addition stops improving the design.

See patterns/forward-selection-experiment-design for the full pattern. The upshot is a treated-vs-control design that's optimised per experiment rather than accepting whatever split random assignment happens to produce.
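The greedy loop above can be sketched in a few lines. This is a minimal reading of the steps, not Lyft's implementation: it scores each candidate by the pre-period RMSE between the treated-group average and the remaining-regions average, and stops when no addition improves the fit (power scoring is omitted). The region names and series are hypothetical.

```python
import statistics

def column_mean(series_list):
    """Element-wise mean of several equal-length time series."""
    return [statistics.mean(vals) for vals in zip(*series_list)]

def fit_rmse(treated, controls, pre):
    """RMSE between the treated-group and control-group average
    trajectories over the pre-intervention window."""
    t_avg = column_mean([pre[r] for r in treated])
    c_avg = column_mean([pre[r] for r in controls])
    return (sum((t - c) ** 2 for t, c in zip(t_avg, c_avg)) / len(t_avg)) ** 0.5

def forward_select(regions, seed_region, pre):
    """Greedy forward selection: start from one treated region, then
    repeatedly move into the treated group whichever remaining region
    most improves pre-period fit; stop when no addition helps."""
    treated = [seed_region]
    pool = [r for r in regions if r != seed_region]
    best = fit_rmse(treated, pool, pre)
    improved = True
    while improved and len(pool) > 1:   # keep at least one control region
        improved = False
        scored = [(fit_rmse(treated + [r], [p for p in pool if p != r], pre), r)
                  for r in pool]
        cand_rmse, cand = min(scored)
        if cand_rmse < best:
            treated.append(cand)
            pool.remove(cand)
            best = cand_rmse
            improved = True
    return treated, pool, best

# Hypothetical weekly pre-period series for five markets.
pre = {
    "SFO": [100, 102, 101, 103],
    "SEA": [99, 101, 100, 104],
    "DEN": [80, 81, 82, 81],
    "AUS": [82, 80, 81, 83],
    "PHX": [60, 61, 60, 62],
}
treated, controls, rmse = forward_select(list(pre), "SFO", pre)
print(treated, controls, round(rmse, 2))
```

In a real design the scoring function would also fold in expected power, and the candidate pool would be constrained by launch logistics; the loop structure is the same.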

Architecture in the composed framework

In Lyft's surrogacy framework, the region-split is the end-to-end ground truth. Steps 1 + 2 (observational, cheap to re-estimate) produce a composed forecast of the intervention's long-term effect (direct + market-mediated). The region-split experiment — expensive, slow — validates or falsifies that composed forecast on a subset of policy shocks. When the observational forecast and the region-split disagree, the framework is iterated.
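The validate-or-falsify step reduces to asking whether the composed forecast is consistent with the experiment's estimate given its uncertainty. A crude version of that check, with invented numbers (the source doesn't specify the decision rule):

```python
def forecast_consistent(forecast, experiment_est, experiment_se, z=1.96):
    """Crude falsification check: does the composed observational forecast
    fall inside the region-split experiment's ~95% confidence interval?"""
    lo = experiment_est - z * experiment_se
    hi = experiment_est + z * experiment_se
    return lo <= forecast <= hi

# Hypothetical: forecast says +2.0%; the experiment measured +1.4% (SE 0.5%).
ok = forecast_consistent(2.0, 1.4, 0.5)
print(ok)   # forecast sits inside the CI, so no iteration is triggered
```

When the check fails, the observational steps (the surrogate index or the mediation model) are revisited rather than the experiment being rerun, since the region-split is treated as ground truth.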
