PATTERN
Forward-selection experiment design¶
Problem¶
Region-split experiments (concepts/region-split-experiment) are the only shape that can observe the full long-term effect of an intervention in a multi-sided marketplace, including market-mediated effects. But they have two chronic weaknesses:
- Low power. Geographic units are scarce; sample-size arguments that make user-split A/B feel bulletproof don't translate to region-split.
- Poor pre-intervention fit. Markets are heterogeneous; a naively assigned treated/control split usually has a pre-period parallel-trends violation that biases the estimator.
Random assignment of regions therefore squanders both power and pre-period fit. The design problem: given a set of candidate regions, choose the treated and control sets that maximise the probability the experiment resolves the quantity of interest with acceptable precision.
Pattern¶
Borrow from the forward difference-in-differences (FDiD) literature (Li, 2024) and perform a greedy, pre-period-fit-aware selection of treated and control regions before running the experiment:
- Start with a single treated region. Typically chosen by business priority (a market where the intervention most needs to be validated) or by optimal-first-pick selection.
- Iterate. At each step, consider adding a candidate region to the treated group; the criterion is a combination of:
- How much it improves pre-period fit between the treated-group average and a corresponding control-group average (analogue of synthetic-control weight matching).
- How much it improves expected power for the final treatment effect estimator.
- Add the best candidate and update the treated-group average accordingly.
- Stop when marginal addition no longer improves the design, or when budget / operational constraints bind.
- The control group is selected by the same process in reverse, or jointly: the control regions are those whose pre-period trajectory best mimics the treated-group average.
The resulting treated and control sets are jointly optimised for the specific experiment being run — not a one-size-fits-all split.
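The greedy loop above can be sketched in code. This is an illustrative reconstruction, not Lyft's implementation: the function names (`forward_select`, `best_fit_with_controls`, `pre_period_rmse`) are hypothetical, and pre-period RMSE alone stands in for the combined fit-plus-power criterion the pattern describes.

```python
import numpy as np

def pre_period_rmse(treated_ids, control_ids, Y):
    """RMSE between the treated-group and control-group average
    pre-period trajectories -- the fit criterion."""
    treated_avg = Y[list(treated_ids)].mean(axis=0)
    control_avg = Y[list(control_ids)].mean(axis=0)
    return np.sqrt(np.mean((treated_avg - control_avg) ** 2))

def best_fit_with_controls(treated, all_ids, Y, n_control):
    """Greedily pick n_control control regions whose average best
    mimics the treated-group average; return (controls, rmse)."""
    candidates = all_ids - treated
    control = set()
    while len(control) < n_control:
        best = min(
            candidates - control,
            key=lambda r: pre_period_rmse(treated, control | {r}, Y),
        )
        control.add(best)
    return sorted(control), pre_period_rmse(treated, control, Y)

def forward_select(Y, seed_region, n_treated, n_control):
    """Greedy forward selection of treated and control regions.

    Y: (n_regions, n_pre_periods) array of pre-period outcomes.
    seed_region: the business-priority first treated region.
    """
    all_ids = set(range(Y.shape[0]))
    treated = {seed_region}
    # Grow the treated group: at each step add the candidate whose
    # inclusion (with its best-matching control set) minimises
    # pre-period RMSE.  A real criterion would also fold in the
    # expected power of the final estimator.
    while len(treated) < n_treated:
        best = min(
            all_ids - treated,
            key=lambda r: best_fit_with_controls(
                treated | {r}, all_ids, Y, n_control)[1],
        )
        treated.add(best)
    control, _ = best_fit_with_controls(treated, all_ids, Y, n_control)
    return sorted(treated), control

# Synthetic demo: 20 regions, 12 pre-periods, a shared market trend.
rng = np.random.default_rng(0)
trend = np.linspace(100, 120, 12)
Y = trend + rng.normal(0, 3, size=(20, 12))
treated, control = forward_select(Y, seed_region=0, n_treated=4, n_control=4)
```

Stopping here is size-based for simplicity; per the pattern, a real design would stop when the marginal addition no longer improves the criterion or constraints bind.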
Lyft's framing¶
Lyft: "we developed a forward selection algorithm to optimize experiment design by picking the treated and control regions … inspired by the forward difference-in-differences (FDiD, Li, 2024) approach … starting from a single treated region, we iteratively add treated regions that best improve pre-period fit and expected power."
Why forward selection over synthetic control¶
Standard synthetic control methods build a weighted combination of control units whose pre-period trajectory matches a single treated unit. Forward selection has different properties:
- Multi-treated-unit by construction. Synthetic control typically fits one treated unit at a time; forward selection grows the treated group iteratively, so the whole treatment arm is matched by a single control group.
- Power-aware. The selection criterion includes expected power for the treatment-effect estimator, not just pre-period fit.
- Greedy is often enough. Combinatorial optimality isn't required — greedy additions give substantial improvement over random assignment with bounded computational cost.
The trade-off is that forward selection is a heuristic: it trades the optimality guarantees of an exhaustive search for tractability. As with all experiment-design heuristics, the final validity of the experiment depends on the pre-analysis plan being committed before the experiment runs.
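The estimator the design ultimately serves is a difference-in-differences on the selected group averages. A minimal deterministic sketch (the data and the `did_estimate` helper are illustrative, not from the source):

```python
import numpy as np

def did_estimate(Y_pre, Y_post, treated, control):
    """Difference-in-differences on the selected region sets:
    (treated post - pre) minus (control post - pre), on group means."""
    t_diff = Y_post[treated].mean() - Y_pre[treated].mean()
    c_diff = Y_post[control].mean() - Y_pre[control].mean()
    return t_diff - c_diff

# Deterministic illustration: a shared +10 trend, true lift of +5
# applied only to the treated regions.
Y_pre = np.array([[100.0], [102.0], [98.0], [101.0]])
Y_post = Y_pre + 10.0
treated, control = [0, 1], [2, 3]
Y_post[treated] += 5.0
est = did_estimate(Y_pre, Y_post, treated, control)  # → 5.0
```

The shared trend cancels in the differencing, which is exactly why the design optimises pre-period fit: the better the control group tracks the treated group before launch, the more of the post-period gap is attributable to the intervention.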
Related design choices¶
- Pre-registration. The selection must happen before the intervention so the picked regions are not contaminated by post-treatment information. Best practice: freeze selection + analysis plan, then execute.
- Selection bias check. If the selected treated regions differ systematically from the candidate set in ways that correlate with treatment response (e.g. always picking high-growth regions), the estimate won't generalise. Run the final estimator on held-out candidate regions as a sanity check if feasible.
- Pair with observational estimators. Forward-selected region-splits are expensive and infrequent; pair them with an observational estimator that runs continuously (Lyft pairs region-split with the two-step surrogacy estimator) so the region-split is used to ground-truth and calibrate, not for primary estimation.
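The selection-bias check above can be operationalised as a simple balance comparison between the selected treated regions and the full candidate set. Everything here is a hedged sketch: `growth_rate`, `balance_check`, and the 2% tolerance are hypothetical choices, not from the source.

```python
import numpy as np

def growth_rate(Y):
    """Mean per-region pre-period growth (last period vs first)."""
    return (Y[:, -1] / Y[:, 0] - 1.0).mean()

def balance_check(Y, treated, tol=0.02):
    """Flag selections whose treated-group pre-period growth differs
    from the candidate-set average by more than `tol` (an arbitrary
    illustrative threshold) -- a crude guard against systematically
    picking, e.g., high-growth regions."""
    gap = abs(growth_rate(Y[treated]) - growth_rate(Y))
    return gap, gap <= tol

# Demo on synthetic candidate data: 20 regions, 12 pre-periods.
rng = np.random.default_rng(1)
Y = 100.0 + np.cumsum(rng.normal(0.5, 1.0, size=(20, 12)), axis=1)
gap, ok = balance_check(Y, treated=[0, 1, 2, 3])
```

Growth rate is only one covariate; a fuller check would compare whatever pre-treatment characteristics plausibly correlate with treatment response.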
Seen in¶
- Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance. Lyft's Foundational Models team uses a forward-selection algorithm (inspired by FDiD, Li 2024) to pick treated and control regions for the region-split experiments that end-to-end-verify its surrogacy-based long-term-effect forecasts.
Related¶
- concepts/region-split-experiment — the experiment shape this is the design-time optimiser for.
- patterns/surrogacy-two-step-ltv-estimation — the observational framework this pattern provides ground-truth verification for.
- concepts/market-mediated-long-term-effects — the phenomenon that motivates using region-split (and therefore this pattern) in the first place.
- concepts/surrogacy-causal-inference — the upstream observational framework.