CONCEPT Cited by 1 source
User-split experiment¶
Definition¶
A user-split experiment is classical A/B testing at the user level: randomise users into treatment and control groups, apply different variants of a feature / policy / prompt to each group, and compare outcomes. It is the default online-experiment shape across tech companies and the one most "A/B test" tutorials describe.
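A minimal sketch of user-level randomisation, assuming a deterministic hash-based bucketing scheme. This is a common way to implement such splits, not something the source describes; `assign_variant`, the experiment name, and the user IDs are hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'."""
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Read the first 8 hex digits as an integer in [0, 2**32) and scale to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"

# Hypothetical user IDs; a real system would use its own identifiers.
users = [f"user-{i}" for i in range(10_000)]
assignments = [assign_variant(u, "incentive-test") for u in users]
treated_share = assignments.count("treatment") / len(users)
print(f"treated share: {treated_share:.3f}")
```

Hashing rather than drawing a fresh random number means a user lands in the same group on every request, which keeps the variant they experience stable for the length of the experiment.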
User-split usually offers the most statistical power of the common experiment designs, because users are the most plentiful randomisation unit: a large internet platform can randomise millions of users per experiment, producing tight confidence intervals on individual-level outcomes.
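As a rough illustration of why more randomisation units tighten the interval, the sketch below computes the half-width of an approximate 95% confidence interval for a difference in means. It assumes equal-sized arms, a shared outcome standard deviation, and a normal approximation; `ci_half_width` is a hypothetical helper.

```python
import math

def ci_half_width(std_dev: float, n_per_arm: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% CI for a difference in means (equal arms, equal variance)."""
    # Standard error of the difference between two independent sample means.
    standard_error = std_dev * math.sqrt(2.0 / n_per_arm)
    return z * standard_error

for n in (1_000, 100_000, 10_000_000):
    print(f"n per arm = {n:>10,}: +/- {ci_half_width(1.0, n):.5f}")
```

Under these assumptions the half-width shrinks with the square root of the sample size, so each 100x increase in users per arm narrows the interval by a factor of 10.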
When user-split works¶
User-split recovers unbiased treatment effects when the SUTVA (Stable Unit Treatment Value Assumption) holds: a user's outcome depends only on their own treatment assignment, not on anyone else's. SUTVA holds, approximately, in settings where users don't interact through shared state — e.g. search-result relevance, single-player UI layout, spam filter accuracy.
When user-split fails: marketplace interference¶
SUTVA fails in multi-sided marketplaces (ridesharing, delivery, lodging, ads) and other systems with interference — users in one group affect outcomes in the other group through shared state. Example: if Lyft increases driver incentive spend for a treatment group of drivers, those drivers drive more → market-wide supply increases → wait times and surge drop for all riders → control drivers see shorter queues and lower per-ride earnings. The control group is not actually counterfactual for the treatment; they experience a different market because the treatment changed the market.
The consequence: user-split estimators give a biased estimate of an intervention's full effect in a marketplace. They typically underestimate it, though the direction can differ when counterbalancing channels exist, because the treatment-versus-control comparison silently differences out the market-mediated effects that the intervention also caused.
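The bias can be seen in a toy simulation. The model below is an assumption made for illustration, not Lyft's: each user's outcome is a baseline, plus a direct effect if they are treated, plus a market-wide shift that grows with the treated share and reaches every user, plus noise. `simulate_user_split` and its parameters are hypothetical.

```python
import random
import statistics

def simulate_user_split(direct_effect, market_effect, n_users=20_000, seed=0):
    """Return (user-split estimate, true full-rollout effect) in a toy market."""
    rng = random.Random(seed)
    treated = [rng.random() < 0.5 for _ in range(n_users)]
    treated_share = sum(treated) / n_users
    # Market-mediated shift: scales with the treated share and hits treatment and control alike.
    market_shift = market_effect * treated_share
    outcomes = [
        10.0 + direct_effect * t + market_shift + rng.gauss(0.0, 1.0)
        for t in treated
    ]
    treat_mean = statistics.fmean(y for y, t in zip(outcomes, treated) if t)
    control_mean = statistics.fmean(y for y, t in zip(outcomes, treated) if not t)
    # Difference in means: the shared market shift cancels out of this comparison.
    estimate = treat_mean - control_mean
    # Full effect: everyone treated versus no one treated.
    true_full_effect = direct_effect + market_effect
    return estimate, true_full_effect

estimate, truth = simulate_user_split(direct_effect=1.0, market_effect=0.5)
print(f"user-split estimate: {estimate:.2f}, full effect: {truth:.2f}")
```

In this toy model the difference in means lands near the direct effect alone, because the market shift appears in both groups and cancels. With a positive `market_effect` the estimate falls short of the full effect; with a negative one it overshoots.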
Where user-split is still useful in marketplace methodology¶
User-split remains useful for individual-level mediator-to-outcome estimation, which is exactly how Lyft uses it in Step 2 of its surrogacy framework. The argument:
- Step 2 asks: given an individual user faces a particular level of short-term negative experience (surge, wait, cancellation), how does their future behaviour change?
- At the individual level, within a randomised population sharing the same market, market-mediated feedback largely cancels out of the comparison: treatment and control users see the same market because they're mixed together.
- The difference between their future outcomes therefore reflects the individual-level effect of the mediator on behaviour, which is the Step 2 quantity.
Lyft: "We run user‑split experiments that perturb negative experiences and compare the model's predicted changes in future outcomes to the experimental lifts, checking calibration (predicted vs. observed) for validation."
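The calibration check in the quote could be sketched as follows. This is one plausible reading rather than Lyft's actual procedure: it compares predicted and observed lifts across several experiments using a through-the-origin slope and a per-experiment interval check. All numbers are made-up inputs for illustration, and both helper names are hypothetical.

```python
def calibration_slope(predicted, observed):
    """Least-squares slope of observed on predicted, fitted through the origin."""
    numerator = sum(p * o for p, o in zip(predicted, observed))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator

def within_interval(predicted, observed, std_errors, z=1.96):
    """Per experiment: does the prediction fall inside the observed lift's approximate 95% CI?"""
    return [abs(p - o) <= z * se for p, o, se in zip(predicted, observed, std_errors)]

# Made-up inputs for three user-split experiments (illustrative only, not real results).
predicted_lifts = [-0.020, -0.010, -0.035]
observed_lifts = [-0.018, -0.012, -0.030]
observed_ses = [0.004, 0.003, 0.005]

slope = calibration_slope(predicted_lifts, observed_lifts)
covered = within_interval(predicted_lifts, observed_lifts, observed_ses)
print(f"calibration slope: {slope:.2f}, predictions inside CI: {covered}")
```

A slope near 1 with most predictions inside their intervals would suggest the model is well calibrated; a slope well below 1 would suggest it overstates the effect of negative experiences.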
Where user-split must not be used in marketplace methodology¶
User-split cannot validate the end-to-end, market-mediated long-term effect. Lyft uses region-split experiments for that: these keep the market itself as the randomisation unit, so the market response is observable.
Comparison¶
| Experiment | Randomisation unit | Observes market mediation | Power | Long-term outcomes |
|---|---|---|---|---|
| User-split | user | ❌ no | ✅ high | ✅ yes (long windows) |
| Switch-back | time slot | ✅ yes (within slot) | medium | ❌ no (slots too short) |
| Region-split | region | ✅ yes (whole market) | ⚠️ low | ✅ yes |
Seen in¶
- Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance of the why-user-split-is-insufficient-for-marketplace-LTE argument and of the user-split-verifies-Step-2-of-surrogacy use.
Related¶
- concepts/switch-back-experiment — time-based sibling.
- concepts/region-split-experiment — region-based sibling, used when market mediation must be observed.
- concepts/market-mediated-long-term-effects — the phenomenon that breaks user-split SUTVA in marketplaces.
- concepts/surrogacy-causal-inference — the framework that uses user-split specifically for Step 2 verification.
- concepts/augmented-inverse-probability-weighting — the Step 2 estimator that user-split verifies.