CONCEPT Cited by 1 source

User-split experiment

Definition

A user-split experiment is classical A/B testing at the user level: randomise users into treatment and control groups, apply different variants of a feature / policy / prompt to each group, and compare outcomes. It is the default online-experiment shape across tech companies and the one most "A/B test" tutorials describe.

User-split is the highest-powered design because its randomisation unit is the smallest available: a large internet platform can randomise millions of users per experiment, producing tight confidence intervals on individual-level outcomes.
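Assignment at the user level is typically done by hashing the user ID together with an experiment key, which gives a stable, uniform split without storing per-user state. A minimal sketch (function and experiment names are illustrative, not any particular platform's API):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Hashing (experiment, user_id) yields an assignment that is stable
    across sessions and approximately uniform across users.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

assignments = [assign_variant(f"user-{i}", "new-ranker") for i in range(10_000)]
share = assignments.count("treatment") / len(assignments)  # ~0.5
```

Hashing rather than random draws matters operationally: the same user always lands in the same arm, even across devices and restarts.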

When user-split works

User-split recovers unbiased treatment effects when the SUTVA (Stable Unit Treatment Value Assumption) holds: a user's outcome depends only on their own treatment assignment, not on anyone else's. SUTVA holds, approximately, in settings where users don't interact through shared state — e.g. search-result relevance, single-player UI layout, spam filter accuracy.
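Under SUTVA, the plain difference in means across arms is an unbiased estimate of the treatment effect. A toy illustration with invented numbers (each user's outcome depends only on their own assignment):

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0  # assumed lift in the outcome for treated users

def outcome(assigned_treatment: bool) -> float:
    # SUTVA: this user's outcome depends only on this user's assignment.
    base = random.gauss(10.0, 3.0)
    return base + (TRUE_EFFECT if assigned_treatment else 0.0)

n = 100_000
treat = [outcome(True) for _ in range(n)]
ctrl = [outcome(False) for _ in range(n)]

# Difference in means recovers the true effect up to sampling noise.
ate_hat = sum(treat) / n - sum(ctrl) / n
```

With no shared state between users, no market model is needed: randomisation alone makes the control group a valid counterfactual.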

When user-split fails: marketplace interference

SUTVA fails in multi-sided marketplaces (ridesharing, delivery, lodging, ads) and other systems with interference — users in one group affect outcomes in the other group through shared state. Example: if Lyft increases driver incentive spend for a treatment group of drivers, those drivers drive more → market-wide supply increases → wait times and surge drop for all riders → control drivers see shorter queues and lower per-ride earnings. The control group is no longer a valid counterfactual for the treatment group: control drivers experience a different market precisely because the treatment changed the market.

The consequence: user-split estimators systematically underestimate (or otherwise bias, when counterbalancing channels exist) the full effect of an intervention in a marketplace, because they silently difference out the market-mediated effects the intervention also causes.
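A deterministic toy model makes the bias concrete. All parameters are invented, not Lyft numbers: drivers share one per-ride rate that falls as market-wide supply rises, so control drivers are contaminated by the treatment arm. In this parameterisation the shared rate drop hits control drivers too, so the naive difference overstates the per-driver earnings effect of a full launch; a channel shared equally by both arms (such as wait times) would instead be differenced out entirely.

```python
# Toy marketplace: the hourly rate every driver faces depends on total supply.
BASE_HOURS = 8.0        # hours a control driver works (assumed)
INCENTIVE_HOURS = 2.0   # extra hours the incentive induces (assumed)
BASE_RATE = 20.0        # $/hour when no one is treated (assumed)
SPILLOVER = 0.5         # $/hour drop per average extra hour of supply (assumed)

def earnings(extra_hours: float, treated_fraction: float) -> float:
    # The rate depends on how many drivers market-wide were treated:
    # a SUTVA violation, since one driver's outcome depends on others' arms.
    rate = BASE_RATE - SPILLOVER * treated_fraction * INCENTIVE_HOURS
    return (BASE_HOURS + extra_hours) * rate

# 50/50 user-split: both arms share one market (treated_fraction = 0.5).
naive = earnings(INCENTIVE_HOURS, 0.5) - earnings(0.0, 0.5)    # 39.0

# Launch effect: everyone treated vs no one treated (two different markets).
launch = earnings(INCENTIVE_HOURS, 1.0) - earnings(0.0, 0.0)   # 30.0
```

The user-split estimate (39) and the launch effect (30) disagree because the experiment never observes a market where the treatment is fully on or fully off.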

Where user-split is still useful in marketplace methodology

User-split remains useful for individual-level mediator-to-outcome estimation, which is exactly how Lyft uses it in Step 2 of its surrogacy framework. The argument:

  • Step 2 asks: given an individual user faces a particular level of short-term negative experience (surge, wait, cancellation), how does their future behaviour change?
  • At the individual level, within a randomised population sharing the same market, market-mediated feedback is small: treatment and control users see the same market because they're mixed together.
  • The difference between their future outcomes therefore reflects the individual-level effect of the mediator on behaviour, which is the Step 2 quantity.

Lyft: "We run user‑split experiments that perturb negative experiences and compare the model's predicted changes in future outcomes to the experimental lifts, checking calibration (predicted vs. observed) for validation."
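The calibration check in the quoted use can be sketched as a predicted-vs-observed comparison across perturbation experiments. A minimal version, assuming a slope-through-the-origin summary; function names and numbers are illustrative, not Lyft's:

```python
def observed_lift(treated_outcomes: list[float], control_outcomes: list[float]) -> float:
    """Experimental lift in a future outcome from one user-split experiment."""
    return (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))

def calibration_slope(predicted: list[float], observed: list[float]) -> float:
    """OLS slope through the origin: ~1.0 means the model is well calibrated."""
    num = sum(p * o for p, o in zip(predicted, observed))
    den = sum(p * p for p in predicted)
    return num / den

# One point per perturbation experiment (e.g. different surge / wait levels);
# observed values would each come from observed_lift on that experiment's arms.
predicted = [0.10, 0.25, 0.40]
observed = [0.12, 0.24, 0.43]
slope = calibration_slope(predicted, observed)  # close to 1.0
```

A slope far from 1.0 (or large scatter around the fit) would flag that the model's mediator-to-outcome mapping is miscalibrated before it is trusted inside the surrogacy pipeline.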

Where user-split must not be used in marketplace methodology

User-split cannot validate the end-to-end, market-mediated long-term effect. Lyft uses region-split experiments for that — they keep the market itself as the randomisation unit so the market response is observable.

Comparison

| Experiment | Randomisation unit | Observes market mediation | Power | Long-term outcomes |
| --- | --- | --- | --- | --- |
| User-split | user | ❌ no | ✅ high | ✅ yes (long windows) |
| Switch-back | time slot | ✅ yes (within slot) | medium | ❌ no (slots too short) |
| Region-split | region | ✅ yes (whole market) | ⚠️ low | ✅ yes |

Seen in

  • Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance of the why-user-split-is-insufficient-for-marketplace-LTE argument and of the user-split-verifies-Step-2-of-surrogacy use.