CONCEPT Cited by 1 source

User-split experiment

Definition

A user-split experiment is classical A/B testing at the user level: randomise users into treatment and control groups, apply different variants of a feature / policy / prompt to each group, and compare outcomes. It is the default online-experiment shape across tech companies and the one most "A/B test" tutorials describe.

User-split is the highest-powered design because its randomisation unit is the smallest available: a large internet platform can randomise millions of users per experiment, producing tight confidence intervals on individual-level outcomes.
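Assignment at the user level is typically done by hashing the user ID together with an experiment key, which gives a stable, uniform split without storing per-user state. A minimal sketch (function and experiment names are illustrative, not any particular platform's API):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Hashing (experiment, user_id) yields an assignment that is stable
    across sessions and approximately uniform across users.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

assignments = [assign_variant(f"user-{i}", "new-ranker") for i in range(10_000)]
share = assignments.count("treatment") / len(assignments)  # ~0.5
```

Hashing rather than random draws matters operationally: the same user always lands in the same arm, even across devices and restarts.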

When user-split works

User-split recovers unbiased treatment effects when the SUTVA (Stable Unit Treatment Value Assumption) holds: a user's outcome depends only on their own treatment assignment, not on anyone else's. SUTVA holds, approximately, in settings where users don't interact through shared state — e.g. search-result relevance, single-player UI layout, spam filter accuracy.
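Under SUTVA, the plain difference in means across arms is an unbiased estimate of the treatment effect. A toy illustration with invented numbers (each user's outcome depends only on their own assignment):

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0  # assumed lift in the outcome for treated users

def outcome(assigned_treatment: bool) -> float:
    # SUTVA: this user's outcome depends only on this user's assignment.
    base = random.gauss(10.0, 3.0)
    return base + (TRUE_EFFECT if assigned_treatment else 0.0)

n = 100_000
treat = [outcome(True) for _ in range(n)]
ctrl = [outcome(False) for _ in range(n)]

# Difference in means recovers the true effect up to sampling noise.
ate_hat = sum(treat) / n - sum(ctrl) / n
```

With no shared state between users, no market model is needed: randomisation alone makes the control group a valid counterfactual.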

When user-split fails: marketplace interference

SUTVA fails in multi-sided marketplaces (ridesharing, delivery, lodging, ads) and other systems with interference — users in one group affect outcomes in the other group through shared state. Example: if Lyft increases driver incentive spend for a treatment group of drivers, those drivers drive more → market-wide supply increases → wait times and surge drop for all riders → control drivers see shorter queues and lower per-ride earnings. The control group is no longer a valid counterfactual for the treatment group: control drivers experience a different market precisely because the treatment changed the market.

The consequence: user-split estimators systematically underestimate (or otherwise bias, when counterbalancing channels exist) the full effect of an intervention in a marketplace, because they silently difference out the market-mediated effects the intervention also causes.
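A deterministic toy model makes the bias concrete. All parameters are invented, not Lyft numbers: drivers share one per-ride rate that falls as market-wide supply rises, so control drivers are contaminated by the treatment arm. In this parameterisation the shared rate drop hits control drivers too, so the naive difference overstates the per-driver earnings effect of a full launch; a channel shared equally by both arms (such as wait times) would instead be differenced out entirely.

```python
# Toy marketplace: the hourly rate every driver faces depends on total supply.
BASE_HOURS = 8.0        # hours a control driver works (assumed)
INCENTIVE_HOURS = 2.0   # extra hours the incentive induces (assumed)
BASE_RATE = 20.0        # $/hour when no one is treated (assumed)
SPILLOVER = 0.5         # $/hour drop per average extra hour of supply (assumed)

def earnings(extra_hours: float, treated_fraction: float) -> float:
    # The rate depends on how many drivers market-wide were treated:
    # a SUTVA violation, since one driver's outcome depends on others' arms.
    rate = BASE_RATE - SPILLOVER * treated_fraction * INCENTIVE_HOURS
    return (BASE_HOURS + extra_hours) * rate

# 50/50 user-split: both arms share one market (treated_fraction = 0.5).
naive = earnings(INCENTIVE_HOURS, 0.5) - earnings(0.0, 0.5)    # 39.0

# Launch effect: everyone treated vs no one treated (two different markets).
launch = earnings(INCENTIVE_HOURS, 1.0) - earnings(0.0, 0.0)   # 30.0
```

The user-split estimate (39) and the launch effect (30) disagree because the experiment never observes a market where the treatment is fully on or fully off.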

Where user-split is still useful in marketplace methodology

User-split remains useful for individual-level mediator-to-outcome estimation, which is exactly how Lyft uses it in Step 2 of its surrogacy framework. The argument:

  • Step 2 asks: given an individual user faces a particular level of short-term negative experience (surge, wait, cancellation), how does their future behaviour change?
  • At the individual level, within a randomised population sharing the same market, market-mediated feedback is small: treatment and control users see the same market because they're mixed together.
  • The difference between their future outcomes therefore reflects the individual-level effect of the mediator on behaviour, which is the Step 2 quantity.

Lyft: "We run user‑split experiments that perturb negative experiences and compare the model's predicted changes in future outcomes to the experimental lifts, checking calibration (predicted vs. observed) for validation."
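The calibration check in the quoted use can be sketched as a predicted-vs-observed comparison across perturbation experiments. A minimal version, assuming a slope-through-the-origin summary; function names and numbers are illustrative, not Lyft's:

```python
def observed_lift(treated_outcomes: list[float], control_outcomes: list[float]) -> float:
    """Experimental lift in a future outcome from one user-split experiment."""
    return (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))

def calibration_slope(predicted: list[float], observed: list[float]) -> float:
    """OLS slope through the origin: ~1.0 means the model is well calibrated."""
    num = sum(p * o for p, o in zip(predicted, observed))
    den = sum(p * p for p in predicted)
    return num / den

# One point per perturbation experiment (e.g. different surge / wait levels);
# observed values would each come from observed_lift on that experiment's arms.
predicted = [0.10, 0.25, 0.40]
observed = [0.12, 0.24, 0.43]
slope = calibration_slope(predicted, observed)  # close to 1.0
```

A slope far from 1.0 (or large scatter around the fit) would flag that the model's mediator-to-outcome mapping is miscalibrated before it is trusted inside the surrogacy pipeline.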

Where user-split must not be used in marketplace methodology

User-split cannot validate the end-to-end, market-mediated long-term effect. Lyft uses region-split experiments for that — they keep the market itself as the randomisation unit so the market response is observable.

Comparison

| Experiment | Randomisation unit | Observes market mediation | Power | Long-term outcomes |
| --- | --- | --- | --- | --- |
| User-split | user | ❌ no | ✅ high | ✅ yes (long windows) |
| Switch-back | time slot | ✅ yes (within slot) | medium | ❌ no (slots too short) |
| Region-split | region | ✅ yes (whole market) | ⚠️ low | ✅ yes |

Seen in

  • Lyft — Beyond A/B Testing (2026-03-25) — canonical wiki instance of the why-user-split-is-insufficient-for-marketplace-LTE argument and of the user-split-verifies-Step-2-of-surrogacy use.