CONCEPT Cited by 1 source
Counterfactual outcome prediction¶
Definition¶
Counterfactual outcome prediction is the use of an ML model
to estimate what an outcome would have been under a treatment
that wasn't actually applied. The model is trained on
(features, treatment, outcome) triples from real data; at
inference time, it is queried with the same features but a different treatment, and returns the predicted outcome.
The prediction is counterfactual because the (features, new_treatment) pair may never have occurred in the training data — the model extrapolates.
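The train-then-query split can be sketched in a few lines. Here a hand-rolled k-nearest-neighbour regressor stands in for any non-parametric model (Yelp uses CatBoost); the synthetic click curve and all names are illustrative assumptions, not Yelp's.

```python
# Minimal sketch of counterfactual outcome prediction. The k-NN regressor
# is a stand-in for any non-parametric model; the saturating click curve
# is invented for illustration.
import numpy as np

def knn_predict(X_train, y_train, x_query, k=25):
    # Non-parametric prediction: average the outcomes of the k historical
    # (features, treatment) rows nearest to the query point.
    dist = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argsort(dist)[:k]].mean()

rng = np.random.default_rng(0)
n = 5000
budget = rng.uniform(10, 100, n)                                 # treatment actually applied
clicks = 50 * (1 - np.exp(-budget / 40)) + rng.normal(0, 1, n)   # observed outcome
X = budget[:, None]

# Counterfactual query: what would clicks have been under a budget this
# campaign never actually ran with?
pred_30 = knn_predict(X, clicks, np.array([30.0]))
pred_90 = knn_predict(X, clicks, np.array([90.0]))
```

Real feature vectors would also carry campaign attributes; a single budget column keeps the sketch readable.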
Why it exists¶
Simulation systems that want to compare hypothetical code paths against historical reality need to fill in outcomes that never happened. Two alternatives are usually worse:
- Pretend the outcome is unchanged — assumes the new treatment has no downstream effect, which defeats the purpose of the simulation.
- Use a simple analytic model — e.g. constant cost-per-click — which misses the non-linearities the data actually exhibits.
A trained regressor interpolates within the support of the real data, capturing non-linear effects such as diminishing returns on budget, cost-per-action that varies with volume, and interactions between categorical campaign features and continuous budget.
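A toy comparison shows what the constant cost-per-click alternative misses; the saturating click curve here is an invented stand-in for real data, not anything from the post.

```python
import numpy as np

def true_clicks(budget):
    # Invented ground truth with diminishing returns: clicks saturate
    # as budget grows.
    return 50 * (1 - np.exp(-budget / 40))

# The "simple analytic model": fit a constant cost-per-click at the typical
# historical operating point, then extrapolate linearly.
cpc = 30.0 / true_clicks(30.0)            # dollars per click at budget 30
linear_at_90 = 90.0 / cpc                 # linear extrapolation to budget 90
actual_at_90 = true_clicks(90.0)          # saturating reality
overshoot = linear_at_90 / actual_at_90   # how far the linear model over-predicts
```

Under this curve the constant-CPC model over-predicts clicks at triple the budget by roughly 75%, which is exactly the kind of error a fitted regressor avoids.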
Yelp's instance — ad-outcome regression¶
Yelp's Back-Testing Engine (2026-02-02) uses CatBoost regressors to predict daily impressions, clicks, and leads from daily budget + campaign features. The models are shared across all candidates in a simulation so cross-candidate deltas are attributable to the algorithm under test, not to noise in the predictor.
Verbatim from the post on why non-parametric models matter: "Using a non-parametric ML approach, instead of making simplistic assumptions (e.g. constant cost per click), allows us to accurately capture complex effects such as diminishing returns on budget, resulting in simulations that more closely reflect real-world behavior."
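The shared-predictor point can be made concrete with a frozen stand-in model; the function and the candidate budget schedules below are hypothetical, not Yelp's.

```python
import numpy as np

def shared_predict_clicks(budget):
    # Stand-in for a frozen regressor fitted once and reused for every
    # candidate in a simulation (Yelp shares its CatBoost models this way).
    return 50 * (1 - np.exp(-budget / 40))

candidate_a = np.array([30.0, 30.0, 30.0])  # daily budgets from algorithm A
candidate_b = np.array([10.0, 40.0, 40.0])  # algorithm B back-loads spend

# Both candidates query the SAME predictor, so the cross-candidate delta
# is attributable to the allocation policy, not to predictor variance.
delta = shared_predict_clicks(candidate_b).sum() - shared_predict_clicks(candidate_a).sum()
```

Had each candidate been scored by an independently trained model, part of `delta` would be model-to-model noise rather than a real difference between the algorithms.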
Coupling to randomness¶
Real systems produce integer outcomes with randomness. A regressor outputs a smooth expected value. Yelp's post addresses this by sampling the realised count from a Poisson distribution — see concepts/poisson-sampling-for-integer-outcomes. Without this step, every simulation run would produce identical deterministic outcomes for a given candidate, which understates real-world noise.
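The sampling step is a one-liner: draw the realised integer count from a Poisson distribution whose mean is the regressor's smooth output. The seed and the expected value below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
expected_clicks = 44.7   # smooth regressor output for one simulated day

# Sample the realised integer count. The Poisson mean equals the prediction,
# so repeated simulation runs vary while staying centred on the model output.
realised = rng.poisson(expected_clicks, size=5)
```

Re-running with a different seed gives different integer counts around the same mean, which is exactly the run-to-run noise a deterministic regressor alone would suppress.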
Limits¶
- Out-of-distribution treatments — if the new algorithm proposes budgets far outside the historical range, the predictor extrapolates, which is less reliable.
- Confounded features — if a campaign feature was a proxy for historical budget allocation (e.g. campaigns with high CPC were also the campaigns with high budgets), the model may learn that correlation and mispredict under a counterfactual that breaks it.
- Concept drift — user/market behaviour shifts over time; a model trained on old data may mispredict on current conditions.
Yelp names the overall caveat without drilling into these: "back-testing relies on historical data and model assumptions, which may not capture major shifts in user, market, or partner behavior."
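One cheap guard against the out-of-distribution limit is a support check before trusting a counterfactual prediction. The quantile thresholds, padding, and names below are made up for illustration; the post does not describe such a check.

```python
import numpy as np

def in_support(proposed_budget, historical_budgets, pad=0.1):
    # Flag counterfactual treatments far outside the training range, where
    # the regressor is extrapolating rather than interpolating.
    lo, hi = np.quantile(historical_budgets, [0.01, 0.99])
    span = hi - lo
    return bool(lo - pad * span <= proposed_budget <= hi + pad * span)

hist = np.random.default_rng(1).uniform(10, 100, 5000)
ok = in_support(50.0, hist)      # within historical range: interpolation
risky = in_support(500.0, hist)  # far outside: prediction is extrapolation
```

Flagged treatments could be excluded from the simulation or reported with a reliability caveat rather than silently scored.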
Seen in¶
- sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — canonical wiki instance.
Related¶
- concepts/hybrid-backtesting-with-ml-counterfactual — the parent methodology.
- concepts/poisson-sampling-for-integer-outcomes — the add-on that restores stochasticity to the predictor output.
- systems/catboost — the regressor Yelp uses.
- systems/yelp-back-testing-engine