Poisson sampling for integer outcomes¶
Definition¶
A trick for converting an ML-predicted expected count into
a realised integer count: treat the predicted value λ as
the rate parameter of a Poisson distribution and sample
count ~ Poisson(λ).
The Poisson distribution produces non-negative integers with mean = variance = λ, matching the "arrivals in a fixed window" shape of most production event-count phenomena (clicks, impressions, leads, requests, purchases).
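A minimal sketch of the trick using NumPy. The value of λ here is a hypothetical regressor output, not a figure from the source; the draw on the second-to-last line is the entire technique, and the batch draw just confirms the mean = variance = λ property empirically:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

lam = 3.7                 # hypothetical regressor output: expected daily clicks
count = rng.poisson(lam)  # realised integer count for one simulated day

# Over many simulated days, the empirical mean and variance both approach lam.
days = rng.poisson(lam, size=100_000)
print(count, round(days.mean(), 2), round(days.var(), 2))
```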
Why it exists¶
ML regressors trained to predict counts via MSE or similar smooth losses output conditional averages, not integers. Feeding those averages straight into a simulation that claims to model realistic system behaviour is wrong in two ways:
- Deterministic predictions — every run of the simulation with the same inputs produces the same output. Real systems have noise; simulations that don't are overconfident about which candidates are distinguishable.
- Non-integer outcomes — downstream accounting logic (billing, inventory, metrics) expects integer counts. Fractional "half a click" breaks type contracts.
Poisson sampling solves both: it preserves the predictor's expected value in the long run while giving realistic single-run variance.
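Both fixes can be seen in a small sketch. Here a vector of hypothetical per-campaign expected counts (the λ values are illustrative, not from the source) is turned into integer outcomes; each run draws fresh noise, so repeated runs differ, while averaging across many runs recovers the predictor's expected values:

```python
import numpy as np

lam = np.array([2.4, 0.7, 5.1])  # hypothetical per-campaign expected clicks

def simulate(seed):
    # One simulated day: each campaign's realised count is an
    # independent Poisson draw around its predicted mean.
    rng = np.random.default_rng(seed)
    return rng.poisson(lam)

run_a, run_b = simulate(1), simulate(2)
print(run_a, run_b)  # integer counts; different seeds typically differ
```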
Yelp's instance¶
Yelp's Back-Testing Engine (2026-02-02) predicts daily expected impressions/clicks/leads per campaign via CatBoost regressors, then Poisson-samples the realised integer counts.
Verbatim: "Because our models output average expected values (not integers), we apply a Poisson distribution to simulate integer outcomes. This approach captures the randomness seen in live systems."
When it's appropriate¶
Poisson is a good fit when:
- The underlying phenomenon is "events in a time window" (clicks, requests).
- Events are approximately independent.
- Mean ≈ variance is a reasonable first approximation.
It's less appropriate when:
- Variance >> mean (overdispersion) — use Negative Binomial instead.
- Events are bursty / correlated — e.g. group purchases, viral traffic spikes. Poisson understates variance.
- Outcomes are counts with a known upper bound — use Binomial.
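For the overdispersed case, the Negative Binomial swap is also a one-liner, at the cost of one extra parameter. A sketch under an assumed dispersion parameter k (hypothetical, not from the source), mapped onto NumPy's (n, p) parameterisation so that the mean stays λ while the variance grows to λ + λ²/k:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
lam = 4.0  # predicted mean count
k = 2.0    # hypothetical dispersion parameter: variance = lam + lam**2 / k

# NumPy's negative_binomial(n, p) has mean n*(1-p)/p, so n = k and
# p = k / (k + lam) recover mean lam with variance lam + lam**2 / k.
n, p = k, k / (k + lam)
draws = rng.negative_binomial(n, p, size=200_000)
print(round(draws.mean(), 2), round(draws.var(), 2))  # variance well above the mean
```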
The trick is specifically that one line of code (sampling from Poisson(λ), where λ is the regressor output) buys realistic stochasticity without retraining the regressor as a distributional predictor.
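In practice the one line sits in a thin wrapper between the regressor and the simulator. A sketch, where `expected_counts` stands in for the output of any count regressor's `predict` call (the clipping guard is an assumption of mine, since smooth-loss regressors can emit small negative predictions that Poisson rejects):

```python
import numpy as np

rng = np.random.default_rng()

def realise(expected_counts):
    # The one line: convert predicted mean counts into realised integers.
    # np.clip guards against slightly negative regressor outputs, for which
    # the Poisson rate parameter would be invalid.
    return rng.poisson(np.clip(expected_counts, 0.0, None))

print(realise(np.array([1.2, 0.0, 8.9])))
```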
Seen in¶
- sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — canonical wiki instance.
Related¶
- concepts/counterfactual-outcome-prediction — Poisson sampling is an add-on to counterfactual prediction that restores stochasticity.
- concepts/hybrid-backtesting-with-ml-counterfactual
- systems/catboost
- systems/yelp-back-testing-engine