CONCEPT Cited by 1 source

Poisson sampling for integer outcomes

Definition

A trick for converting an ML-predicted expected count into a realised integer count: treat the predicted value λ as the rate parameter of a Poisson distribution and sample count ~ Poisson(λ).

The Poisson distribution produces non-negative integers with mean = variance = λ, which fits the "arrivals in a fixed window" shape of many production event-count phenomena (clicks, impressions, leads, requests, purchases).
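A minimal sketch of the definition above, assuming NumPy is available. The helper name `sample_count` and the rate `3.7` are hypothetical and chosen only for illustration; the empirical mean and variance of the draws should both land near λ.

```python
import numpy as np


def sample_count(expected, rng):
    """Draw integer count(s) from Poisson(expected); expected must be >= 0."""
    return rng.poisson(lam=expected)


rng = np.random.default_rng(seed=0)  # seeded only to make the sketch reproducible
lam = 3.7                            # hypothetical regressor output (an average)

# Many independent draws at the same rate, to inspect the empirical moments.
draws = sample_count(np.full(100_000, lam), rng)

print(draws[:10])                 # non-negative integers
print(draws.mean(), draws.var())  # both should be close to lam
```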

Why it exists

ML regressors trained to predict counts with MSE or similar smooth losses output averages, not integers. Feeding those averages directly into a simulation that claims to model realistic system behaviour is wrong in two ways:

Deterministic predictions: every run of the simulation with the same inputs produces the same output. Real systems have noise, so a noise-free simulation overstates how reliably two candidates can be told apart.
Non-integer outcomes: downstream accounting logic (billing, inventory, metrics) expects integer counts. A fractional "half a click" breaks type contracts.

Poisson sampling addresses both: the sampled counts are integers whose average over many runs converges to the predicted λ, while any single run varies around it with variance λ.
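The two properties can be checked with a short sketch. The names `simulate_run` and `expected_counts` and the rates themselves are hypothetical; the point is only that individual runs differ and are integer-valued, while the per-entry average across runs approaches the predicted values.

```python
import numpy as np


def simulate_run(expected_counts, rng):
    """One simulated run: one integer count per predicted rate."""
    return rng.poisson(expected_counts)


# Hypothetical per-campaign predictions (averages, not integers).
expected_counts = np.array([0.4, 2.5, 12.0])

rng = np.random.default_rng(seed=42)
runs = np.stack([simulate_run(expected_counts, rng) for _ in range(20_000)])

print(runs[:3])            # individual runs differ from each other
print(runs.mean(axis=0))   # long-run average approaches expected_counts
print(runs.std(axis=0))    # single-run spread, roughly sqrt(expected_counts)
```

Reusing the same seeded generator keeps an experiment reproducible while still giving each run its own noise.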

Yelp's instance

Yelp's Back-Testing Engine (2026-02-02) predicts daily expected impressions/clicks/leads per campaign via CatBoost regressors, then Poisson-samples the realised integer counts.

Verbatim: "Because our models output average expected values (not integers), we apply a Poisson distribution to simulate integer outcomes. This approach captures the randomness seen in live systems."
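The predict-then-sample pattern might look roughly like the sketch below. This is not Yelp's code: `StubRegressor` is a hypothetical stand-in for any trained model that exposes a `predict` method, and the feature values are invented. Clipping negative predictions to zero is an added assumption (regressors trained with smooth losses can output small negative values, and a Poisson rate must be non-negative); the source does not say how that case is handled.

```python
import numpy as np


class StubRegressor:
    """Hypothetical stand-in for a trained count regressor."""

    def predict(self, features):
        # Toy linear rule; it can produce negative "expected counts".
        return features @ np.array([0.8, -0.3])


def sample_daily_counts(model, features, rng):
    """Predict expected counts, then Poisson-sample integer outcomes."""
    expected = np.asarray(model.predict(features), dtype=float)
    rates = np.clip(expected, 0.0, None)  # assumption: floor negative rates at 0
    return rng.poisson(rates)


# Invented feature rows, one per campaign-day.
features = np.array([[10.0, 2.0], [1.0, 5.0], [4.0, 0.0]])

rng = np.random.default_rng(seed=7)
counts = sample_daily_counts(StubRegressor(), features, rng)
print(counts)  # one non-negative integer per row
```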

When it's appropriate

Poisson is a good fit when:

The underlying phenomenon is "events in a time window" (clicks, requests).
Events are approximately independent.
Mean ≈ variance is a reasonable first approximation.

It's less appropriate when:

Variance >> mean (overdispersion) — use Negative Binomial instead.
Events are bursty / correlated — e.g. group purchases, viral traffic spikes. Poisson understates variance.
Outcomes are counts with a known upper bound — use Binomial.
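One rough way to test the mean ≈ variance assumption is the dispersion index (variance divided by mean) of historical counts: values near 1 are consistent with Poisson, values well above 1 indicate overdispersion. The sketch below uses synthetic data, and the helper names are hypothetical. It also shows a Negative Binomial sampler parameterised by the mean and a shape parameter, so that variance = mean + mean² / shape.

```python
import numpy as np


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def sample_negative_binomial(mean, shape, rng, size=None):
    """Overdispersed counts with the given mean.

    Variance is mean + mean**2 / shape, so a smaller shape means more spread.
    """
    p = shape / (shape + mean)  # NumPy's (n, p) parameterisation
    return rng.negative_binomial(shape, p, size=size)


rng = np.random.default_rng(seed=1)

# Synthetic "historical" data: one Poisson series, one overdispersed series.
poisson_like = rng.poisson(5.0, size=50_000)
bursty = sample_negative_binomial(5.0, shape=2.0, rng=rng, size=50_000)

print(dispersion_index(poisson_like))  # close to 1
print(dispersion_index(bursty))        # well above 1 (theory: 1 + 5/2 = 3.5)
```

If the index on real data sits well above 1, the Negative Binomial sampler can replace the Poisson draw while keeping the regressor output as the mean; the shape parameter then has to be estimated separately.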

The appeal of the trick is that a single line of code (sample from Poisson(λ), where λ is the regressor output) adds plausible stochasticity to the simulation without retraining the regressor as a distributional predictor.

Seen in

sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — canonical wiki instance.

concepts/counterfactual-outcome-prediction — Poisson sampling is an add-on to counterfactual prediction that restores stochasticity.
concepts/hybrid-backtesting-with-ml-counterfactual
systems/catboost
systems/yelp-back-testing-engine