CatBoost
Definition
CatBoost is a gradient-boosted decision-tree library that originated at Yandex. Two design choices have distinguished it since its original release: native handling of categorical features via ordered target statistics, with no one-hot encoding required, and ordered boosting, which reduces prediction-shift bias during training.
The library is widely used for classification and regression on tabular data with mixed feature types.
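The following is a simplified sketch of ordered target statistics, the categorical encoding described above. The rows are shuffled once, and each row's category is then encoded as a smoothed mean of the targets of earlier rows (in shuffled order) that share its category. As a result, a row's own label never leaks into its own feature value. The sketch is written for illustration only and is not CatBoost's implementation, which differs in detail. The function name and the `prior` / `prior_weight` smoothing parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    prior_weight: float = 1.0,
    seed: int = 0,
) -> list[float]:
    """Encode one categorical column using only targets of *earlier* rows.

    Hypothetical helper for illustration; uses a single random permutation.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # one random "artificial time" ordering of the rows

    running_sum = defaultdict(float)  # per-category sum of targets seen so far
    running_count = defaultdict(int)  # per-category number of rows seen so far
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Smoothed mean over earlier rows with the same category; equals the prior for a first occurrence.
        encoded[i] = (running_sum[c] + prior_weight * prior) / (running_count[c] + prior_weight)
        # Only now add this row's target, so it can inform later rows but never itself.
        running_sum[c] += targets[i]
        running_count[c] += 1
    return encoded


# Usage: encode a hypothetical "ad category" column against a binary "got a lead" target.
cats = ["food", "food", "auto", "food", "auto"]
ys = [1.0, 0.0, 1.0, 1.0, 0.0]
print(ordered_target_statistics(cats, ys, prior=sum(ys) / len(ys)))
```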
How Yelp uses it
Yelp's Back-Testing Engine (described in a post dated 2026-02-02) uses CatBoost regressors as counterfactual-outcome predictors in its ad-budget simulation. Given a daily budget and a campaign's features, the models predict the expected impressions, clicks, and leads.
Verbatim from the post: "we leverage ML models (specifically, CatBoost) trained to predict expected impressions, clicks, and leads based on daily budget and campaign features."
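The prediction interface described above can be sketched as a typed protocol. One call maps a daily budget and a set of campaign features to expected, real-valued outcomes. A counterfactual sweep then asks the same model about budgets the campaign never actually ran. Everything here is an assumption made for illustration. The post does not say whether there is one regressor per outcome or how features are encoded. `OutcomePredictor`, `ExpectedOutcomes`, `ToyPredictor` and `counterfactual_sweep` are hypothetical names, and `ToyPredictor` uses invented formulas in place of trained CatBoost models.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Protocol


@dataclass(frozen=True)
class ExpectedOutcomes:
    """Expected (real-valued, not integer) daily outcomes for one campaign at one budget."""
    impressions: float
    clicks: float
    leads: float


class OutcomePredictor(Protocol):
    """Hypothetical interface: (daily budget, campaign features) -> expected outcomes."""

    def predict(self, daily_budget: float, features: Mapping[str, str]) -> ExpectedOutcomes: ...


class ToyPredictor:
    """Stand-in for trained CatBoost regressors; formulas are invented and features are ignored."""

    def predict(self, daily_budget: float, features: Mapping[str, str]) -> ExpectedOutcomes:
        impressions = 400.0 * math.log1p(daily_budget / 10.0)
        clicks = 0.03 * impressions
        leads = 0.1 * clicks
        return ExpectedOutcomes(impressions, clicks, leads)


def counterfactual_sweep(
    model: OutcomePredictor, features: Mapping[str, str], budgets: list[float]
) -> dict[float, ExpectedOutcomes]:
    """Ask the model what *would* happen at each budget, including budgets never actually run."""
    return {b: model.predict(b, features) for b in budgets}


sweep = counterfactual_sweep(ToyPredictor(), {"category": "plumbing"}, [0.0, 10.0, 20.0, 40.0])
for budget, outcome in sweep.items():
    print(budget, round(outcome.clicks, 2))
```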
Why CatBoost rather than a simpler model:
- Non-parametric: it doesn't assume a fixed functional form. Yelp states this explicitly: "Using a non-parametric ML approach, instead of making simplistic assumptions (e.g. constant cost per click), allows us to accurately capture complex effects such as diminishing returns on budget."
- Native handling of campaign features: ad campaigns have many categorical attributes (category, ad type, etc.), and CatBoost's built-in categorical handling avoids feature-engineering boilerplate.
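The following small numeric example illustrates the constant-cost-per-click pitfall named in the quote. Suppose clicks actually level off as budget grows. A cost per click estimated at one budget and then held fixed extrapolates linearly, so it overshoots at higher budgets. The saturating curve below is a made-up stand-in for the real, unknown response and is not Yelp data. A non-parametric model such as CatBoost can learn a curve of this shape from data instead of assuming a linear one.

```python
import math


def true_clicks(daily_budget: float) -> float:
    """Hypothetical saturating response: extra budget buys fewer and fewer extra clicks."""
    return 60.0 * (1.0 - math.exp(-daily_budget / 50.0))


def constant_cpc_clicks(daily_budget: float, observed_budget: float) -> float:
    """Simplistic model: measure cost per click at one observed budget, then hold it fixed."""
    cost_per_click = observed_budget / true_clicks(observed_budget)
    return daily_budget / cost_per_click  # linear in budget: doubling spend doubles clicks


for budget in (20.0, 50.0, 100.0):
    print(
        f"budget={budget:>5.0f}  true={true_clicks(budget):5.1f}  "
        f"constant-CPC={constant_cpc_clicks(budget, observed_budget=20.0):5.1f}"
    )
```

The two models agree at the observed budget of 20 and then diverge. The gap widens as the budget moves further from the point where the cost per click was measured.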
Outputs are averages, not integers
The CatBoost models output expected values (e.g. expected clicks = 12.3). The Back-Testing Engine then samples the realised integer count from a Poisson distribution parameterised by that expected value — see concepts/poisson-sampling-for-integer-outcomes.
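The sampling step can be sketched with the Python standard library alone. `sample_poisson` is a hypothetical helper, since the post does not describe Yelp's sampler. It counts the arrivals of a unit-rate Poisson process up to time λ (the expected value), which yields an exact Poisson(λ) draw. Each draw costs O(λ), so for large counts such as impressions a numerical library's sampler would be the practical choice (for example, NumPy's `Generator.poisson`).

```python
import random


def sample_poisson(expected: float, rng: random.Random) -> int:
    """Draw one integer count from a Poisson distribution with mean `expected`.

    Counts arrivals of a unit-rate Poisson process before time `expected`:
    inter-arrival gaps are Exponential(1), so the arrival count is Poisson(expected).
    Exact, but O(expected) work per draw.
    """
    if expected < 0:
        raise ValueError("expected value must be non-negative")
    count = 0
    arrival_time = rng.expovariate(1.0)
    while arrival_time < expected:
        count += 1
        arrival_time += rng.expovariate(1.0)
    return count


# Usage: turn a model's expected clicks (12.3) into a week of simulated integer daily counts.
rng = random.Random(42)
simulated_clicks = [sample_poisson(12.3, rng) for _ in range(7)]
print(simulated_clicks)
```

The model's output serves only as the Poisson mean. Repeated simulation runs therefore produce different integer outcomes whose long-run average matches the model's prediction.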
Model-level guardrails
Yelp names one monitoring practice: "we monitor these models to prevent overfitting, checking that performance is consistent between training and hold-out datasets." The post discloses no accuracy figures (MAE, MAPE, or R²).
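The train-versus-hold-out check can be sketched as a comparison of one error metric across both splits. The post names neither the metric nor a tolerance. Mean absolute error, the 20% relative-gap threshold, and the function names below are all assumptions.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_holdout_consistent(
    train_true: Sequence[float],
    train_pred: Sequence[float],
    holdout_true: Sequence[float],
    holdout_pred: Sequence[float],
    max_relative_gap: float = 0.2,  # hypothetical tolerance: hold-out error at most 20% worse
) -> bool:
    """Return True if hold-out error is not much worse than training error."""
    train_err = mean_absolute_error(train_true, train_pred)
    holdout_err = mean_absolute_error(holdout_true, holdout_pred)
    if train_err == 0.0:
        # A perfect training fit counts as consistent only if hold-out is also perfect.
        return holdout_err == 0.0
    return (holdout_err - train_err) / train_err <= max_relative_gap
```

Overfitting shows up as hold-out error well above training error, which is the symptom the quote describes. The same check could be run separately for each predicted outcome (impressions, clicks, leads).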
The same model is used for all candidates in a simulation, so cross-candidate comparisons are fair even if the absolute predicted values are biased.
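A toy ranking shows why a shared model keeps comparisons fair. The sketch assumes the model's bias is the same order-preserving distortion for every candidate: here, a constant factor of 0.8, meaning 20% systematic underprediction. Under that assumption the absolute numbers shift but the ranking does not. A bias that differs between candidates could still reorder them, and the third predictor demonstrates this. Candidate names and lead counts are invented.

```python
from typing import Callable, Mapping, Sequence


def rank_candidates(candidates: Sequence[str], predict_leads: Callable[[str], float]) -> list[str]:
    """Order candidates best-first by predicted leads, scoring all of them with ONE predictor."""
    return sorted(candidates, key=predict_leads, reverse=True)


# Hypothetical "true" expected leads for three candidate budget plans.
TRUE_LEADS: Mapping[str, float] = {"plan_a": 40.0, "plan_b": 55.0, "plan_c": 47.0}


def unbiased(plan: str) -> float:
    return TRUE_LEADS[plan]


def uniformly_biased(plan: str) -> float:
    # Same systematic 20% underprediction for every candidate.
    return 0.8 * TRUE_LEADS[plan]


def unevenly_biased(plan: str) -> float:
    # Underpredicts only plan_b: a bias that sharing one model does NOT neutralize.
    return 0.8 * TRUE_LEADS[plan] if plan == "plan_b" else TRUE_LEADS[plan]


plans = list(TRUE_LEADS)
print(rank_candidates(plans, unbiased))          # ['plan_b', 'plan_c', 'plan_a']
print(rank_candidates(plans, uniformly_biased))  # ['plan_b', 'plan_c', 'plan_a']
print(rank_candidates(plans, unevenly_biased))   # ['plan_c', 'plan_b', 'plan_a']
```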
Why it matters for system design
CatBoost appears in system-design writing primarily as a "good enough tabular-regression solver" — the architectural interest lies in how the predictor is wired in (counterfactual outcome predictor, Poisson-sampled, shared across candidates), not in the predictor itself.
Yelp's usage is the canonical instance of concepts/counterfactual-outcome-prediction.
Seen in
- sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — Yelp's Back-Testing Engine outcome predictor.
Related
- concepts/counterfactual-outcome-prediction
- concepts/poisson-sampling-for-integer-outcomes
- systems/scikit-optimize — paired with CatBoost in Yelp's simulation (CatBoost predicts outcomes; Scikit-Optimize picks candidates).