SYSTEM Cited by 1 source

CatBoost

Definition

CatBoost is a gradient-boosted decision-tree library originally developed at Yandex. Two design choices have distinguished it since the original release: native support for categorical features via ordered target statistics, without requiring one-hot encoding, and ordered boosting, which reduces prediction-shift bias during training.
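
The idea behind ordered target statistics can be illustrated in a few lines of plain Python. This is a simplified sketch of the concept, not CatBoost's actual implementation (which, among other things, uses several permutations); the function name, the `prior` and `prior_weight` parameters, and the toy data are all assumptions made for illustration. Each row's category is encoded from the targets of rows that come earlier in a random permutation, so a row never sees its own label.

```python
import random


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each row's category using only rows that precede it in a random order.

    Hypothetical helper for illustration; not part of the CatBoost API.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = [0.0] * len(categories)

    for i in order:
        category = categories[i]
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of earlier targets; falls back to the prior when none were seen.
        encoded[i] = (seen_sum + prior_weight * prior) / (seen_count + prior_weight)
        # Only now does this row's own target become visible to later rows.
        sums[category] = seen_sum + targets[i]
        counts[category] = seen_count + 1

    return encoded


encoded = ordered_target_statistics(["a", "a", "b"], [1.0, 0.0, 1.0], prior=0.5)
```

The first row visited in each category receives the prior alone, which is why the single "b" row above is encoded as 0.5 regardless of its own target.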

The library is widely used for classification and regression on tabular data with mixed feature types.

How Yelp uses it

Yelp's Back-Testing Engine (2026-02-02) uses CatBoost regressors as counterfactual-outcome predictors in its ad-budget simulation. Given (daily budget, campaign features), the models predict expected impressions, clicks, and leads.

Verbatim from the post: "we leverage ML models (specifically, CatBoost) trained to predict expected impressions, clicks, and leads based on daily budget and campaign features."

Why CatBoost (not a simpler model):

  • Non-parametric — doesn't assume a fixed functional form. Yelp names this explicitly: "Using a non-parametric ML approach, instead of making simplistic assumptions (e.g. constant cost per click), allows us to accurately capture complex effects such as diminishing returns on budget."
  • Handles campaign features natively: ad campaigns have many categorical attributes (category, ad type, etc.), and CatBoost's built-in categorical handling avoids feature-engineering boilerplate.

Outputs are averages, not integers

The CatBoost models output expected values (e.g. expected clicks = 12.3). The Back-Testing Engine then samples the realised integer count from a Poisson distribution parameterised by that expected value — see concepts/poisson-sampling-for-integer-outcomes.
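
The sampling step can be sketched with the standard library alone. The function name is hypothetical and the algorithm shown is Knuth's multiplication method, chosen here for brevity; the post does not say how Yelp draws its samples.

```python
import math
import random


def sample_poisson(expected, rng):
    """Draw one integer count from Poisson(expected) using Knuth's method.

    Suitable for small-to-moderate expected values; math.exp(-expected)
    underflows to zero for very large ones.
    """
    if expected < 0:
        raise ValueError("expected value must be non-negative")
    threshold = math.exp(-expected)
    count = 0
    product = rng.random()
    # Multiply uniform draws until the running product drops to the threshold.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(42)
draws = [sample_poisson(12.3, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Each draw is a whole number of clicks, while the average over many draws stays close to the model's expected value of 12.3.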

Model-level guardrails

Yelp names one monitoring practice: "we monitor these models to prevent overfitting, checking that performance is consistent between training and hold-out datasets." No MAE / MAPE / R² numbers are disclosed.
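
One way such a consistency check could look is sketched below. The choice of mean absolute error, the 20% relative-gap threshold, and all function names are assumptions for illustration; the post does not say which metric or threshold Yelp uses.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_gap(train_error, holdout_error):
    """How much worse the hold-out error is, as a fraction of the training error.

    Assumes train_error is non-zero.
    """
    return (holdout_error - train_error) / train_error


def is_consistent(train_error, holdout_error, max_relative_gap=0.2):
    """True when hold-out error is within the (hypothetical) allowed gap."""
    return relative_gap(train_error, holdout_error) <= max_relative_gap


# Toy numbers: the model fits training data far better than hold-out data.
train_mae = mean_absolute_error([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
holdout_mae = mean_absolute_error([10.0, 20.0], [12.0, 17.0])
flagged_as_overfit = not is_consistent(train_mae, holdout_mae)
```

In this toy case the hold-out error is several times the training error, so the check flags the model.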

The same model scores every candidate in a simulation, so a bias that affects all candidates in a similar way largely cancels out in cross-candidate comparisons, even if the absolute predicted values are off.

Why it matters for system design

CatBoost appears in system-design writing primarily as a "good enough tabular-regression solver" — the architectural interest lies in how the predictor is wired in (counterfactual outcome predictor, Poisson-sampled, shared across candidates), not in the predictor itself.

Canonical instance of concepts/counterfactual-outcome-prediction.
