Bayesian optimization over parameter space

Definition

Bayesian optimization is a class of sequential model-based optimization algorithms for expensive-to-evaluate objective functions. Given a search space over parameters and an objective that returns a scalar per evaluation, the algorithm:

  1. Proposes an initial candidate (usually random).
  2. Observes the objective value at that candidate.
  3. Fits a surrogate model to the (parameter, objective) observations seen so far; a Gaussian process is the common choice, though random-forest and gradient-boosted surrogates are also used.
  4. Uses an acquisition function (expected improvement, upper confidence bound, probability of improvement, …) to choose the next candidate, balancing exploitation of known-good regions against exploration of uncertain regions.
  5. Repeats until the evaluation budget (max_evals) is exhausted.

Unlike grid search or random search, each candidate is chosen using information from prior evaluations — valuable when each evaluation is expensive.
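The five-step loop above can be sketched in plain Python. This is a deliberately crude, library-free illustration, not scikit-optimize's actual algorithm: a 1-nearest-neighbor lookup stands in for the Gaussian-process surrogate, and a distance-based exploration bonus (a UCB-like rule) stands in for expected improvement.

```python
import random

def bayes_opt_sketch(objective, bounds, max_evals=25, seed=0):
    """Sequential model-based minimization over a 1-D search space.

    Crude stand-ins: the "surrogate" predicts the value of the nearest
    evaluated point, and the "acquisition" subtracts a distance-based
    exploration bonus instead of computing expected improvement.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    history = []          # (parameter, objective) observations so far
    x = rng.uniform(lo, hi)   # step 1: initial candidate is random
    for _ in range(max_evals):
        history.append((x, objective(x)))   # step 2: expensive evaluation

        # Step 3: "fit" the surrogate -- here, nearest-neighbor lookup.
        def predict(c):
            return min(history, key=lambda xy: abs(xy[0] - c))[1]

        def uncertainty(c):
            return min(abs(xy[0] - c) for xy in history)

        # Step 4: acquisition balances exploitation (low predicted value)
        # against exploration (high uncertainty far from observed points).
        kappa = 1.0
        candidates = [rng.uniform(lo, hi) for _ in range(100)]
        x = min(candidates, key=lambda c: predict(c) - kappa * uncertainty(c))
        # Step 5: loop until max_evals is reached.

    return min(history, key=lambda xy: xy[1])

best_x, best_y = bayes_opt_sketch(lambda x: (x - 2.0) ** 2, (-10, 10))
```

The real machinery differs (a Gaussian process gives calibrated uncertainty rather than a distance heuristic), but the control flow — propose, evaluate, refit, re-propose — is the same.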

Why it matters for system design

The method itself is statistical machinery; the architectural pattern around it is what recurs in systems work:

  • An iterative loop with expensive evaluation per step (simulation run, benchmark run, schema-advisor hypothetical index creation, feature-flag rollout window).
  • A small evaluation budget (minutes to hours per candidate, tens of candidates total).
  • A surrogate that learns from prior runs to avoid blindly enumerating the search space.

Yelp's instance

Yelp's Back-Testing Engine (2026-02-02) uses scikit-optimize as the default optimizer. Example YAML configuration:

searches:
  - search_type: 'scikit-opt'
    minimize_metric: 'average-cpl'
    max_evals: 25
    search_space:
      allocation_algo: skopt.space.Categorical(['status-quo', 'algorithm_x'])
      alpha: skopt.space.Real(-10, 10)

Verbatim description of the loop: "The process begins with the optimizer suggesting an initial candidate, which is typically a random sample since no prior data exist. ... The Engine simulates this candidate and returns its performance metrics. In turn, the optimizer uses this feedback to select the next candidate, learning from previous results to propose combinations more likely to optimize the target metric."

Yelp's post names two alternative strategies the Engine supports:

  • Grid search — enumerate all combinations. Suffers combinatorial explosion as parameters are added (the post's illustrative example of three parameters yields 5 × 3 × 10 = 150 candidates). Does not learn from prior results.
  • Listed search — user enumerates candidates in YAML. Useful for targeted testing of specific configurations.
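The grid-search explosion is easy to verify directly. The parameter names and value lists below are hypothetical, chosen only to mirror the post's 5 × 3 × 10 illustration:

```python
from itertools import product

# Hypothetical grid mirroring the 5 x 3 x 10 example.
grid = {
    "alpha": [0.1, 0.5, 1.0, 5.0, 10.0],   # 5 values
    "allocation_algo": ["a", "b", "c"],     # 3 values
    "budget_cap": list(range(10)),          # 10 values
}

# Full enumeration: every combination must be evaluated.
candidates = list(product(*grid.values()))
print(len(candidates))  # 5 * 3 * 10 = 150
```

Adding one more ten-valued parameter multiplies the count to 1,500 — which is why a 25-evaluation Bayesian budget becomes attractive as dimensions grow.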

Yelp's key observation: "for all kinds of search except Scikit-Opt, the optimizer doesn't really act as an optimizer but just a wrapper that yields the next candidate to try." Bayesian optimization is the only one that actually learns.
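That observation — most strategies are stateless candidate generators, and only the Bayesian one consumes feedback — can be made concrete. A hedged sketch follows; the interface and names are hypothetical, not Yelp's actual code:

```python
from itertools import product
from typing import Iterator

def grid_search(space: dict) -> Iterator[dict]:
    # Just a wrapper: yields every combination, ignores results entirely.
    for combo in product(*space.values()):
        yield dict(zip(space.keys(), combo))

def listed_search(candidates: list) -> Iterator[dict]:
    # Also a wrapper: replays a user-enumerated list of candidates.
    yield from candidates

class BayesianSearch:
    """Only this strategy keeps state and accepts feedback."""

    def __init__(self):
        self.history = []  # (candidate, result) pairs

    def tell(self, candidate, result):
        # Feedback channel the other strategies simply don't have.
        self.history.append((candidate, result))

    def ask(self):
        # Surrogate fit + acquisition maximization would go here.
        raise NotImplementedError

gen = grid_search({"alpha": [0.1, 1.0], "algo": ["status-quo"]})
first = next(gen)
```

The asymmetry is in the type signatures: the generators need only the search space, while `BayesianSearch` exposes a `tell` method because its next proposal depends on prior results.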

When to prefer Bayesian over grid/random

  • Many continuous parameters (grid blows up combinatorially).
  • Evaluation is expensive enough that wasting evaluations on known-bad regions is unaffordable.
  • Parameter interactions matter (a surrogate captures these; random search doesn't).

When to prefer random/grid:

  • Small discrete spaces where you can afford to enumerate.
  • Evaluation is cheap and parallel — random search fanned out across many workers may beat Bayesian optimization's inherently sequential loop.
  • You want fully reproducible coverage (Bayesian proposal ordering depends on randomness and prior evaluations).
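The cheap-and-parallel case can be sketched with stdlib tooling alone. The objective and search bounds here are illustrative stand-ins for a fast benchmark:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Stand-in for a cheap evaluation (e.g., a fast benchmark run).
    return (x - 2.0) ** 2

rng = random.Random(42)
candidates = [rng.uniform(-10, 10) for _ in range(64)]

# All 64 evaluations run concurrently: no round-trip through a surrogate
# between proposals, so throughput scales with worker count instead of
# being serialized the way a propose-evaluate-refit loop is.
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = list(pool.map(objective, candidates))

best_x, best_y = min(zip(candidates, scores), key=lambda cs: cs[1])
```

Batch-parallel Bayesian variants exist, but vanilla sequential proposal is the default in most libraries, which is what makes plain random search competitive here.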
