PATTERN Cited by 1 source
YAML-declared experiment config¶
Problem¶
A back-testing / simulation / benchmark harness has many parameters: time range to evaluate, candidate parameter space, search strategy (Bayesian / grid / listed), optimization metric, evaluation budget, run name. Hardcoding any of these into the code means experiments can't be reproduced without re-reading the code; ad-hoc CLI flags balloon as the framework grows.
Solution¶
Declare the experiment configuration in a YAML file that the framework reads at run time. The file captures:
- Time range (`date_interval`) — the historical window being replayed.
- Run identifier (`experiment_name`) — a name that keys the results in the experiment store (e.g. MLflow).
- Search strategy (`search_type`) — Bayesian, grid, or listed.
- Search space — a typed range per parameter (`Real`, `Integer`, `Categorical`).
- Optimization metric (`minimize_metric`) — the scalar the optimizer should drive down.
- Evaluation budget (`max_evals`) — how many candidates to run.
The YAML file is checked into version control alongside code,
making the experiment reproducible from (code commit, config
commit, historical data range).
Canonical instance — Yelp Back-Testing Engine¶
Yelp's Back-Testing Engine (2026-02-02) uses this shape. Example from the post:
```yaml
date_interval:
- '2025-12-01'
- '2025-12-31'
experiment_name: 'algorithm_x_vs_status_quo'
searches:
- search_type: 'scikit-opt'
  minimize_metric: 'average-cpl'
  max_evals: 25
  search_space:
    allocation_algo: skopt.space.Categorical(['status-quo', 'algorithm_x'])
    alpha: skopt.space.Real(-10, 10)
```
Note the direct embedding of scikit-optimize search-space primitives (`skopt.space.Categorical`, `skopt.space.Real`) in the YAML — a lightweight form of configuration-as-code where the YAML strings are evaluated against the skopt namespace.
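One way this string-to-object step could work is evaluating each `search_space` value against a namespace that exposes only the `skopt` alias. The sketch below uses stand-in dataclasses instead of importing scikit-optimize, so the mechanism is visible without the dependency; in a real harness you would `import skopt` and pass it into the namespace instead.

```python
from dataclasses import dataclass
from types import SimpleNamespace


# Stand-ins for skopt.space.Real / skopt.space.Categorical, used so the
# sketch runs without scikit-optimize installed.
@dataclass
class Real:
    low: float
    high: float


@dataclass
class Categorical:
    categories: list


# Shape the namespace like the skopt package so the YAML strings
# resolve exactly as written: skopt.space.Real(...), etc.
skopt = SimpleNamespace(space=SimpleNamespace(Real=Real, Categorical=Categorical))


def parse_search_space(raw):
    """Evaluate each YAML string against the skopt alias only (no builtins)."""
    return {
        name: eval(expr, {"__builtins__": {}, "skopt": skopt})
        for name, expr in raw.items()
    }
```

The restricted-globals `eval` is what makes this "lightweight" rather than safe: the YAML is trusted code, which is the coupling the Variations section flags.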
Why YAML specifically¶
- Human-readable without needing an editor plugin.
- Version-controllable — diffs tell you what changed between experiments.
- Non-Python-centric — can be edited by applied scientists / analysts without knowing the framework's code internals.
- Enforces a surface — the framework can require specific fields (`date_interval`, `experiment_name`, `searches`) and validate them, unlike ad-hoc kwargs.
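The "enforced surface" point above can be sketched as a validation pass over the loaded config. Field names follow the pattern; the per-search required fields and error wording are illustrative assumptions, not Yelp's actual schema.

```python
# Required fields at each level of the config, per the pattern above.
REQUIRED_TOP_LEVEL = ("date_interval", "experiment_name", "searches")
REQUIRED_PER_SEARCH = ("search_type", "minimize_metric", "max_evals", "search_space")


def validate_config(config):
    """Fail fast, before any candidate runs, if the config is malformed."""
    missing = [f for f in REQUIRED_TOP_LEVEL if f not in config]
    if missing:
        raise ValueError(f"experiment config missing fields: {missing}")
    for i, search in enumerate(config["searches"]):
        missing = [f for f in REQUIRED_PER_SEARCH if f not in search]
        if missing:
            raise ValueError(f"searches[{i}] missing fields: {missing}")
    return config
```

A schema library (e.g. pydantic or jsonschema) would do the same job with better error messages; the point is that a declared surface is checkable, where ad-hoc kwargs are not.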
The Yelp post calls this out explicitly: "a human-readable format widely used for configuration."
When it's appropriate¶
- Experimentation frameworks where operators are not primarily code authors (applied scientists, analysts, data scientists).
- Scenarios where the same framework runs many experiments over time and a persistent config trail matters for audit.
- Cases where experiments are templated / copied — YAML files are easy to fork and diff.
When to avoid:
- Tiny, one-off experiments where the overhead of declaring YAML exceeds the write-code-directly overhead.
- Highly dynamic search spaces where parameter dependencies need logic YAML can't express cleanly (consider hierarchical config systems like Hydra instead).
Variations¶
- Embedded code strings (Yelp's `skopt.space.*` syntax) — pragmatic but couples the YAML to Python.
- Separate search-space definitions per parameter type — more verbose but language-agnostic.
- Hierarchical / multi-file — split time range, search space, and metric into separate YAMLs for composition.
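For contrast with the embedded-code-string form, the language-agnostic variation might declare each parameter's type and bounds as plain YAML fields (field names here are illustrative, not from the Yelp post):

```yaml
search_space:
  allocation_algo:
    type: categorical
    choices: ['status-quo', 'algorithm_x']
  alpha:
    type: real
    low: -10
    high: 10
```

Any language can parse this, at the cost of the framework maintaining its own mapping from `type` values to optimizer primitives.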
Relation to other patterns¶
- patterns/yaml-declared-feature-dag-topology-inferred — Yelp's sibling YAML-config pattern in a different domain (Spark ETL feature DAG).
- patterns/historical-replay-with-ml-outcome-predictor — the pattern this config drives.
Seen in¶
- sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — canonical wiki instance for experiment / back-testing configs.