PATTERN Cited by 1 source
YAML-declared experiment config¶
Problem¶
A back-testing / simulation / benchmark harness has many parameters: time range to evaluate, candidate parameter space, search strategy (Bayesian / grid / listed), optimization metric, evaluation budget, run name. Hardcoding any of these into the code means experiments can't be reproduced without re-reading the code; ad-hoc CLI flags balloon as the framework grows.
Solution¶
Declare the experiment configuration in a YAML file that the framework reads at run time. The file captures:
- Time range (`date_interval`) — the historical window being replayed.
- Run identifier (`experiment_name`) — a name that keys the results in the experiment store (e.g. MLflow).
- Search strategy (`search_type`) — Bayesian, grid, or listed.
- Search space — a typed range per parameter (`Real`, `Integer`, `Categorical`).
- Optimization metric (`minimize_metric`) — the scalar the optimizer should drive down.
- Evaluation budget (`max_evals`) — how many candidates to run.
The YAML file is checked into version control alongside code,
making the experiment reproducible from (code commit, config
commit, historical data range).
Canonical instance — Yelp Back-Testing Engine¶
Yelp's Back-Testing Engine (2026-02-02) uses this shape. Example from the post:
```yaml
date_interval:
- '2025-12-01'
- '2025-12-31'
experiment_name: 'algorithm_x_vs_status_quo'
searches:
- search_type: 'scikit-opt'
  minimize_metric: 'average-cpl'
  max_evals: 25
  search_space:
    allocation_algo: skopt.space.Categorical(['status-quo', 'algorithm_x'])
    alpha: skopt.space.Real(-10, 10)
```
Note the direct embedding of scikit-optimize search-space primitives (`skopt.space.Categorical`, `skopt.space.Real`) in the YAML — a lightweight form of configuration-as-code where the YAML strings are evaluated against the skopt namespace.
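One way this string-to-object step could work is evaluating each `search_space` value against a namespace that exposes only the `skopt` alias. The sketch below uses stand-in dataclasses instead of importing scikit-optimize, so the mechanism is visible without the dependency; in a real harness you would `import skopt` and pass it into the namespace instead.

```python
from dataclasses import dataclass
from types import SimpleNamespace


# Stand-ins for skopt.space.Real / skopt.space.Categorical, used so the
# sketch runs without scikit-optimize installed.
@dataclass
class Real:
    low: float
    high: float


@dataclass
class Categorical:
    categories: list


# Shape the namespace like the skopt package so the YAML strings
# resolve exactly as written: skopt.space.Real(...), etc.
skopt = SimpleNamespace(space=SimpleNamespace(Real=Real, Categorical=Categorical))


def parse_search_space(raw):
    """Evaluate each YAML string against the skopt alias only (no builtins)."""
    return {
        name: eval(expr, {"__builtins__": {}, "skopt": skopt})
        for name, expr in raw.items()
    }
```

The restricted-globals `eval` is what makes this "lightweight" rather than safe: the YAML is trusted code, which is the coupling the Variations section flags.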
Why YAML specifically¶
- Human-readable without needing an editor plugin.
- Version-controllable — diffs tell you what changed between experiments.
- Non-Python-centric — can be edited by applied scientists / analysts without knowing the framework's code internals.
- Enforces a surface — the framework can require specific fields (`date_interval`, `experiment_name`, `searches`) and validate them, unlike ad-hoc kwargs.
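The "enforced surface" point above can be sketched as a validation pass over the loaded config. Field names follow the pattern; the per-search required fields and error wording are illustrative assumptions, not Yelp's actual schema.

```python
# Required fields at each level of the config, per the pattern above.
REQUIRED_TOP_LEVEL = ("date_interval", "experiment_name", "searches")
REQUIRED_PER_SEARCH = ("search_type", "minimize_metric", "max_evals", "search_space")


def validate_config(config):
    """Fail fast, before any candidate runs, if the config is malformed."""
    missing = [f for f in REQUIRED_TOP_LEVEL if f not in config]
    if missing:
        raise ValueError(f"experiment config missing fields: {missing}")
    for i, search in enumerate(config["searches"]):
        missing = [f for f in REQUIRED_PER_SEARCH if f not in search]
        if missing:
            raise ValueError(f"searches[{i}] missing fields: {missing}")
    return config
```

A schema library (e.g. pydantic or jsonschema) would do the same job with better error messages; the point is that a declared surface is checkable, where ad-hoc kwargs are not.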
The Yelp post calls this out explicitly: "a human-readable format widely used for configuration."
When it's appropriate¶
- Experimentation frameworks where operators are not primarily code authors (applied scientists, analysts, data scientists).
- Scenarios where the same framework runs many experiments over time and a persistent config trail matters for audit.
- Cases where experiments are templated / copied — YAML files are easy to fork and diff.
When to avoid:
- Tiny, one-off experiments where the overhead of declaring YAML exceeds the write-code-directly overhead.
- Highly dynamic search spaces where parameter dependencies need logic YAML can't express cleanly (consider hierarchical config systems like Hydra instead).
Variations¶
- Embedded code strings (Yelp's `skopt.space.*` syntax) — pragmatic but couples the YAML to Python.
- Separate search-space definitions per parameter type — more verbose but language-agnostic.
- Hierarchical / multi-file — split time range, search space, and metric into separate YAMLs for composition.
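For contrast with the embedded-code-string form, the language-agnostic variation might declare each parameter's type and bounds as plain YAML fields (field names here are illustrative, not from the Yelp post):

```yaml
search_space:
  allocation_algo:
    type: categorical
    choices: ['status-quo', 'algorithm_x']
  alpha:
    type: real
    low: -10
    high: 10
```

Any language can parse this, at the cost of the framework maintaining its own mapping from `type` values to optimizer primitives.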
Relation to other patterns¶
- patterns/yaml-declared-feature-dag-topology-inferred — Yelp's sibling YAML-config pattern in a different domain (Spark ETL feature DAG).
- patterns/historical-replay-with-ml-outcome-predictor — the pattern this config drives.
Seen in¶
- sources/2026-02-02-yelp-back-testing-engine-ad-budget-allocation — canonical wiki instance for experiment / back-testing configs.