PATTERN

YAML-declared experiment config

Problem

A back-testing / simulation / benchmark harness has many parameters: the time range to evaluate, the candidate parameter space, the search strategy (Bayesian / grid / listed), the optimization metric, the evaluation budget, the run name. Hardcoding any of these means an experiment can't be reproduced without re-reading the source, and ad-hoc CLI flags balloon as the framework grows.

Solution

Declare the experiment configuration in a YAML file that the framework reads at run time. The file captures:

  • Time range (date_interval) — the historical window being replayed.
  • Run identifier (experiment_name) — a name that keys the results in the experiment store (e.g. MLflow).
  • Search strategy (search_type) — Bayesian, grid, listed.
  • Search space — per-parameter typed range (Real, Integer, Categorical).
  • Optimization metric (minimize_metric) — which scalar the optimizer should drive down.
  • Evaluation budget (max_evals) — how many candidates to run.

The YAML file is checked into version control alongside code, making the experiment reproducible from (code commit, config commit, historical data range).
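A minimal sketch of the load-and-validate step. The required field names (`date_interval`, `experiment_name`, `searches`, `max_evals`) come from the example below; `validate_experiment_config` is a hypothetical helper, not the framework's actual API, and the config dict is written inline where a real run would get it from `yaml.safe_load`.

```python
# Hypothetical validation helper: the framework can require specific
# fields and reject ad-hoc configs, unlike loose kwargs.
REQUIRED_FIELDS = ("date_interval", "experiment_name", "searches")

def validate_experiment_config(config: dict) -> dict:
    """Reject configs missing the fields the framework requires."""
    missing = [f for f in REQUIRED_FIELDS if f not in config]
    if missing:
        raise ValueError(f"experiment config missing fields: {missing}")
    for search in config["searches"]:
        if "max_evals" in search and search["max_evals"] <= 0:
            raise ValueError("max_evals must be a positive evaluation budget")
    return config

# In practice this dict would come from yaml.safe_load(open(path));
# inlined here to keep the sketch dependency-free.
config = {
    "date_interval": ["2025-12-01", "2025-12-31"],
    "experiment_name": "algorithm_x_vs_status_quo",
    "searches": [{"search_type": "scikit-opt", "max_evals": 25}],
}
validate_experiment_config(config)  # passes silently
```

Validation at load time is what turns the YAML file into an enforced surface rather than a loosely typed bag of keys.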

Canonical instance — Yelp Back-Testing Engine

Yelp's Back-Testing Engine (2026-02-02) uses this shape. Example from the post:

date_interval:
  - '2025-12-01'
  - '2025-12-31'

experiment_name: 'algorithm_x_vs_status_quo'

searches:
  - search_type: 'scikit-opt'
    minimize_metric: 'average-cpl'
    max_evals: 25
    search_space:
      allocation_algo: skopt.space.Categorical(['status-quo', 'algorithm_x'])
      alpha: skopt.space.Real(-10, 10)

Note the direct embedding of scikit-optimize search-space primitives (skopt.space.Categorical, skopt.space.Real) in the YAML — a lightweight form of configuration-as-code where the YAML strings are evaluated against the skopt namespace.
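One way to realize that "evaluated against the skopt namespace" idea, sketched here with stand-in classes so the snippet runs without scikit-optimize installed; a real framework would put the actual `skopt` module into the eval namespace instead of these stubs.

```python
from dataclasses import dataclass

# Stand-ins for skopt.space.Real / skopt.space.Categorical so this
# sketch is self-contained; they only record their constructor args.
@dataclass
class Real:
    low: float
    high: float

@dataclass
class Categorical:
    choices: list

class _Space:
    Real = Real
    Categorical = Categorical

class _Skopt:
    space = _Space

def parse_search_space(raw: dict) -> dict:
    """Evaluate each YAML string against a restricted namespace."""
    namespace = {"skopt": _Skopt, "__builtins__": {}}
    return {name: eval(expr, namespace) for name, expr in raw.items()}

# The strings exactly as they appear in the YAML above.
raw = {
    "allocation_algo": "skopt.space.Categorical(['status-quo', 'algorithm_x'])",
    "alpha": "skopt.space.Real(-10, 10)",
}
space = parse_search_space(raw)
```

Even with a restricted namespace, eval'ing config strings means trusting whoever writes the YAML; the pattern trades that safety margin for brevity.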

Why YAML specifically

  • Human-readable without needing an editor plugin.
  • Version-controllable — diffs tell you what changed between experiments.
  • Non-Python-centric — can be edited by applied scientists / analysts without knowing the framework's code internals.
  • Enforces a surface — the framework can require specific fields (date_interval, experiment_name, searches) and validate them, unlike ad-hoc kwargs.

The Yelp post calls this out explicitly: "a human-readable format widely used for configuration."

When it's appropriate

  • Experimentation frameworks where operators are not primarily code authors (applied scientists, analysts, data scientists).
  • Scenarios where the same framework runs many experiments over time and a persistent config trail matters for audit.
  • Cases where experiments are templated / copied — YAML files are easy to fork and diff.

When to avoid

  • Tiny, one-off experiments where the overhead of declaring YAML exceeds the write-code-directly overhead.
  • Highly dynamic search spaces where parameter dependencies need logic YAML can't express cleanly (consider hierarchical config systems like Hydra instead).

Variations

  • Embedded code strings (Yelp's skopt.space.* syntax) — pragmatic but couples YAML to Python.
  • Separate search-space definitions per parameter type — more verbose but language-agnostic.
  • Hierarchical / multi-file — split time range, search space, and metric into separate YAMLs for composition.
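The second variation might look like the following hypothetical, language-agnostic encoding of the same search space; the field names (type, choices, low, high) are invented for illustration, not taken from the Yelp post.

```yaml
search_space:
  allocation_algo:
    type: categorical
    choices: ['status-quo', 'algorithm_x']
  alpha:
    type: real
    low: -10
    high: 10
```

This is more verbose than embedding skopt.space.* strings, but any consumer, in any language, can interpret it without a Python runtime.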
