

BayesQO

What it is

BayesQO is an offline query optimizer that applies Bayesian optimization to the join-order search problem. Given a query and a fixed iteration budget, it proposes candidate join orders via an acquisition function over a surrogate model trained on previously observed (plan, runtime) pairs, aiming to find a better plan than the native optimizer's choice.

Originally built for PostgreSQL.

Architectural shape

1. Propose candidate plan (initial: random or optimizer's plan)
2. Execute plan → observe runtime
3. Update surrogate model with (plan, runtime)
4. Acquisition function selects next candidate balancing
   exploitation (refine known-good) vs exploration (try
   uncertain)
5. Repeat until budget exhausted
6. Return best plan observed
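The loop above can be sketched in pure Python. Everything here is illustrative: `plan_runtime` is a synthetic stand-in for actually executing a plan, and the surrogate is a kernel-weighted average with a distance-based uncertainty bonus, not BayesQO's real Gaussian-process / tree-based model — but the propose → execute → update → acquire cycle is the same shape.

```python
import itertools
import math
import random

def plan_runtime(order):
    # Stand-in for executing a join order and timing it
    # (assumption: a real system measures wall-clock runtime).
    target = (0, 2, 1, 3)  # synthetic "best" join order
    return 1.0 + sum(abs(a - b) for a, b in zip(order, target))

def distance(a, b):
    # Positions at which two join orders disagree.
    return sum(x != y for x, y in zip(a, b))

def surrogate(candidate, observed):
    # Kernel-weighted mean of observed runtimes, plus a
    # distance-to-nearest-observation uncertainty proxy: a crude
    # stand-in for a Gaussian-process posterior (mean, stddev).
    weights = [(math.exp(-distance(candidate, p)), r) for p, r in observed]
    total = sum(w for w, _ in weights)
    mean = sum(w * r for w, r in weights) / total
    nearest = min(distance(candidate, p) for p, _ in observed)
    return mean, nearest

def acquisition(candidate, observed, kappa=1.0):
    # Lower-confidence bound: low predicted runtime (exploitation)
    # minus an uncertainty bonus (exploration). We minimise this.
    mean, unc = surrogate(candidate, observed)
    return mean - kappa * unc

random.seed(0)
tables = (0, 1, 2, 3)
candidates = list(itertools.permutations(tables))

# Step 1: initial proposal (random, per the loop above).
first = random.choice(candidates)
observed = [(first, plan_runtime(first))]

# Steps 2-5: execute, update, acquire, repeat until budget exhausted.
for _ in range(10):  # fixed iteration budget
    tried = {p for p, _ in observed}
    pool = [c for c in candidates if c not in tried]
    nxt = min(pool, key=lambda c: acquisition(c, observed))
    observed.append((nxt, plan_runtime(nxt)))

# Step 6: return best plan observed.
best_plan, best_time = min(observed, key=lambda x: x[1])
print(best_plan, best_time)
```

Note the anytime property: `best_time` is non-increasing in the budget, because the answer is simply the minimum over everything observed so far.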

BayesQO shares the anytime optimizer shape with systems/databricks-join-order-agent — both converge monotonically as budget grows — but differs in the candidate-proposal mechanism:

| Axis | BayesQO | Databricks LLM agent |
| --- | --- | --- |
| Proposal mechanism | Gaussian-process / tree-based surrogate + acquisition function | Frontier LLM with grammar-constrained structured output |
| Domain knowledge | None (learned from rollouts only) | Learned priors from training corpus |
| Inspection of intermediate results | Scalar runtime only | Runtime + per-subplan sizes |
| Target engine | PostgreSQL | Databricks |
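The "grammar-constrained structured output" row enforces a simple invariant that can be illustrated with a post-hoc check — a hypothetical sketch, not the Databricks implementation, which constrains decoding itself: a proposal only counts as well-formed if it is a permutation of exactly the query's tables.

```python
def is_valid_join_order(proposal, tables):
    # A join order is well-formed iff it names each of the
    # query's tables exactly once (i.e. it is a permutation).
    return sorted(proposal) == sorted(tables)

# Hypothetical table names for illustration.
tables = ["movie", "cast_info", "name", "title"]
print(is_valid_join_order(["title", "movie", "name", "cast_info"], tables))  # True
print(is_valid_join_order(["title", "movie", "movie", "name"], tables))      # False (duplicate, missing table)
```

Grammar-constrained decoding guarantees this invariant by construction, which is why the bullet below can say correctness is "guaranteed only by the grammar and execution timeout" — semantic quality is still left to search.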

Why it appears on the wiki

BayesQO is the prior-art baseline that Databricks' LLM-agent experiment compares itself against (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization). The post's framing:

"This outperforms using perfect cardinality estimates (intractable in practice), smaller models, and the recent BayesQO offline optimizer (although BayesQO was designed for PostgreSQL, not Databricks)."

The parenthetical is important: BayesQO wasn't tuned for the Databricks engine, so the comparison is asymmetric. The result frames Bayesian-optimization-over-plans as a weaker baseline than LLM-directed search for this class of problem, at least on the Databricks execution engine and the JOB benchmark.

Reference

Project link from the source: https://rm.cab/bayesqo

Contrast with LLM-agent approach

The core architectural disagreement: what does "propose the next candidate" entail?

  • BayesQO: a scalar-objective statistical model with an acquisition function — formally principled, domain-agnostic, no transfer of knowledge from prior databases or plan literature.
  • LLM agent: a pattern-matcher against its training corpus — informally principled (correctness guaranteed only by the grammar and execution timeout), domain-aware, implicitly transfers knowledge.

The Databricks result is evidence that, at least for join-ordering on a modern query engine, the LLM's domain-knowledge advantage outweighs Bayesian optimization's statistical rigour. It is an instance of a broader pattern: where an LLM has pattern-matching coverage of a domain's solution space, agent search often beats statistical search.
