BayesQO¶
What it is¶
BayesQO is an offline query optimizer that applies Bayesian optimization to the join-order search problem. Given a query and a fixed iteration budget, BayesQO proposes candidate join orders using an acquisition function over a surrogate model trained on previously observed (plan, runtime) pairs, aiming to find a better plan than the native optimizer's choice.
Originally built for PostgreSQL.
Architectural shape¶
1. Propose candidate plan (initial: random or optimizer's plan)
2. Execute plan → observe runtime
3. Update surrogate model with (plan, runtime)
4. Acquisition function selects next candidate, balancing exploitation (refine known-good) vs exploration (try uncertain)
5. Repeat until budget exhausted
6. Return best plan observed
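The loop above can be sketched in Python. Everything concrete here — the toy cost function, the prefix-similarity surrogate, the lower-confidence-bound acquisition — is an illustrative stand-in chosen for brevity, not BayesQO's actual surrogate model or plan encoding:

```python
import random
import statistics

random.seed(0)
TABLES = ("a", "b", "c", "d")  # toy schema; real join-order spaces are huge

def run_plan(order):
    # Stand-in for executing a join order and timing it; in BayesQO this
    # is a real execution against the engine (step 2).
    return sum((i + 1) * (ord(t) - ord("a") + 1) for i, t in enumerate(order))

def shared_prefix(p, q):
    # Crude plan similarity: length of the common join-order prefix.
    n = 0
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return n

def predict(cand, history):
    # Toy surrogate: prefix-similarity-weighted mean of observed runtimes;
    # uncertainty shrinks as more similar plans have been observed.
    weights = [1 + shared_prefix(cand, p) for p, _ in history]
    mean = sum(w * r for w, (_, r) in zip(weights, history)) / sum(weights)
    spread = statistics.pstdev([r for _, r in history]) or 1.0
    return mean, spread / max(weights)

def lcb(cand, history, beta):
    # Acquisition: lower confidence bound (runtime is minimised), trading
    # off exploitation (low predicted mean) vs exploration (high std).
    mean, std = predict(cand, history)
    return mean - beta * std

def bayes_qo(budget=20, beta=1.0, pool_size=16):
    history = []
    cand = tuple(random.sample(TABLES, len(TABLES)))  # step 1: random start
    for _ in range(budget):                           # step 5: until budget
        runtime = run_plan(cand)                      # step 2: observe
        history.append((cand, runtime))               # step 3: update data
        pool = [tuple(random.sample(TABLES, len(TABLES)))
                for _ in range(pool_size)]
        cand = min(pool, key=lambda p: lcb(p, history, beta))  # step 4
    return min(history, key=lambda pr: pr[1])         # step 6: best observed
```

Because the loop returns the best plan observed so far, it is anytime: stopping at any budget yields a valid (if possibly suboptimal) join order.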
BayesQO shares the anytime optimizer shape with systems/databricks-join-order-agent — both converge monotonically as budget grows — but differs in the candidate-proposal mechanism:
| Axis | BayesQO | Databricks LLM agent |
|---|---|---|
| Proposal mechanism | Gaussian-process / tree-based surrogate + acquisition function | Frontier LLM with grammar-constrained structured output |
| Domain knowledge | None (learned from rollouts only) | Learned priors from training corpus |
| Inspection of intermediate results | Scalar runtime only | Runtime + per-subplan sizes |
| Target engine | PostgreSQL | Databricks |
Why it appears on the wiki¶
BayesQO is the prior-art baseline that Databricks' LLM-agent experiment compares itself against (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization). The post's framing:
"This outperforms using perfect cardinality estimates (intractable in practice), smaller models, and the recent BayesQO offline optimizer (although BayesQO was designed for PostgreSQL, not Databricks)."
The parenthetical is important: BayesQO wasn't tuned for the Databricks engine, so the comparison is asymmetric. The result frames Bayesian-optimization-over-plans as a weaker baseline than LLM-directed search for this class of problem, at least on the Databricks execution engine and the JOB benchmark.
Reference¶
Project link from the source: https://rm.cab/bayesqo
Contrast with LLM-agent approach¶
The core architectural disagreement: what does "propose the next candidate" entail?
- BayesQO: a scalar-objective statistical model with an acquisition function — formally principled, domain-agnostic, no transfer of knowledge from prior databases or plan literature.
- LLM agent: a pattern-matcher against its training corpus — informally principled (correctness guaranteed only by the grammar and execution timeout), domain-aware, implicitly transfers knowledge.
The Databricks result is evidence that, at least for join-ordering on a modern query engine, the LLM's domain-knowledge advantage outweighs BO's statistical rigour. This is an instance of the broader pattern: where an LLM has pattern-matching coverage of a domain's solution space, agent search often beats statistical search.
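The "correctness guaranteed only by the grammar and execution timeout" point can be made concrete with a minimal sketch — `valid_join_order` is a hypothetical helper for illustration, not the agent's actual guard:

```python
def valid_join_order(proposal, query_tables):
    # The grammar constraint reduces, in essence, to: the proposed order
    # must use each of the query's tables exactly once. Anything the LLM
    # emits that violates this is rejected before execution; plan *quality*
    # is only ever checked by running the plan under a timeout.
    return sorted(proposal) == sorted(query_tables)
```

A well-formed but slow order passes this check; a malformed one (duplicated or missing tables) is rejected without being executed.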
Seen in¶
- sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization — Canonical wiki entry. Named as the closest prior-art baseline for offline query plan optimizers. Beaten by the Databricks LLM agent on JOB-10× (with the Postgres-vs-Databricks caveat).
Related¶
- concepts/bayesian-optimization-over-parameter-space — the algorithmic foundation
- concepts/join-order-optimization — the problem domain
- concepts/query-planner — the broader optimizer that BayesQO complements offline
- systems/databricks-join-order-agent — the LLM-agent alternative that beats BayesQO on this benchmark
- systems/join-order-benchmark-job — the benchmark both approaches share