
Join Order Benchmark (JOB)

What it is

Join Order Benchmark (JOB) is a canonical academic benchmark for join-order optimization consisting of 113 queries over the IMDb dataset (Source: Leis et al., PVLDB 2016, cited in sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization). Queries are explicitly designed to be difficult for query optimizers — they target cardinality-estimation weak spots via skewed distributions, correlated predicates, and (famously) LIKE predicates on free-text notes columns.
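The core failure mode behind these weak spots can be sketched in a few lines: traditional optimizers multiply per-column selectivities as if predicates were independent, which badly underestimates cardinality when the predicates are correlated. The data below is a hypothetical stand-in, not the real IMDb tables.

```python
# Sketch: why correlated predicates break the independence assumption.
# Toy rows standing in for a movies table (hypothetical, not real IMDb data):
# every "video" row is also a 1994 row, i.e. the two predicates are
# perfectly correlated.
rows = [{"kind": "video", "year": 1994} for _ in range(100)] + \
       [{"kind": "movie", "year": 2010} for _ in range(900)]

n = len(rows)
sel_kind = sum(r["kind"] == "video" for r in rows) / n  # 0.1
sel_year = sum(r["year"] == 1994 for r in rows) / n     # 0.1

# Independence-assumption estimate: multiply per-column selectivities.
estimated = n * sel_kind * sel_year                      # ~10 rows
# True cardinality: the predicates always co-occur.
actual = sum(r["kind"] == "video" and r["year"] == 1994 for r in rows)  # 100

print(estimated, actual)  # the estimate is off by 10x
```

A 10× cardinality error at one node of a join tree compounds multiplicatively up the tree, which is why JOB queries with several such predicates can push an optimizer toward disastrous join orders.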

The IMDb dataset

  • Base size: ~2 GB (small by modern data-warehouse standards).
  • Schema (conceptual): actors, companies, and movies, joined through linking tables (actor × movie, company × movie), plus metadata tables; the physical JOB schema comprises 21 tables (title, name, cast_info, movie_companies, etc.).
  • Characteristics: real data with real skew — some actors have many movies, some none; some companies dominate; notes fields contain free-text with patterns like "VHS", "DVD", years, etc.

Databricks' scaling choice

"Since the dataset used by JOB, the IMDb dataset, is only around 2GB (and therefore Databricks could process even poor join orderings fairly quickly), we scaled up the dataset by duplicating each row 10 times."

The 10× row-duplication is a deliberate methodological choice: the original JOB dataset is too small for Databricks' execution engine to distinguish optimizer quality — even bad plans finish fast. Scaling amplifies the gap between good and bad plans, making optimizer improvements visible.

The post flags this as "a prior-work methodology" (cited C in the post). It has known limitations: duplicating rows changes cardinalities (every distinct value becomes 10× more frequent) without changing distinct counts or correlations, which can favour some plan shapes artificially. Treat JOB-10× results as a relative comparison metric, not absolute speedups.
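The distortion is easy to see on a toy example: duplicating every row 10× leaves distinct counts untouched but multiplies every join match 10 × 10 = 100×, so join cardinalities inflate quadratically while table statistics look only linearly larger. A minimal sketch (toy tables, not the real IMDb schema):

```python
# Sketch: how 10x row duplication distorts cardinalities.
movies = [{"movie_id": i} for i in range(5)]
credits = [{"movie_id": i % 5} for i in range(20)]  # 4 credits per movie

def dup(rows, k=10):
    """Duplicate each row k times, as in the JOB-10x scaling."""
    return [dict(r) for r in rows for _ in range(k)]

movies10, credits10 = dup(movies), dup(credits)

# Distinct counts are unchanged by duplication...
assert len({r["movie_id"] for r in movies10}) == len({r["movie_id"] for r in movies})

# ...but each original matching pair now matches 10 x 10 = 100 times.
join = [(m, c) for m in movies for c in credits
        if m["movie_id"] == c["movie_id"]]
join10 = [(m, c) for m in movies10 for c in credits10
          if m["movie_id"] == c["movie_id"]]
print(len(join), len(join10))  # 20 -> 2000
```

This is the sense in which duplication "can favour some plan shapes artificially": intermediate results blow up faster than base-table sizes, so plans that defer large joins are rewarded more than they would be on naturally larger data.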

Query 5b: canonical failure case

From Databricks' post:

"One of the largest improvements our agent found was in query 5b, a simple 5-way join which looks for American production companies that released a post-2010 movie on VHS with a note referencing 1994. … This is because the query uses LIKE predicates to identify VHS releases, which are exceptionally difficult for cardinality estimators."

Query 5b has become a go-to illustration of concepts/like-predicate-cardinality-estimation-failure — string-pattern selectivity is a structural blind spot, and JOB surfaces it cleanly.
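The blind spot comes from optimizers having no useful statistics for infix patterns like `LIKE '%VHS%'`, so many fall back to a small fixed selectivity guess. The sketch below contrasts such a fallback with the true match rate on skewed free-text notes; the constant and data are illustrative, not any particular engine's actual heuristic.

```python
# Sketch: a fixed fallback selectivity for LIKE '%VHS%' vs. the true
# match rate on skewed free-text (toy data; the fallback constant is
# illustrative, not a specific engine's formula).
notes = ["(VHS) (1994)"] * 300 + ["(DVD) (2001)"] * 100 + [""] * 600

FALLBACK_SELECTIVITY = 0.005  # a typical "no statistics apply" guess

estimated_rows = len(notes) * FALLBACK_SELECTIVITY   # 5 rows expected
actual_rows = sum("VHS" in n for n in notes)         # 300 rows match

print(estimated_rows, actual_rows)  # estimate off by 60x
```

With the estimator expecting ~5 rows where 300 actually qualify, the optimizer happily places that "tiny" input early in the join order, which is exactly the query-5b failure the post describes.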

Why the benchmark matters

  • Real schema, real data, real skew. Most benchmarks (TPC-H, TPC-DS) use synthetic distributions; JOB uses IMDb.
  • Optimizer-adversarial by design. Queries are selected to be hard, not representative.
  • Cardinality-estimation focused. Specifically exercises the weakest leg of traditional query optimizers.
  • Standard for academic comparison. The paper's authors (Leis, Neumann, et al.) are central figures in the query-optimization literature.

Production usage (2026)

Databricks' join-order agent is the most recent wiki entry using JOB, with 15 rollouts × 113 queries producing the reported results: a 1.288× geometric-mean speedup and a 41% drop in P90 latency. Prior work (including systems/bayesqo) also uses JOB as the standard comparison benchmark for offline query optimizers.
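For readers unfamiliar with the two headline metrics: the geomean aggregates per-query speedup ratios multiplicatively (so one huge win can't dominate an arithmetic average), and the P90 drop compares 90th-percentile latencies before and after. A minimal sketch with synthetic numbers (not Databricks' actual data):

```python
# Sketch: computing a geometric-mean speedup and a P90 latency drop
# over per-query results (synthetic numbers, purely illustrative).
import math

baseline  = [10.0, 20.0, 40.0, 80.0, 400.0]  # seconds per query
optimized = [ 8.0, 16.0, 40.0, 64.0, 100.0]

# Geometric mean of per-query speedup ratios.
speedups = [b / o for b, o in zip(baseline, optimized)]
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

def p90(latencies):
    """Nearest-rank 90th-percentile latency."""
    s = sorted(latencies)
    return s[max(0, math.ceil(0.9 * len(s)) - 1)]

drop = 1 - p90(optimized) / p90(baseline)
print(round(geomean, 3), round(drop, 2))
```

Note how the tail query (400 s → 100 s) moves the P90 drop far more than the geomean, which is why the post reports both: geomean captures typical improvement, P90 captures tail behaviour.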

Source paper

How Good Are Query Optimizers, Really? Leis et al., PVLDB 2016. https://www.vldb.org/pvldb/vol9/p204-leis.pdf

SQL source: https://github.com/RyanMarcus/imdb_pg_dataset
