
PATTERN Cited by 1 source

LLM-plus-planner validation

Problem

LLM-generated database changes (index suggestions, query rewrites, schema mutations) are plausibly correct but sometimes wrong. Shipping unvalidated LLM output directly to production is unsafe. But asking humans to validate every suggestion doesn't scale — and the humans often make the same judgment the database engine itself could make.

The specific failure mode is confident-but-wrong. From PlanetScale's framing (sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions):

As anyone who's used LLM tools for software development can tell you, it's crucially important to validate LLM generated solutions before shipping them to production. Before we suggest that a customer make changes to their production database, we need to make sure that our suggestions will actually have the desired effect.

Solution

Pair the LLM with the database engine's own planner as a validation oracle. The LLM proposes changes; the planner decides whether each proposed change is worth surfacing.

The canonical shape for index suggestions:

  1. LLM proposes CREATE INDEX statements, each paired with a citation of which queries the index is designed to help.
  2. Syntactic validation — parse the CREATE INDEX statement to confirm it's well-formed.
  3. Planner validation — register each candidate as a hypothetical index (via HypoPG on Postgres), run EXPLAIN on each cited query with and without the hypothetical index, compare the planner's cost estimate.
  4. Keep only suggestions the planner agrees with. Drop anything that doesn't "result in a substantial improvement" on at least one cited query.
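The four steps above amount to a filter over LLM candidates. A minimal sketch, not PlanetScale's implementation: `parse_ok` and `planner_cost` are stand-ins for a real SQL parser and a HypoPG-backed EXPLAIN call, and the 0.5 cost-ratio cutoff is one assumed definition of "substantial improvement":

```python
# Assumed cutoff: keep an index only if it at least halves the planner's
# cost estimate for some cited query.
IMPROVEMENT_THRESHOLD = 0.5

def validate_suggestions(candidates, parse_ok, planner_cost):
    """Filter LLM index suggestions against the planner.

    candidates: list of (create_index_sql, [cited_queries]) pairs.
    parse_ok(sql) -> bool: syntactic validation (step 2).
    planner_cost(query, hypothetical_index=None) -> float: planner's
        cost estimate, optionally with a hypothetical index registered.
    """
    kept = []
    for index_sql, cited_queries in candidates:
        if not parse_ok(index_sql):                 # step 2: well-formed?
            continue
        for query in cited_queries:                 # step 3: re-cost each cited query
            before = planner_cost(query)
            after = planner_cost(query, hypothetical_index=index_sql)
            if after <= before * IMPROVEMENT_THRESHOLD:  # step 4: substantial win
                kept.append(index_sql)
                break                               # one improved query suffices
    return kept
```

With the planner stubbed out, a hallucinated or weak index is dropped and only candidates the (stubbed) planner agrees with survive, which is the whole point of the pattern.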

The general shape extends beyond index suggestions — for any LLM-generated database change, pair the LLM with the engine's native validation primitive:

LLM output → validation oracle:

  • CREATE INDEX → HypoPG + EXPLAIN cost delta
  • Query rewrite → EXPLAIN on both + result-set equivalence check
  • Migration script → dry-run on a schema-branched clone + schema diff
  • Partition scheme → planner cost on representative queries
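On Postgres, the CREATE INDEX row of the table above reduces to a short statement sequence per (index, query) pair. A sketch that only builds those statements rather than executing them: `hypopg_create_index` and `hypopg_reset` are real HypoPG functions, but the naive single-quote interpolation here is illustrative only — real code should escape or parameterize:

```python
def hypopg_check_statements(create_index_sql: str, cited_query: str) -> list[str]:
    """Return the SQL a validator would run to cost `cited_query`
    with and without the hypothetical index."""
    return [
        f"EXPLAIN (FORMAT JSON) {cited_query}",               # baseline cost estimate
        f"SELECT hypopg_create_index('{create_index_sql}')",  # register hypothetical index
        f"EXPLAIN (FORMAT JSON) {cited_query}",               # cost with the index visible
        "SELECT hypopg_reset()",                              # drop all hypothetical indexes
    ]
```

The hypothetical index exists only in the planner's view for the current session, so the two EXPLAIN costs can be diffed without ever paying for a real index build.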

Why this works

  • The LLM brings creativity; the planner brings ground truth. LLMs are good at finding candidate indexes from schema + workload context — comparable to human DBAs — but aren't trustworthy by themselves. The planner's cost model, even as an approximation, is anchored in real table statistics and actual query-execution logic.
  • Validation is cheap. Hypothetical evaluation costs microseconds; the real CREATE INDEX would cost hours. Running validation on every candidate is tractable.
  • The validator never ships a false positive. Even if the LLM hallucinates an index, the planner will find no cost reduction and drop the suggestion.
  • False negatives are acceptable. If the planner's estimate is conservative and a genuinely useful index gets discarded, the customer's database isn't harmed — the worst case is a missed optimization opportunity, which matches the human-DBA baseline.

Forces

  • The validation oracle must be cheaper than the change. The pattern breaks if checking each candidate costs as much as just building the index. Hypothetical-index evaluation satisfies this; hypothetical table rewrites don't exist, so the pattern doesn't generalise trivially to all migrations.
  • The LLM must cite what each change is for. The validator needs to know which queries to re-cost. A CREATE INDEX without "here are the queries it helps" is untestable against the planner.
  • Planner estimates ≠ measurements. The pattern rejects clearly-bad candidates but doesn't guarantee the accepted ones will produce exactly the speed-up shown. That's acceptable for a suggestion (the customer can accept or decline), but not for automatic application without a post-apply A/B check.

Related patterns
  • patterns/workload-aware-llm-prompting — the input counterpart. Workload-level filtering before the LLM sees anything; this pattern is the output-side validation after the LLM has generated candidates.
  • patterns/explain-before-execute-validation — sibling Pinterest pattern for validating LLM-generated SQL queries against the planner before executing them. Same structural insight applied at query-execution time rather than at index-design time.

Instances

  • PlanetScale Insights AI-powered index suggestions (2026-04) — canonical wiki instance. LLM generates CREATE INDEX statements from a workload-pre-filtered prompt; HypoPG + EXPLAIN validates each one against the planner's cost estimate; discards any candidate that doesn't "improve at least one candidate query." See sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions.
