PlanetScale — AI-Powered Postgres index suggestions¶
Summary¶
PlanetScale's Rafer Hazen announces AI-powered Postgres index suggestions shipping inside PlanetScale Insights. The product pairs an LLM that proposes CREATE INDEX statements with a HypoPG-based validation step that runs EXPLAIN on every candidate query with and without the proposed index — suggestions that don't measurably reduce planner-estimated cost on at least one candidate are discarded before they ever reach a customer. The post is the canonical wiki instance of LLM-plus-planner validation (LLM generates candidates, Postgres's own planner decides whether each candidate is worth recommending) and workload-aware LLM prompting (Insights' query-performance telemetry is mined to select the subset of query patterns likely to benefit from an index before they're shown to the model, so the prompt is shorter and the LLM isn't asked to decide whether anything needs changing).
Key takeaways¶
- The launch — AI-powered index suggestions is a new feature in Insights for Postgres databases that "monitors your database workload and periodically suggests new indexes to improve query execution speed and overall database performance" (Source: sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions).
- Why LLMs alone aren't enough — the two identified failure modes.
  - "LLMs are great at finding an answer but can't always be trusted to decide if maybe nothing needs to change at all." PlanetScale fixes this by making the does-anything-need-changing? decision themselves from Insights telemetry, before the LLM is invoked.
  - "Sometimes LLMs produce inaccurate results … it's crucially important to validate LLM generated solutions before shipping them to production." PlanetScale fixes this by running every proposed CREATE INDEX through Postgres's own query planner via HypoPG.
- Workload-level pre-filter reduces the candidate set. Before the LLM sees anything, Insights filters query patterns to those most likely to benefit from an index:
- Rows-read to rows-returned ratio — "If a query is reading a much larger set of rows than it is returning, it's an indication that an index (or a more selective index) could improve response times."
- Minimum workload share — "the query pattern is responsible for at least 0.1% of the aggregated runtime of all queries."
- Minimum frequency — query pattern must have run a "minimum number of times."
Rationale: "Since indexes incur storage, memory and write overhead, we want to avoid suggesting indexes for ad-hoc or infrequently run queries." This is a canonical instance of secondary-index write-cost awareness — the feature refuses to recommend indexes whose write-amplification tax won't be earned back by read-side wins.
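The three gates above can be sketched as a single predicate. Note that only the 0.1% runtime-share threshold is disclosed in the post; the ratio and frequency cutoffs below are placeholder assumptions, not PlanetScale's actual values.

```python
def is_index_candidate(rows_read, rows_returned, runtime_share, exec_count,
                       min_ratio=10.0, min_share=0.001, min_count=100):
    """Workload-level pre-filter: decide whether a query pattern is even
    a candidate for index suggestion, before any LLM is involved."""
    # Ratio gate: reading far more rows than returned suggests a missing
    # (or insufficiently selective) index. The 10x cutoff is an assumption.
    if rows_read / max(rows_returned, 1) < min_ratio:
        return False
    # Workload-share gate: pattern must account for at least 0.1% of the
    # aggregated runtime of all queries (the one disclosed threshold).
    if runtime_share < min_share:
        return False
    # Frequency gate: skip ad-hoc queries; the actual minimum is undisclosed.
    if exec_count < min_count:
        return False
    return True
```

The gates compose with AND on purpose: an index has to earn back its storage, memory, and write overhead, so a pattern that is selective-looking but rarely run is still rejected.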
- Schema filtering keeps the prompt focused. Once the candidate query set is picked, "we filter the schema down to the tables referenced by the query. This keeps the prompt smaller and more focused." (Source: sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions) Prompt content = workload-filtered queries + minimal relevant schema + an instruction to cite which queries each proposed index is designed to improve (the citation step is load-bearing for validation).
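The schema-filtering step can be sketched as follows. The regex-based table extraction here is deliberately naive and purely illustrative; a production implementation would use a real SQL parser, and the post does not describe how PlanetScale resolves table references.

```python
import re

def referenced_tables(query):
    # Naive extraction of table names following FROM/JOIN.
    # Assumption for illustration only; does not handle subqueries,
    # schema-qualified names, CTEs, or quoted identifiers.
    return {m.lower() for m in
            re.findall(r'\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)', query, re.I)}

def filter_schema(schema_ddl, queries):
    """Keep only the DDL for tables the candidate queries reference,
    so the prompt stays small and focused."""
    keep = set().union(*(referenced_tables(q) for q in queries))
    return {table: ddl for table, ddl in schema_ddl.items() if table in keep}
```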
- Validation happens in two phases.
  - Syntactic validation — parse the generated CREATE INDEX statements to make sure they "are syntactically valid and of the correct form."
  - Planner validation via HypoPG — "HypoPG lets us create hypothetical indexes that do not actually exist (and therefore have no overhead) but which the planner can use in the context of EXPLAIN commands. This allows us to find an estimated cost using the actual Postgres planner." Run EXPLAIN on each candidate query with and without the hypothetical index; discard any suggestion "that doesn't improve at least one candidate query."
- Only substantial improvements are surfaced. The gating condition is "predicted cost improvement is substantial enough to justify recommending a new index." The UI shows the customer the queries each surviving suggestion is designed to improve, "including the estimated reduction in query cost."
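A minimal sketch of the phase-two discard rule. `hypopg_create_index` and `hypopg_reset` are real HypoPG functions, but the surrounding Python shape (cost maps keyed by query text, taken from the `Total Cost` field of `EXPLAIN (FORMAT JSON)`) is an assumption for illustration, not PlanetScale's implementation.

```python
# SQL flow against a database with the hypopg extension installed:
#   SELECT hypopg_create_index('CREATE INDEX ON orders (user_id)');
#   EXPLAIN (FORMAT JSON) <candidate query>;  -- planner may now use the
#                                             -- hypothetical index
#   SELECT hypopg_reset();                    -- drop hypothetical indexes

def survives_validation(cost_without, cost_with):
    """Discard rule: keep a suggested index only if it lowers the
    planner-estimated Total Cost of at least one candidate query."""
    return any(cost_with[q] < cost_without[q] for q in cost_without)
```

The post additionally requires the improvement to be "substantial"; the margin is undisclosed, so a strict `<` comparison stands in for it here.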
- Insights' unique position is the moat. "Our goal with Postgres index suggestions is not just to save you the trouble of asking LLMs for suggestions on your own, but to use the unique data and capabilities available in Insights to produce the best overall results." The Insights extension already has the per-query-pattern runtime share, execution count, and rows-read/rows-returned history; a plain LLM-chat tool has none of those signals. The canonical framing: workload telemetry is the input that makes the LLM useful, and the planner is the output filter that makes the LLM safe.
Systems / concepts / patterns extracted¶
- Systems
  - systems/planetscale-insights — now hosts AI-powered index suggestions as a new surface alongside query performance telemetry + Traffic Control.
  - systems/postgresql — target engine; the planner's EXPLAIN cost estimate is the validation oracle.
  - systems/hypopg — new wiki page; Postgres extension that lets the planner reason about hypothetical indexes without creating them.
- Concepts
  - concepts/hypothetical-index-evaluation — new wiki vocabulary for "use EXPLAIN + a hypothetical-index extension to cost a proposed index without paying any write-side price."
  - concepts/llm-generated-database-changes — new wiki vocabulary for the class of workflows where an LLM proposes schema/index mutations that must be validated before they're applied.
  - concepts/index-candidate-filtering — new wiki vocabulary for the workload-profile-based pre-filter (rows-read:rows-returned, % runtime share, minimum frequency) that decides which query patterns are even candidates for indexing.
  - concepts/secondary-index — extended: this post is a canonical production instance of refusing to recommend indexes whose write-overhead isn't justified by read-side wins.
- Patterns
  - patterns/llm-plus-planner-validation — new canonical wiki pattern: LLM proposes, planner (via hypothetical-index cost estimate) filters. Caller ships only the suggestions the planner agrees with.
  - patterns/workload-aware-llm-prompting — new canonical wiki pattern: don't ask the LLM whether anything needs changing; use your own workload telemetry to decide that, then hand the LLM a focused subset.
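The two patterns compose into one pipeline shape. Everything here is illustrative: `llm_propose` stands in for the undisclosed model call, and `validate` for the HypoPG planner check.

```python
def suggest_indexes(patterns, schema_ddl, llm_propose, validate):
    """LLM-plus-planner validation on top of workload-aware prompting."""
    # Workload-aware prompting: telemetry, not the LLM, decides whether
    # anything needs changing at all.
    candidates = [p for p in patterns if p["is_candidate"]]
    if not candidates:
        return []  # nothing qualifies; the LLM is never invoked

    # LLM proposes CREATE INDEX statements for the focused subset.
    proposals = llm_propose(candidates, schema_ddl)

    # Planner validation: ship only the suggestions the planner agrees with.
    return [s for s in proposals if validate(s, candidates)]
```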
Operational numbers¶
- Minimum query-pattern runtime share to be a candidate = 0.1% of aggregated runtime across all queries. ("query pattern is responsible for at least 0.1% of the aggregated runtime of all queries")
- Minimum query-pattern frequency to be a candidate = "minimum number of times" (threshold not disclosed).
- Ratio signal = rows-read ÷ rows-returned — higher ratios are stronger candidate signals. ("high ratio of rows read to rows returned … indication that an index (or a more selective index) could improve response times")
- Validation discard rule = suggestion is dropped unless it improves the planner-estimated cost of at least one candidate query by a substantial margin.
- Two validation phases:
  - Parse CREATE INDEX statement for syntactic validity.
  - EXPLAIN each candidate query with and without the hypothetical index (via HypoPG); compare planner cost.
- Integration point = systems/planetscale-insights extension (same surface as Traffic Control).
Caveats¶
- No quantitative efficacy disclosed. No hit rate, no false-positive rate, no customer-facing numbers on "how many suggestions survive validation" or "how much query runtime improved when customers accepted suggestions."
- No model/provider named. The post says "LLMs" and "LLM-generated" generically; neither the specific model family, the provider, nor the prompt template is disclosed.
- No adoption figures. No customer uptake, no "% of suggestions accepted," no before/after workload data.
- HypoPG limitation unstated. HypoPG lets the planner pretend the index exists for cost estimation, but the planner's cost model is itself an approximation — the actual speedup after applying an index can differ from the estimated one. The post doesn't discuss how often estimated ≠ actual in practice.
- Write-overhead is flagged but not quantified. The post acknowledges "indexes incur storage, memory and write overhead," and the minimum-runtime + minimum-frequency gates are there to avoid recommending indexes that won't earn their write tax back — but no explicit write-cost model is shown.
- Only CREATE INDEX is in scope. No discussion of dropping unused indexes, modifying existing indexes, composite-key ordering, covering-column selection, or partial / expression indexes — though the prompt pattern generalises to any of these.
- Postgres-only at launch. The feature is described strictly for Postgres; there's no MySQL-side equivalent in the post (MySQL's EXPLAIN / EXPLAIN ANALYZE would need a different hypothetical-index mechanism).
- Tier-3 source. PlanetScale first-party product-launch voice. Clears the Tier-3 bar per the PlanetScale skip rules: "Lambert ex-GitHub war-story posts — default include" — this is a Hazen product-architecture post with substantive architectural content (validation pipeline, workload pre-filter, HypoPG integration) across a majority of the body.
Source¶
- Original: https://planetscale.com/blog/postgres-new-index-suggestions
- Raw markdown: raw/planetscale/2026-04-21-ai-powered-postgres-index-suggestions-119d55bf.md
Related¶
- systems/planetscale-insights
- systems/planetscale-for-postgres
- systems/planetscale
- systems/postgresql
- systems/hypopg
- concepts/secondary-index
- concepts/hypothetical-index-evaluation
- concepts/llm-generated-database-changes
- concepts/index-candidate-filtering
- patterns/llm-plus-planner-validation
- patterns/workload-aware-llm-prompting
- companies/planetscale