PLANETSCALE 2024-02-28

PlanetScale — Introducing schema recommendations¶

Summary¶

Taylor Barnett and Rafer Hazen (PlanetScale, 2024-02-28) launch schema recommendations inside PlanetScale Insights — an automated schema-tuning surface that emits actionable DDL-ready suggestions against a production MySQL database, each with a one-click "create branch and apply" path through the already-canonical deploy-request workflow. The post introduces the internal system — Schema Advisor — that powers the surface: a Kafka-event-triggered background analyzer that listens to schema-change events emitted by production branches, consumes query-performance telemetry from Insights, and runs four rule-based recommendation classes: (1) add missing indexes for inefficient queries, (2) remove redundant indexes (exact duplicates + left-prefix duplicates), (3) prevent auto-increment primary-key exhaustion, and (4) drop unused tables. The architectural core is the use of Vitess's query parser + semantic analysis utilities to extract indexable columns from slow-query candidates — and a patched MySQL fork exposing a custom variant of ANALYZE TABLE ... UPDATE HISTOGRAM that reads per-column cardinalities without writing to the stats table (so composite-index column order can be chosen correctly without side-effects on the production optimizer). Canonical operational datum: 33% of PlanetScale databases have redundant indexes — an empirical argument that schema-level tuning opportunities are common enough to warrant first-class automation. The 2024 predecessor to the 2026 AI-powered index suggestions feature (same Schema Advisor substrate, same Insights UX host, rule-based mechanism instead of LLM + HypoPG validation).

Key takeaways¶

Schema recommendations is a new Insights surface. "With schema recommendations, you will automatically receive recommendations to improve database performance, reduce memory and storage, and improve your schema based on production database traffic." (Source: sources/2026-04-21-planetscale-introducing-schema-recommendations). Each recommendation carries four elements: **explanation
benefits, affected schema/query, exact DDL to apply, and an option to create a branch and apply it for testing** via the deploy- request workflow.
The internal system is called Schema Advisor. "We've built a system that we internally refer to as the 'Schema Advisor.' It can make schema recommendations and understand when a schema change closes an existing open recommendation." Canonicalised as systems/schema-advisor. The system's dual capability — emit recommendations and detect when a deployed schema change resolves an existing open recommendation — means the surface has a lifecycle (open → in-progress → closed) rather than a firehose of static suggestions.
Event-driven architecture keyed on schema-change events. "Each time a production branch's schema changes within PlanetScale, an event is emitted to Kafka. This triggers a background job to examine the schema for potential recommendations." Canonicalised as patterns/event-triggered-schema-analysis. The Kafka event is the coupling substrate between the schema-change apparatus (deploy request apply) and the advisor. This makes the Schema Advisor naturally idempotent by schema-state (re-analyse on every change) and eventually consistent with the live schema — not a polling loop, not an in-band blocking hook.
Four recommendation classes in the v1 launch.
Add missing indexes — daily scan of query performance data identifies "frequently issued queries with a high aggregate ratio of rows read compared to rows returned" (same rows-read-ratio heuristic as the 2026 AI feature). Vitess query parser + semantic analysis extracts indexable columns. Cardinality lookup (below) picks column order for composite indexes.
Remove redundant indexes — two sub-classes: exact duplicates (same columns in same order) and left- prefix duplicates (columns match a prefix of another index in the same order). Canonical operational datum: "Our initial set of recommendations found that 33% of PlanetScale databases have redundant indexes that they may benefit from removing." Schema-only analysis — no query telemetry needed.
Prevent primary-key ID exhaustion — daily scan of AUTO INCREMENT column values. Threshold: "if Insights detects that one of the columns is above 60% of the maximum allowable type, it will recommend changing the underlying column to a larger type." Also scans join / correlated-subquery patterns to find foreign keys on those columns and suggests widening them too (maintaining relational-integrity alignment).
Drop unused tables — daily scan: "identify if any tables are more than four weeks old and haven't been queried in the last four weeks." Two-sided freshness threshold (exclude recent tables from consideration + require recent-query-absence) — protects against both new-table false positives and seasonal-query false positives.
The composite-index-ordering trick: patched MySQL ANALYZE TABLE ... UPDATE HISTOGRAM. Load-bearing architectural detail verbatim: "When adding indexes, column order is critically important. To get that right, we patched our fork of MySQL to create another variant of the ANALYZE TABLE ... UPDATE HISTOGRAM command that allows us to extract the cardinalities of each column without impacting the database's statistics table." Canonicalised as patterns/cardinality-extraction-for-composite-index-ordering. The production optimizer's stats table is load-bearing for query-plan choices; writing to it as a side-effect of recommendation-generation would risk destabilising customer query plans. The patched variant reads histograms without writing — a read-only fork of a write-side command, structurally analogous to hypothetical-index evaluation in the 2026 AI feature (both keep recommendation machinery isolated from production optimizer state).
Vitess as query-parser substrate. "We first identify potential slow query candidates for index suggestions using Insights query data. We then use Vitess's query parser and semantic analysis utilities to extract potential indexable columns for the query." Vitess's parser is already load-bearing at the VTGate query-planner altitude (for plan-tree rewrites); this post canonicalises a second use-case: using the same parser + semantic analyser as a standalone offline utility to extract structure from customer SQL for advisory purposes.
Worked example walkthrough. Post includes a concrete posts table scenario: id/title/text columns, primary key on id only, query SELECT posts.id FROM posts WHERE posts.title = ? in a loop. p50 latency scales linearly with row count as inserts continue (no selective index on title). Add-index recommendation fires daily → suggested DDL = CREATE INDEX idx_posts_on_title ON posts (title) → "Create and apply" creates a branch, applies the DDL, operator deploys to production → p50 drops drastically on the chart. End-to-end worked example of the recommendation → branch → deploy-request → production loop.

Systems¶

systems/schema-advisor (new) — the internal PlanetScale system powering the schema-recommendations surface. Kafka-triggered background analyzer consuming production schema-change events + Insights query telemetry. Four rule-based recommendation classes. Uses patched MySQL + Vitess parser as analysis substrate.
systems/planetscale-insights — hosts the schema-recommendations UI; provides the per-query performance telemetry (rows read, rows returned, frequency, runtime share) that feeds index suggestions and unused-table detection.
systems/planetscale — overall vendor / product context.
systems/mysql — the forked engine carrying the custom read-only ANALYZE TABLE ... UPDATE HISTOGRAM variant. Auto-increment type limits are the primitive behind primary-key ID exhaustion recommendations.
systems/vitess — query parser + semantic analysis utilities used offline by Schema Advisor to extract indexable columns from Insights-identified slow-query candidates. Second wiki-canonical use-case of the Vitess parser beyond VTGate planning.
systems/kafka — event substrate: production schema-change events are emitted to Kafka, triggering the Schema Advisor background analyzer.

Concepts¶

concepts/redundant-index-detection (new) — the schema-level analysis that identifies two classes of duplicate indexes (exact duplicates: same columns in same order; left-prefix duplicates: one index's columns are a prefix of another's). Purely structural; no query telemetry required. Canonical datum: 33% of PlanetScale databases have at least one redundant index.
concepts/primary-key-id-exhaustion (new) — the failure class where an AUTO INCREMENT primary key approaches or reaches the maximum value of its column type (e.g. INT UNSIGNED max = 4,294,967,295). Once hit, subsequent inserts fail and can take an application offline. Numerous high-profile outages have been caused by this failure mode. Preventable via threshold-gated monitoring; Schema Advisor fires at 60% of type maximum.
concepts/unused-table-detection (new) — the query-telemetry-based analysis that flags tables as candidates for dropping based on two-sided temporal freshness: (a) table age ≥ 4 weeks, AND (b) no queries against the table in the last 4 weeks. The first condition prevents new-table false positives; the second excludes tables that are actually accessed.
concepts/secondary-index — the structure being recommended (for index adds) or removed (for redundant indexes). Redundant-index removal is motivated by the secondary-index write tax (every mutation updates every index on the table).
concepts/composite-index — multi-column indexes. The composite-index-ordering problem (which column comes first in the key) is the specific problem the patched ANALYZE TABLE ... UPDATE HISTOGRAM variant solves: higher-cardinality columns generally belong first in the key for better selectivity on partial matches.
concepts/deploy-request — the existing workflow the "Create and apply" button dispatches into. Schema recommendations feed directly into this lifecycle, meaning acceptance of a recommendation goes through the same safety apparatus (online DDL, gated deployment, revert window) as any other schema change.
concepts/schema-change-queue — when a recommendation is applied, the resulting schema change enters the same FIFO queue (with combined-schema lint) that any other deploy request enters.

Patterns¶

patterns/event-triggered-schema-analysis (new) — emit schema-change events to a message substrate (Kafka here); a background analyzer consumes each event and re-runs rule-based checks against the new schema state. The substrate provides natural idempotency (replaying an event re-analyses harmlessly), decoupling (the schema-change apparatus doesn't block on the advisor), and eventual consistency (recommendations converge to reflect the latest schema).
patterns/cardinality-extraction-for-composite-index-ordering (new) — patch or extend the database engine to expose a read-only variant of a normally-write-side stats command, so recommendation machinery can read per-column cardinalities without mutating the production optimizer's statistics table. Structural sibling of HypoPG-style hypothetical evaluation — both keep advisory machinery isolated from live optimizer state. In MySQL 8.0 this is a patched variant of ANALYZE TABLE ... UPDATE HISTOGRAM.
patterns/workload-aware-llm-prompting — the 2026 successor pattern uses the same workload pre-filter (rows-read-ratio + aggregate runtime share + minimum frequency) to choose which queries are index-suggestion candidates, but then feeds them to an LLM instead of to rule-based logic.
patterns/llm-plus-planner-validation — the 2026 successor pattern's output-side validator (HypoPG + EXPLAIN) is the mechanism-equivalent of the 2024 patched ANALYZE TABLE cardinality extraction: both keep the advisory path from destabilising production optimizer state.

Operational numbers¶

33% of PlanetScale databases have at least one redundant index (exact duplicate or left-prefix duplicate).
60% of maximum column value: the threshold at which Schema Advisor flags an AUTO INCREMENT primary key for type widening.
4 weeks: both the minimum table age AND the minimum recent-query-absence window for the unused-table recommendation.
Daily scan cadence for missing-index candidates, primary-key-exhaustion checks, and unused-table detection.
On-change scan cadence for redundant-index detection (every production schema change triggers a scan).
Maximum value for INT UNSIGNED = 4,294,967,295 (~4.3 × 10⁹) — the exhaustion threshold for the most common auto-increment column type.
Four recommendation classes in v1 ("we will add more over time").

Caveats and scope gaps¶

Marketing-title but architecture-body. Title begins with "Introducing" — the canonical batch-skip pattern — but the body is ~65% architecture substance (Kafka event-driven substrate, Vitess parser substrate, patched MySQL fork for cardinality extraction, 33% redundant- index operational datum, four distinct recommendation classes with mechanism-level descriptions, worked example with explicit DDL). Narrow-Tier-3 batch-skip override per task instruction ("Follow AGENTS.md end-to-end — no shortcuts"). Architecture content clears the ≥20% threshold decisively.
No cadence numbers for the Kafka analyzer — post says Kafka triggers a background job but gives no latency from deploy-apply to recommendation refresh.
No false-positive rates for any of the four recommendation classes. The 33% redundant-index datum is population-level but not per-instance accuracy.
No disclosure of patched-MySQL-fork upstream status. Is the ANALYZE TABLE ... UPDATE HISTOGRAM read-only variant a candidate for upstream contribution? Would require MySQL Group / Oracle acceptance. Post doesn't discuss.
Minimum frequency threshold for missing-index candidates not quantified (only "frequently issued queries" qualitatively). The 2026 AI successor post also doesn't quantify it but the workload-share threshold (≥0.1% aggregate runtime) and rows-read ratio get quantitative framing there.
Four-week unused-table window not justified empirically — no tuning-curve analysis, no false- positive-rate-vs-window trade-off.
Vitess parser extraction details elided — the semantic-analysis step that maps a query's WHERE title = ? to an "indexable column" is non-trivial (joins, subqueries, functional indexes, expression predicates). Post doesn't specify coverage boundaries.
Composite-index ordering heuristic not disclosed. Post says cardinality is extracted via patched ANALYZE but doesn't describe the rule that turns cardinality numbers into a column ordering decision.
MySQL-only. Postgres not mentioned. Schema Advisor's Postgres variant arrived as the 2026 AI- powered index suggestions feature, which uses HypoPG as the analogue of the patched MySQL ANALYZE variant.
Tutorial-altitude worked example. The posts table walkthrough is illustrative, not a production scaling disclosure.
No adoption metrics — no opt-in rate, no recommendation-acceptance rate, no time-to-close distribution.

Cross-source continuity¶

Predecessor to the 2026 AI-powered index suggestions post (sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions). Same Schema Advisor substrate, same Insights host surface, same workload-telemetry pre-filter pattern. Mechanism difference: this 2024 post uses rule-based logic + Vitess parser + patched MySQL ANALYZE for MySQL; the 2026 successor uses LLM + HypoPG validation for Postgres. The two posts together canonicalise the advisory-machinery-isolated-from- production-optimizer-state principle across two eras of AI tooling evolution.
Companion to sources/2026-04-21-planetscale-debugging-database-errors-with-insights (Rafer Hazen, 2022-09-27) — same author, earlier Insights feature (error tracking). The 2024 schema- recommendations post extends Hazen's Insights-as- advisor role from passive observability (surface what happened) to active tuning advice (propose what to change).
Companion to sources/2026-04-21-planetscale-enhanced-tagging-in-postgres-query-insights (Rafer Hazen, 2026-03-24) — the dual-stream Insights telemetry this post consumes is the same substrate Hazen describes in architectural detail two years later. Schema recommendations reads from both streams: the aggregate stream (for workload-share filtering) and, implicitly, the notable-query stream (for slow-query candidate identification).
Complements sources/2026-04-21-planetscale-how-planetscale-makes-schema-changes (Mike Coutermarsh, 2024-04-04) — that post canonicalises the PR-bot → deploy-request flow for developer-initiated schema changes; this post canonicalises an advisor- initiated flow that terminates in the same deploy- request apparatus. Two input pathways converging on the same schema-change queue + lint + revert-window machinery.
Complements sources/2026-04-21-planetscale-identifying-and-profiling-problematic-mysql-queries (Ben Dicken, 2024-03-29) — Dicken canonicalises the manual MySQL query-diagnosis workflow (performance_schema + sys + digest + stage profiling + EXPLAIN ANALYZE); this post is the automated variant. Dicken's post frames Insights as the productised UX over that digest data; this post is what Insights does with it beyond display.
No contradictions with existing wiki canon — the post is a new mechanism-level disclosure about an already-canonicalised system (Insights / Schema Advisor).

Source¶

Original: https://planetscale.com/blog/introducing-schema-recommendations
Raw markdown: raw/planetscale/2026-04-21-introducing-schema-recommendations-a44259aa.md