PLANETSCALE 2024-03-29 Tier 3

PlanetScale — Identifying and profiling problematic MySQL queries

Summary

Ben Dicken (PlanetScale, 2024-03-29) publishes a pedagogical field manual for native MySQL query diagnosis: how to use performance_schema + sys tables to identify which queries to fix, and EXPLAIN ANALYZE + stage-timing profiling via setup_instruments / setup_consumers / events_stages_history_long to drill into individual query execution. The post closes by positioning PlanetScale Insights as the product that replaces the manual poking-around workflow with a visualisation + anomaly-detection dashboard over the same underlying data.

Key takeaways

  • performance_schema is an in-memory storage engine ("all of the information it tracks is stored in an in-memory PERFORMANCE_SCHEMA storage engine") containing ~113 tables on recent MySQL versions. It's on by default but can be disabled on memory-constrained hosts. (Source: this post.)
  • events_statements_summary_by_digest is the entry point for "which queries are expensive?" — keyed by digest (normalised SQL with literals stripped), it exposes COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, MAX_TIMER_WAIT, SUM_LOCK_TIME. Timer values are in picoseconds (divide by 1e12 for seconds).
  • sys schema is the ergonomic front-end over performance_schema. Canonical diagnostic tables named: statements_with_sorting, statements_with_runtimes_in_95th_percentile, statements_with_full_table_scans.
  • table_io_waits_summary_by_index_usage surfaces per-index hit counts including reads that bypassed every index (INDEX_NAME IS NULL row). Worked example: 2.57 billion unindexed reads on game.message vs ~164k for the one used index (from_id) — diagnostic signature of missing indexes.
  • Stage-level profiling via events_stages_history_long requires three config toggles: setup_instruments.ENABLED = 'YES', setup_consumers.ENABLED = 'YES', and setup_actors.HISTORY = 'YES'. Workflow: capture thread_id from performance_schema.threads, run the query, look up the statement in events_statements_history_long, bracket with its event_id / end_event_id, then query events_stages_history_long between those bounds to see per-stage milliseconds (stage/sql/executing, stage/sql/optimizing, stage/sql/statistics, stage/sql/Opening tables, stage/sql/waiting for handler commit, ...).
  • Worked stage-profile datum: on the post's problematic query, stage/sql/executing consumed 735.3 ms out of a ~736 ms total — execution-bound, not lock-bound or optimising-bound. The value of stage timing is that it would flag lock-wait or optimiser time if those were the bottleneck.
  • Selective profiling via setup_actors: flip global defaults off with UPDATE setup_actors SET ENABLED='NO', HISTORY='NO' WHERE HOST='%' AND USER='%', then insert a specific (HOST, USER) row to scope instrumentation to a single test principal — avoids the performance tax of fleet-wide history tracking.
  • EXPLAIN ANALYZE reports actual per-iterator costs alongside planner estimates — the worked query shows a nested-loop join with a full 1M-row table scan on message followed by two single-row PK lookups per row, actual 320 ms on 345,454 rows.
  • PlanetScale Insights is the vendor's answer to the tedium: "gleaning this information can be tedious. Getting exactly what you want requires significant poking around and digging through tables in performance_schema and sys." Insights provides visualisations + sort-by-rows-read + automatic anomaly detection over the same underlying digest data.
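The digest-table lookup above can be sketched as a single ranking query — a minimal sketch, not the post's verbatim SQL; the column names are real performance_schema columns, while the ordering and LIMIT are illustrative choices:

```sql
-- Rank statement digests by total accumulated wait time.
-- Timer columns are picoseconds; dividing by 1e12 yields seconds.
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 AS total_sec,
       AVG_TIMER_WAIT / 1e12 AS avg_sec,
       MAX_TIMER_WAIT / 1e12 AS max_sec,
       SUM_LOCK_TIME  / 1e12 AS lock_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

Sorting by SUM_TIMER_WAIT surfaces aggregate cost; swapping in MAX_TIMER_WAIT or COUNT_STAR answers the "worst single run" and "hottest query" variants of the same question.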
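The per-index read counts behind the 2.57-billion-unindexed-reads datum can be pulled with a query along these lines (a sketch using the post's game.message example; the table and column names in the WHERE clause come from the worked example, the query shape is an assumption):

```sql
-- Reads that bypassed every index appear as the INDEX_NAME IS NULL row.
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_READ
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = 'game'
  AND OBJECT_NAME   = 'message'
ORDER BY COUNT_READ DESC;
```

A large NULL-index row dwarfing every named index is the missing-index signature the bullet describes.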
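The stage-profiling workflow and the setup_actors scoping can be sketched together as follows — a hedged sketch: the session variables (@thread_id, @event_id, @end_event_id) are stand-ins for values captured in the preceding steps, 'profiling_user' is a hypothetical principal, and the LIKE patterns are one plausible way to scope the toggles:

```sql
-- Scope instrumentation to one test principal instead of the whole fleet.
UPDATE performance_schema.setup_actors
   SET ENABLED = 'NO', HISTORY = 'NO'
 WHERE HOST = '%' AND USER = '%';
INSERT INTO performance_schema.setup_actors
       (HOST, USER, ROLE, ENABLED, HISTORY)
VALUES ('localhost', 'profiling_user', '%', 'YES', 'YES');

-- Enable stage instruments and their history consumers.
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/%';
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE '%stages%';

-- Capture this connection's thread id, run the query under test, then
-- find its statement row and bracket the stages by event id.
SELECT THREAD_ID FROM performance_schema.threads
 WHERE PROCESSLIST_ID = CONNECTION_ID();

SELECT EVENT_ID, END_EVENT_ID, SQL_TEXT
  FROM performance_schema.events_statements_history_long
 WHERE THREAD_ID = @thread_id;          -- @thread_id from the step above

SELECT EVENT_NAME,
       (TIMER_END - TIMER_START) / 1e9 AS ms  -- picoseconds -> milliseconds
  FROM performance_schema.events_stages_history_long
 WHERE THREAD_ID = @thread_id
   AND EVENT_ID BETWEEN @event_id AND @end_event_id
 ORDER BY EVENT_ID;
```

The final result set is where a lock-bound or optimiser-bound query would betray itself: most wall-clock time landing in a stage other than stage/sql/executing.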

Systems extracted

  • MySQL — target engine; performance_schema + sys are shipped with the server.
  • InnoDB — storage-engine context for the index-usage table (PRIMARY / secondary keys).
  • PlanetScale Insights — productised observability over the same data; Postgres + MySQL surface.
  • PlanetScale — publisher; positions Insights as the managed-service alternative to manual poking.

Concepts extracted

Patterns extracted

Operational numbers

  • ~113 performance_schema tables on a recent MySQL.
  • Timer unit: picoseconds (divide by 1 trillion for seconds).
  • Worked datum: 735.3 ms in stage/sql/executing out of ~736 ms total wall-clock.
  • Worked datum: 2.57 × 10⁹ unindexed reads vs 164,500 indexed reads on a single table — ratio as diagnostic signal.
  • Worked datum: full-table-scan query running 6,742 times accumulating 6.45 min of latency.
  • EXPLAIN ANALYZE worked datum: nested-loop join with table scan on m (1M rows) + single-row PK lookup on p1 + single-row PK lookup on p2, actual 320 ms for 345,454 rows.
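The EXPLAIN ANALYZE datum above maps onto a query of roughly this shape — a hypothetical reconstruction, not the post's actual SQL: from_id appears in the post, but the player table and the content/name/to_id columns are guesses at the schema:

```sql
-- Nested-loop join: full scan of message (m), then two single-row
-- PK lookups per row (p1, p2). EXPLAIN ANALYZE executes the statement,
-- so avoid running it on expensive queries or DML in production.
EXPLAIN ANALYZE
SELECT m.content, p1.name, p2.name
  FROM message m
  JOIN player p1 ON p1.id = m.from_id
  JOIN player p2 ON p2.id = m.to_id;
```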

Caveats

  • Instrumentation has overhead — Dicken explicitly flags "(small) adverse effect on the overall performance of your system" from enabling history for all users; selective actor scoping is the mitigation.
  • performance_schema is in-memory — data is lost on restart; no retention beyond memory size; no long-term trend analysis without an exporter.
  • EXPLAIN ANALYZE actually runs the query — unsafe for expensive queries on production and for UPDATE / DELETE / INSERT side effects (already canonicalised on the EXPLAIN ANALYZE wiki page).
  • Timer-unit surprise — picoseconds is easy to misread as nanoseconds; every derived number is 1000× off if the mistake is made.
  • Digest grouping is literal-stripped — queries with different parameters are grouped; non-parameter differences (different table names, different column lists) are separate digests.
  • No production deployment numbers — post is pedagogy; PlanetScale doesn't disclose how Insights' own ingestion of this data scales, nor the retention policy or sampling strategy behind the visualisations.
  • Tier-3 PlanetScale pedagogical voice — Ben Dicken's fifth-plus wiki ingest (canonical database-internals educator); default-include per companies/planetscale skip rules.

Source
