PLANETSCALE 2024-04-11

PlanetScale — Profiling memory usage in MySQL

Summary

Ben Dicken (PlanetScale) canonicalises native MySQL memory profiling via performance_schema's memory-instrumentation tables. Companion post to the same author's 2024-03-29 Identifying and profiling problematic MySQL queries (time-and-rows axis); this 2024-04-11 post covers the orthogonal space axis. Key architectural disclosures: (1) MySQL's ~1,255-entry instrumentation catalog in performance_schema.setup_instruments, of which several hundred are memory/* categories; (2) memory summaries are sampled at five granularities via five tables (memory_summary_by_{account,host,thread,user}_by_event_name + memory_summary_global_by_event_name) — no per-query granularity exists; (3) per-query profiling is achieved by mapping MySQL CONNECTION_ID() to performance_schema.threads.thread_id and sampling memory_summary_by_thread_by_event_name on the monitored thread from a second connection; (4) memory usage is not constant over a query's execution — sampling over time is required to see the shape, motivating a Python + matplotlib live stack-plot monitor sampling at 250–500 ms intervals. Worked example: SELECT alias FROM chat.message ORDER BY alias DESC LIMIT 100000 on a 100M-row table dominated by memory/sql/Filesort_buffer::sort_keys (203 KiB) + memory/innodb/memory (170 KiB); a FULLTEXT index creation grows into hundreds of megabytes.

Key takeaways

  1. MySQL's memory instrumentation is catalogued in performance_schema.setup_instruments with ~1,255 total entries, filtered to the memory subset via WHERE name LIKE 'memory/%' (Source: sources/2026-04-21-planetscale-profiling-memory-usage-in-mysql). Canonicalised as concepts/mysql-memory-instrumentation.
  2. Five granularities of memory summary — account / host / thread / user / global — stored in five sibling tables with the memory_summary_*_by_event_name naming convention. "Notice that there is no specific tracking for memory usage at a per-query level. However, this does not mean we cannot profile the memory usage of a query!" Canonical new concept concepts/memory-profiling-granularity.
  3. Per-query profiling = per-thread profiling of the connection running the query because MySQL binds one connection to one thread for the query's lifetime. The canonical mapping pattern is SET @cid = (SELECT CONNECTION_ID()); SET @tid = (SELECT thread_id FROM performance_schema.threads WHERE PROCESSLIST_ID=@cid); — canonicalised as concepts/mysql-connection-id-vs-thread-id and concepts/per-thread-memory-profiling.
  4. Two-connection setup is load-bearing: "you should set up two separate connections to your MySQL server on the command line. The first is the one that will execute the query you want to monitor memory usage for. The second will be used for monitoring purposes." The monitored query blocks its own connection, so observation must come from a different session. Canonicalised as patterns/two-connection-profiling-setup.
  5. Memory usage is temporal, not static — "you may see differences in the results, as memory usage for a query is not necessarily constant over its whole execution. Each execution of this query represents a sample at a moment of time." Motivates the sampling-profiler shape: patterns/periodic-sampling-memory-profiler.
  6. Worked example — ORDER BY on unindexed column forces Filesort: top-4 memory categories on a 100M-row SELECT alias FROM chat.message ORDER BY alias DESC LIMIT 100000 are memory/sql/Filesort_buffer::sort_keys (203,488 B), memory/innodb/memory (169,800 B), memory/sql/THD::main_mem_root (46,176 B), memory/innodb/ha_innodb (35,936 B). The Filesort_buffer::sort_keys instrument is undocumented in setup_instruments (documentation IS NULL) — the name is the only hint.
  7. FULLTEXT index creation is a memory hog — the post shows a second visualization where memory grows "into using hundreds of megabytes as it executes." Contrasts with the bounded sub-megabyte footprint of the read-only ORDER BY example.
  8. Live visualisation via matplotlib stack plot — the second Python script uses plt.stackplot with a 12-colour palette, retains the last 50 samples (sliding window), requests top-N categories by event_name DESC, and labels only categories above total_mem / 1024 / 50 of total (underscore-prefix legend labels suppress display for small categories). Canonicalised as patterns/live-visualization-of-sampled-metrics.
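The per-query-via-per-thread workflow in takeaways 3–4 can be sketched as two small query builders. Executing them requires a live MySQL connection (e.g. via a driver such as mysql-connector-python) from the monitoring session; the helper names and function structure here are mine, not from the post's scripts — only the SQL shape follows the post.

```python
# Sketch of the per-thread profiling queries (helper names are
# illustrative; the SQL follows the post's mapping pattern).

def thread_id_query(connection_id: int) -> str:
    """Map a session's CONNECTION_ID() to its performance_schema thread_id."""
    return (
        "SELECT thread_id FROM performance_schema.threads "
        f"WHERE PROCESSLIST_ID = {int(connection_id)}"
    )

def memory_sample_query(thread_id: int, top_n: int = 4) -> str:
    """Sample current per-instrument memory for one thread, largest first."""
    return (
        "SELECT event_name, current_number_of_bytes_used "
        "FROM performance_schema.memory_summary_by_thread_by_event_name "
        f"WHERE thread_id = {int(thread_id)} "
        "ORDER BY current_number_of_bytes_used DESC "
        f"LIMIT {int(top_n)}"
    )
```

Usage mirrors the two-connection setup: connection 1 runs SELECT CONNECTION_ID() and then the monitored query; connection 2 executes thread_id_query(...) once, then memory_sample_query(...) repeatedly while the query runs.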

Systems extracted

Concepts extracted

  • concepts/mysql-memory-instrumentation — the ~1,255-entry instrument catalog in performance_schema.setup_instruments and its memory/% subset.
  • concepts/memory-profiling-granularity — five summary granularities (account / host / thread / user / global); no per-query level.
  • concepts/mysql-connection-id-vs-thread-id — mapping CONNECTION_ID() to performance_schema.threads.thread_id via PROCESSLIST_ID.
  • concepts/per-thread-memory-profiling — per-query profiling via the thread bound to the query's connection.

Patterns extracted

  • patterns/periodic-sampling-memory-profiler — new canonical wiki pattern: sample an instantaneous-state counter table at fixed intervals (250 ms / 500 ms), retain a sliding window, render to a streaming visualisation; applicable to any sampled-counter observability source.
  • patterns/live-visualization-of-sampled-metrics — new canonical wiki pattern: couple the sampler with an in-process plot (matplotlib stackplot + ion() interactive mode) for real-time monitoring without a separate dashboard pipeline; useful for one-off investigations where a Grafana/Prometheus setup is overkill.
  • patterns/two-connection-profiling-setup — new canonical wiki pattern: the monitored workload and the observation query run on separate connections because the monitored query blocks its own session. Required for any foreground-blocking diagnostic on MySQL.
  • patterns/stage-level-query-profiling — sibling profiling pattern from companion post (time axis).
  • patterns/digest-based-query-prioritization — sibling prioritisation pattern from companion post.
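The periodic-sampling pattern reduces to a small loop: poll at a fixed cadence, keep a bounded sliding window. This is a generic sketch under my own names and signature — the sample function stands in for the performance_schema query:

```python
# Minimal sketch of patterns/periodic-sampling-memory-profiler:
# call sample_fn every interval_s seconds and retain only the last
# `window` samples (oldest dropped first), as the post's scripts do
# with a 50-sample window.
import time
from collections import deque

def run_sampler(sample_fn, interval_s=0.5, window=50, max_samples=None):
    """Collect samples at a fixed cadence into a sliding window."""
    samples = deque(maxlen=window)
    taken = 0
    while max_samples is None or taken < max_samples:
        samples.append(sample_fn())  # e.g. a dict of {event_name: bytes}
        taken += 1
        if max_samples is None or taken < max_samples:
            time.sleep(interval_s)
    return list(samples)
```

The sliding window is what keeps a live plot readable: old samples scroll off instead of compressing the x-axis indefinitely.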

Operational numbers

  • Instrument count: ~1,255 total entries in performance_schema.setup_instruments; several hundred with names matching memory/%.
  • Table name set for memory summaries: five — memory_summary_by_account_by_event_name, memory_summary_by_host_by_event_name, memory_summary_by_thread_by_event_name, memory_summary_by_user_by_event_name, memory_summary_global_by_event_name.
  • Worked example workload: 100M-row chat.message table, SELECT alias ... ORDER BY alias DESC LIMIT 100000 with no index on alias.
  • Top-4 memory footprint at steady state (bytes):
      • memory/sql/Filesort_buffer::sort_keys — 203,488
      • memory/innodb/memory — 169,800
      • memory/sql/THD::main_mem_root — 46,176
      • memory/innodb/ha_innodb — 35,936
  • Sampling cadence in the first script: 250 ms.
  • Sampling cadence in the second script (configurable): default 500 ms via --frequency.
  • Visualisation retention: last 50 samples, sliding window.
  • Legend cardinality cap: categories below total_mem / 1024 / 50 of total are underscore-prefixed (hidden from legend).
  • FULLTEXT index creation: "hundreds of megabytes" peak — qualitative upper bound given by the post's second visualisation, not quantified to a specific number.

Caveats

  • No per-query native granularity — by design; the post's per-query-via-per-thread workflow is a substitute, not a first-class feature.
  • Filesort_buffer::sort_keys is undocumented — the post explicitly flags that documentation IS NULL for this instrument in setup_instruments; users must infer purpose from the name.
  • Sampling aliasing risk — short memory spikes between samples are missed; 250–500 ms cadence is a heuristic, not a guarantee.
  • Short queries are under-sampled — the post itself notes "For short queries this will not be as useful, as we'll only be able to execute this query once, or a small number of times while the profiled query is executing."
  • Observation overhead — the monitoring script holds its own connection + executes a performance_schema query every tick; not free on small instances.
  • Script has a typo — connection.close() at the end of main() references an undefined connection (should be database_connection). Minor but real.
  • No aggregation across samples — the visualisation shows instantaneous values, not percentiles / max-over-interval; peak memory between samples can be missed.
  • MySQL-only — PostgreSQL has a different observability surface (pg_stat_activity, track_io_timing, extension-based memory accounting) not covered here.
  • Pedagogical voice, no production numbers — no PlanetScale fleet-wide memory-profile data disclosed; no customer incident narrative.

Source

  • sources/2026-04-21-planetscale-profiling-memory-usage-in-mysql