SYSTEM Cited by 4 sources
PlanetScale Insights¶
What it is¶
Insights is a Postgres extension developed by PlanetScale and available exclusively on PlanetScale Postgres clusters. It provides per-query performance telemetry and hosts Traffic Control, the feature that enforces workload-class resource budgets on tagged queries.
Capabilities¶
- Per-query performance telemetry — query duration, plan shape, resource consumption aggregated per statement for "find slow SQL" workflows.
- Traffic Control — tag queries with
SQLCommenter metadata
(e.g.
action=analytics) and apply Resource Budgets capping concurrent workers, server share, or per-query cost. See systems/planetscale-traffic-control. - AI-powered index suggestions — Insights periodically
uses its own per-query-pattern telemetry (rows read,
rows returned, runtime share, frequency) to pick
candidate queries, feeds them to an LLM that proposes
CREATE INDEXstatements, and validates each candidate by runningEXPLAINwith a hypothetical index via HypoPG. Only suggestions that materially reduce planner-estimated cost are surfaced. See concepts/llm-generated-database-changes, patterns/llm-plus-planner-validation, patterns/workload-aware-llm-prompting. - Error tracking — per-query-pattern database error
feed with one entry per error class, a rate graph, and
an error-occurrences page keyed on
{timestamp, normalised_sql, tags}. Every occurrence carries the same tags the successful queries do — most usefully theactortag identifying the authenticated principal. Search syntax:tag:tag_name:tag_value(equality) andtag:tag_name(existence). Tag-capture threshold for queries: "at least one query that took more than 1 second, read more than 10k rows, or resulted in an error" — errors always carry tags. Canonicalised as concepts/actor-tagged-error and patterns/actor-tagged-query-observability.
Why it exists¶
Upstream Postgres ships observability primitives
(pg_stat_statements, pg_stat_user_tables) but no first-class
mechanism for classifying queries by workload and enforcing
class-level concurrency limits. Insights fills that gap with
PlanetScale-specific extension code running inside the
PlanetScale Postgres binary.
Seen in¶
-
— Substrate-migration capstone of the Insights production pipeline: EBS-plus-provisioned-IOPS → Metal on PlanetScale's own internal workload. Rafer Hazen (2025-03-11, re-fetched 2026-04-21) migrated the Insights backing database — "the PlanetScale database that powers the Query Insights feature" — from EBS (with provisioned-IOPS upgrades on the sharded keyspace) to Metal direct-attached NVMe. First canonical self-reported Metal production-migration datum on the wiki with full workload-shape disclosure: 8 MySQL/Vitess shards, "approximately 10k
UPDATE/INSERTstatements per second", "32 consumer processes, each with 25 writer threads for a total max concurrency of 800 threads", pre-aggregating Kafka messages in memory per batch and handing writes off to a thread pool (the Kafka- consumer-side concurrency model that the 2023-08-10 storing-time-series-data post architected). Migration used the canary-shard substrate-migration pattern in its busiest-first variant: upgrade the worst-performing shard (the purple line in the 4 per-shard percentile plots) first, soak "a few days", then roll the remaining 7 to "nearly identical improvement in performance." Canonical diagnostic framing: "this workload had demonstrated a sensitivity to I/O latency" — i.e. I/O-latency-sensitive, a distinct axis from IOPS-cap where provisioned-IOPS-on-EBS hits a latency floor that only a substrate swap dissolves. Outcome metrics: "substantial decrease in latency across all the measured percentiles" (p50 / p90 / p95 / p99 each shown as 8-line-per-shard plots with the migrated shard diverging below the pack) + lower Kafka-consumer backlog + "additional capacity to handle increasing message volume in the future" — the 800-thread concurrency × faster-per-write-latency = higher drain rate chain. Load-bearing wiki claim for Metal's substrate-swap positioning: "Without making any changes to our application, architecture, or sharding configuration, we were able to realize substantial performance improvements by upgrading to PlanetScale Metal." Self-dogfooding datum — PlanetScale runs Insights on PlanetScale, migrated it to Metal, and reports the result; closes the 19-month substrate-migration loop opened by the 2023-08-10 Insights-on-sharded-MySQL architectural post. Caveat: no absolute before/after latency numbers published (graphs show direction + shape only); no cost delta disclosed; soak period "a few days" not quantified. -
— Genesis of the feature: 2021-era launch of per-query statistics in PlanetScale. David Graham (PlanetScale, 2021-08-31) announces the foundational primitive — Vitess-layer tracking of every query's execution count, rows returned, and duration, with a 100 ms slow-query flag threshold — that every later Insights capability builds on. "Starting today, all PlanetScale database branches now track statistics about each SQL query that has executed against it without any overhead." The post establishes: (1) proxy-layer telemetry over server-layer telemetry — Vitess observes on the query path, so MySQL doesn't need
performance_schemaenabled; (2) the 100 ms flag threshold that later Insights views inherit; and (3) the query-stats → EXPLAIN → redesign → verify workflow via a worked backup-deletion case study (719 ms → under 20 ms, 98 % reduction by reordering a composite index's columns so the most selective predicate came first). Canonical for query-statistics telemetry, [[concepts/100ms-slow-query-threshold|100 ms slow-query threshold]], and composite-index column order. The 2024+ Insights posts build on this foundation with query-digest fingerprinting, SQLCommenter tagging, DDSketch percentiles, sharded-MySQL rollup tables, and LLM-based index suggestions — but the read-your-writes primitive of "capture every query, surface the slow ones" is this 2021 post's. -
— Canonical wiki acknowledgement of an observability gap in Insights' feature surface. Simeon Griggs (PlanetScale, 2026-03-30) names explicitly that "Query Insights helps you investigate query performance through CPU time, I/O, and latency, but it does not show per-query memory. You may see OOM markers and slow-query signals, but not query-specific RSS usage." The underlying reason: RSS is a per-process metric, not a per-query metric — Postgres's process-per- connection architecture means there's no natural boundary to attribute memory growth to a single query. Insights is prescribed as one of four signals in the patterns/triangulate-rss-growth-from-metrics workflow (alongside Cluster Metrics,
EXPLAIN (ANALYZE, BUFFERS, MEMORY), and connection count), not as a stand-alone memory-attribution tool. First wiki Seen-in entry for Insights that documents a limitation rather than a capability. -
— Canonical disclosure of the
pginsightsimplementation substrate that Traffic Control builds on. Patrick Reynolds (PlanetScale, 2026-03-23) canonicalisespginsightsas the host extension for Traffic Control — not because Traffic Control was always planned to live there, but because of a hook-colocation decision: "Traffic Control needed the same hook points and much of the same information thatpginsightsalready had. So rather than duplicate all that code and impose the extra runtime overhead of another extension, we taughtpginsightshow to block queries." Names the specific Postgres hookspginsightsinstalls (ExecutorRun,ProcessUtility) and frames each hook as wrapping the original Postgres function so the extension sees each query "just before it executes and again just after it completes. Any time that has elapsed and any resources the worker process has consumed are directly attributable to that query." Also reveals the internal per-(query-pattern × host) EMA hash tablepginsightsmaintains: two exponential moving averages per pattern (observed CPU time, observed planner cost) whose ratio is thekconstant feeding Traffic Control's pre-execution cost estimator (patterns/plan-cost-times-k-estimator). Canonical split history: Traffic Control was originally planned as a single new extension; scoping split it along the static-vs-dynamic blocking axis — the static-analysis half becamepg_strict, the dynamic / cumulative-resource half merged intopginsights. -
— Canonical wiki disclosure of the
Indexestab and the handler-hook + wire-protocol-sidecar mechanism that captures per-query index usage. Rafer Hazen (2024-08-14; re-fetched 2026-04-21) announces per-query- pattern index-usage tracking: a time-series graph of which indexes each query pattern used, plus two new search predicates —index:table.indexfor "which queries use this index?" (inverse direction) andindexed:falsefor "find unindexed patterns". The mechanism is a hook into InnoDB'sindex_init()storage-handler callback: "The InnoDB storage handler includes an index-initialization function that MySQL calls (once) prior to using an index in a query. By recording the index name passed to this function in a per query data structure, we're able to find the set of all indexes used by each query." The per-query index set (concepts/per-query-used-index-set) then rides along in the final MySQL wire-protocol packet back to VTGate, which aggregates per fingerprint and emits into the existing 15-second Insights pipeline. Canonicalised as patterns/handler-hook-sidecar-telemetry. Composes with existing predicate grammar: the post's worked exampleindexed:false query_count:>1000 p50:>250surfaces unindexed high-volume slow patterns in a single search — canonical productisation of patterns/digest-based-query-prioritization via the bidirectional-attribution predicates (patterns/bidirectional-index-query-attribution). Canonical framing of Insights' structural position vs the two MySQL-native predecessors:EXPLAINis "query-specific but not production-sampled";performance_schema.table_io_waits_summary_by_index_usageis "production-sampled but not query-attributed and not time-series"; Insights'Indexestab is all three axes (production-sampled + query-pattern-attributed + time-series). Load-bearing operational caveat: "index information is only reported forSELECTqueries, so it's important to independently verify that indexes aren't being used inUPDATEorDELETEqueries before removing an index" — concepts/select-only-index-telemetry-caveat canonicalises the DML-coverage gap and the two-step drop-index verification workflow (InsightsIndexestab shows 0% → independentlyEXPLAINDML paths +performance_schemaper-(table, index) counters before dropping). Coverage claim verbatim: "aggregate the time series count of indexes used for 100% of queries with negligible overhead in MySQL." Shipping gate: "Index usage information is available on all PlanetScale databases." — no Enterprise tier, no opt-in. Sits chronologically between the 2023-08-10 sharded-MySQL pipeline disclosure and the 2026-era Kafka→ClickHouse + enhanced-tagging + AI-index- suggestions posts — the index-usage telemetry is the substrate beneath the later AI-index-suggestion feature's "rows-read vs rows-returned" ratio input oracle (the 2026-04 AI post pre-filters candidate patterns using Insights' existing per-pattern axes; this post's index-usage axis is one of those inputs). Hazen's sixth wiki Insights ingest, mid-point of his Insights corpus. -
— Canonical wiki pipeline-architecture disclosure for Insights (Rafer Hazen, 2023-08-10; re-fetched 2026-04-21). Sits between and — the sharded-MySQL era of Insights storage, ~2.5 years before the ClickHouse migration. Canonicalises a hybrid telemetry store: low-cardinality DB-level metrics in Prometheus (patterns/hybrid-timeseries-store-prometheus-plus-sharded-mysql); high-cardinality per- fingerprint metrics in a sharded MySQL cluster fronted by Vitess. VTGate emits two Kafka topics — aggregate summary per fingerprint every 15 seconds, slow-query events immediately — both buffered to a bounded 5 MB in-memory queue and flushed asynchronously so query serving never blocks on Kafka (concepts/async-kafka-publication-for-telemetry). Kafka messages are keyed on
hash(database_id, fingerprint)so same-pattern messages land on the same partition, letting consumers coalesce aggregates in-memory per batch — 30–40% DB-write reduction (concepts/in-memory-coalescing-by-kafka-key + patterns/deterministic-key-hash-for-partition-affinity). Consumer batch size ~200 typical, up to 1,000 under spike — larger batches accelerate backlog burn-down. Writes land in per-hour and per-minute rollup tables (concepts/per-hour-per-minute-rollup-tables + patterns/dual-granularity-rollup-tables) with (topic, partition, offset) uniqueness constraints for consumer-retry dedup (concepts/kafka-offset-partition-uniqueness-constraint). Slow-query events use a token-bucket limiter with generous burst + bounded throughput (concepts/token-bucket-slow-query-limiter). Cluster is sharded bydatabase_id(concepts/shard-key-database-id) — "we never need to join data across customer databases, and it results in a fairly even distribution of data across shards." DDSketch production parameter: α = 0.01 (≤1% relative error) — first wiki citation of the concrete α after the 2023-04-20 arXiv-only citation. Sketches are persisted as MySQL BLOB columns (concepts/sketch-as-mysql-binary-column) read and merged by custom loadable C++ MySQL functions (concepts/loadable-mysql-function) — percentile calculation happens in SQL, not in the application layer: "Performing these functions in MySQL allows us to calculate percentiles without needing to pull the underlying sketches into our application. It also lets us use the full expressive power of SQL to get the data we need." Library planned open-source but unreleased as of 2023-08-10. Fleet hardware posture: 2 vCPU / 2 GB per shard, 4 → 8 shards reshard earlier 2023, achieved via 8-day dual-write to a new cluster matched to the 8-day retention window — explicitly chosen over Vitess live reshard "to build operation experience resharding" (patterns/dual-write-branch-cutover-via-new-cluster + concepts/dual-write-branch-migration + patterns/small-shards-wide-fleet). Linear write- throughput scaling with shard count confirmed via load tests + production metrics. Consumer write load: ~5,000 writes/sec to the MySQL cluster. Explicitly rejects OLAP databases ("significant operational overhead and a steep learning curve") — a choice the later ClickHouse migration reversed, presumably as OLAP operational experience crossed the viability threshold. Canonical framing that reframes the Insights architecture: Insights is a Kafka → sharded-MySQL pipeline with a Prometheus sidecar for low-cardinality metrics — the ClickHouse pipeline is the successor not the origin. -
— Earliest wiki disclosure of Insights's query-pattern fingerprinting mechanism and the sketch-based error-bounded percentile substrate. Rafer Hazen (2023-04-20) canonicalises the "fundamental unit of analysis" on Insights as the query pattern, not the individual execution: "PlanetScale databases often receive thousands or even millions of queries per second, reporting performance stats for every individual query isn't usually what we want." Canonicalises Vitess's SQL parser as the normalisation substrate — "The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders" — and calls out that normalisation covers more than literals: "Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses." Canonicalised as patterns/ast-normalized-query-fingerprinting. The per-pattern aggregate axes disclosed verbatim: "the number of queries, total execution time, and total rows read, returned, and written" — the exact input oracle the later AI index suggestions post uses. Percentile reporting is explicitly sketch-based: "We also send along a sketch of query execution times that allows us to show error-bounded percentiles (e.g., median and p95)" — citing arXiv:1908.10693 (DDSketch-family). The worked production example — 100M-row table, hourly
DELETE ... LIMIT 500, p95 degrading from "a few hundred milliseconds" in minutes 10–20 past the hour to "almost 15 minutes" in minutes 20–30 — canonicalises concepts/delete-limit-asymptote and patterns/index-from-tail-latency-pattern. The#505annotation on the post-fix latency graph canonicalises concepts/deploy-marker-annotation + patterns/deploy-marker-overlay-on-time-series — first wiki instance of a deploy-marker overlay keyed on deploy-request number rather than a free-form tag. Structural dogfooding signal: this is PlanetScale's own production database at its 505th deploy request. Canonical framing: Insights productises per-query-pattern time-series telemetry (concepts/per-pattern-time-series) on top of the fingerprint + sketch substrate; the fingerprint collapses the query-per-second stream to a pattern-per-interval stream that humans can sort, graph, and diagnose. Earliest Hazen Insights ingest on the wiki — precedes the 2022-09-27 error-debugging, 2023-11-28 anomalies, 2026-03-24 enhanced-tagging, and 2026-04 AI-index-suggestions posts with the fingerprinting-substrate layer beneath all of them. -
— earliest wiki disclosure of Insights's
>1 sectail-query capture threshold. Coutermarsh + Ekechukwu (2022-06-29) disclose verbatim: "Any query that takes over 1 second to run will get recorded and tagged with the values you've set in your sql comments." Narrower single-axis predecessor to the three-axis threshold (">1 sec OR >10k rows OR error") canonicalised on 2022-09-27 by Hazen (). Earliest wiki instance of the tag-then-filter-in-Insights debugging workflow in PlanetScale's own production voice (a slow query onschema_snapshottagged/*application='ApiBb,job='ScheduleSnapshotJob'*/localised via Insights — "Using Insights and tags on the slow query, we were able to find exactly where this query was coming from.") — three months before the more famousactor-tag debugging story. Positionsactiverecord-sql_commenteras the canonical Rails-side gem that makes Insights' SQLCommenter parser see the application's query-log tags. -
sources/2026-04-21-planetscale-identifying-and-profiling-problematic-mysql-queries — Ben Dicken's 2024-03-29 "Identifying and profiling problematic MySQL queries" walks through the native MySQL workflow Insights is designed to replace: manually querying
performance_schemaandsysschema tables, sorting by digest aggregates, drilling via stage-timing profiling andEXPLAIN ANALYZE. Closing framing (verbatim): "gleaning this information can be tedious. Getting exactly what you want requires significant poking around and digging through tables inperformance_schemaandsys. Many of these same observations can be gathered much easier using PlanetScale Insights." Canonical framing on the wiki: Insights is the productised UX over the same underlyingperformance_schema+sysdigest data that every MySQL has shipped for a decade; the vendor differentiation is the dashboard + anomaly detection + sort-by-rows-read, not novel telemetry collection. Complements the Hazen 2026 posts on Postgres Insights — the same product framing applied to Postgres'spg_stat_statementssubstrate. -
— Earliest wiki citation of Insights as a named product from the PlanetScale CEO voice. Sam Lambert (2022-08-02) introduces Insights as "a next generation monitoring solution that helps you discover bad queries in real time" with a "data pipeline that logs the query and its performance metrics" — the 2022-era product-positioning framing that pre-dates the dual-stream Kafka → ClickHouse architecture canonicalised by Hazen 2026-03-24. Lambert also gives the earliest wiki citation of SQLCommenter query comments on Insights: "you can use query comments to tag and identify the source of queries." Architectural substance is thin here; the durable contribution is the pairing of Insights as the "application issues" mitigation in Lambert's three- pillar downtime-prevention taxonomy (alongside deploy-time warn-on-drop + schema revert for human error and Vitess-as-substrate for system immaturity). The drop-safety warning is itself Insights-adjacent: the "recently queried" signal is the platform's query telemetry, which is exactly the substrate Insights productises — see concepts/query-telemetry-as-deploy-safety-signal.
-
sources/2026-04-21-planetscale-graceful-degradation-in-postgres — Canonical warn → enforce observability surface + canonical
[PGINSIGHTS] Traffic Control:in-band warning channel. Ben Dicken canonicalises Insights' role as the over-budget violation surface for Traffic Control: operators observe warn-mode flag counts over multi-hour windows in Insights before flipping budgets to enforce. Canonicalises Insights as the piggyback channel that delivers warnings inside the Postgres query response ("[PGINSIGHTS] Traffic Control:") so applications can observe budget pressure "from within your application without any user-facing effects" — warn- mode observability survives alongside enforce-mode blocking via the extension-layer wire-protocol piggyback. -
— Introduced as the vehicle that makes Traffic Control possible on PlanetScale Postgres. The article's concrete Traffic Control configuration (cap
action=analyticsto 1 worker, 25% ofmax_worker_processes) uses Insights' SQLCommenter tag recognition as the classification substrate. - sources/2026-04-16-cloudflare-deploy-postgres-and-mysql-databases-with-planetscale-workers — Called out as one of PlanetScale Postgres's headline differentiators on the Cloudflare partnership launch, alongside database branching and agent-driven SQL tooling.
- sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions
— Insights gains an AI-powered index suggestions
surface. Rafer Hazen's launch post canonicalises Insights'
per-query-pattern telemetry (rows-read vs rows-returned ratio,
≥ 0.1% aggregate runtime share, minimum frequency) as the
input oracle that makes LLM-generated index suggestions
useful — and HypoPG+
EXPLAINas the output oracle that makes them safe. Positions Insights' workload data as the structural moat: "to use the unique data and capabilities available in Insights to produce the best overall results." - — Insights gains an error-tracking surface and a
tag:query-search syntax. Rafer Hazen's 2022 launch post (re-surfaced 2026-04-21) uses a real PlanetScale internal debugging session as the worked example:AlreadyExistserrors ondatabase_branch_password's unique index at "a few 10s to a few hundred per day", clustering in sub-second batches (temporal clustering → concurrency hypothesis), all tagged with the sameactor(single-caller hypothesis). Canonicalises concepts/actor-tagged-error (principal identity on every error occurrence), concepts/query-tag-filter (tag:key:value+tag:keysyntax), and the check-then-act race class (concepts/check-then-act-race) that Insights' tag-based observability made trivially debuggable. The Insights UI contribution: the debug session stays in the tool from error spike → occurrences → tag lookup → actor identity → code fix, no log joins, no request-ID tracing, no application instrumentation. Structural moat framing: observability tools without per-query actor tags require application-layer per-error-class instrumentation to get the same signal; Insights gets it for free from SQLCommenter comment capture. - — Rafer Hazen canonicalises Insights' internal
architecture as a
dual-stream
telemetry pipeline: (1) individual-query Kafka topic
carrying per-query messages for the notable tail
(> 1s, > 10k rows, or error) powering Notable queries +
error-occurrences UI, and (2) aggregate-summary Kafka
topic carrying per-query-pattern statistics every 15s
powering the query table, anomalies, and all graphs.
Both feed ClickHouse. Shipped
alongside systems/planetscale-traffic-control — the
release extends tagging from the notable-only stream
to the aggregate stream, enabling
aggregate tag
attribution: three new UI surfaces (tag distribution
per pattern, DB-wide statistics by tag, query-pattern
search via
tag:KEY:VALUE) and three new operational question classes ("what queries has this user run?", "what percentage of runtime came from this controller?", "which jobs execute this pattern?"). The architectural core is two cardinality reducers: per-pattern key-cardinality tracking (> 20 distinct values for a key on a pattern ⇒ collapse to*for 1h, canonicalised as concepts/per-pattern-tag-cardinality) and per-interval tag-combination limit (> 50 unique combinations per pattern per 15s ⇒ greedy highest- cardinality collapse, canonicalised as concepts/per-interval-tag-combination-limit). Both reducers surface collapse status on every aggregate message so the UI shows "percentage of tag values where the value is unknown." Automatic tags added by Insights itself:application_name,username,remote_address— join application-set tags in the same pipeline. Canonicalised as patterns/dynamic-cardinality-reduction-by-tag-collapse.
Source¶
Related¶
- systems/vtgate
- systems/prometheus
- concepts/async-kafka-publication-for-telemetry
- concepts/in-memory-coalescing-by-kafka-key
- concepts/per-hour-per-minute-rollup-tables
- concepts/token-bucket-slow-query-limiter
- concepts/shard-key-database-id
- concepts/kafka-offset-partition-uniqueness-constraint
- concepts/loadable-mysql-function
- concepts/sketch-as-mysql-binary-column
- concepts/dual-write-branch-migration
- concepts/horizontal-sharding
- concepts/postgres-hook
- concepts/plan-cost-to-wallclock-constant-k
- patterns/hybrid-timeseries-store-prometheus-plus-sharded-mysql
- patterns/deterministic-key-hash-for-partition-affinity
- patterns/dual-granularity-rollup-tables
- patterns/dual-write-branch-cutover-via-new-cluster
- patterns/hook-colocation-for-zero-overhead
- patterns/plan-cost-times-k-estimator
- patterns/small-shards-wide-fleet
- systems/planetscale-traffic-control
- systems/planetscale-for-postgres
- systems/planetscale
- systems/postgresql
- systems/mysql
- systems/vitess
- systems/hypopg
- systems/kafka
- systems/clickhouse
- patterns/workload-class-resource-budget
- patterns/llm-plus-planner-validation
- patterns/workload-aware-llm-prompting
- patterns/actor-tagged-query-observability
- patterns/dynamic-cardinality-reduction-by-tag-collapse
- patterns/dual-stream-telemetry-pipeline
- patterns/ast-normalized-query-fingerprinting
- patterns/deploy-marker-overlay-on-time-series
- patterns/index-from-tail-latency-pattern
- concepts/llm-generated-database-changes
- concepts/hypothetical-index-evaluation
- concepts/index-candidate-filtering
- concepts/actor-tagged-error
- concepts/query-tag-filter
- concepts/check-then-act-race
- concepts/aggregate-tag-attribution
- concepts/tag-cardinality-collapse
- concepts/per-pattern-tag-cardinality
- concepts/per-interval-tag-combination-limit
- concepts/rails-query-log-tags
- concepts/activerecord-annotate
- concepts/query-pattern-fingerprint
- concepts/ast-literal-normalization
- concepts/ddsketch-error-bounded-percentile
- concepts/per-pattern-time-series
- concepts/deploy-marker-annotation
- concepts/delete-limit-asymptote
- concepts/query-digest
- concepts/deploy-request
- systems/activerecord-sql_commenter
- systems/ruby-on-rails
- patterns/query-comment-tag-propagation-via-orm
- companies/planetscale