PATTERN Cited by 1 source
Triangulate RSS growth from metrics¶
Problem¶
RSS is a per-process
metric, not a per-query metric. When Postgres RSS climbs
toward memory-pressure limits, the operator needs to know
which queries are responsible — but EXPLAIN (ANALYZE,
BUFFERS) in isolation shows only I/O and runtime, not
process-memory footprint, and
Query Insights does not expose per-query memory.
Simeon Griggs (PlanetScale, 2026-03-30) names the gap explicitly:
"Query Insights helps you investigate query performance through CPU time, I/O, and latency, but it does not show per-query memory. You may see OOM markers and slow-query signals, but not query-specific RSS usage. RSS is a per-process metric, not a per-query metric. That means you cannot read 'RSS per query' directly from
EXPLAIN or Query Insights."
(Source: sources/2026-04-21-planetscale-high-memory-usage-in-postgres-is-good-actually.)
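To make the gap concrete, here is a minimal sketch of what the usual first tool reports. The table and query shape are placeholders; the point is that the plan output carries timings and buffer counts but nothing that maps onto the backend's RSS.

```sql
-- Illustrative query; the events table and its columns are assumptions.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM events
ORDER BY date DESC, id DESC
LIMIT 10000;

-- The resulting plan shows per-node timings, row counts, and shared/temp
-- buffer activity. A Sort node may note whether it spilled to disk, but
-- no line in the output reports the backend process's RSS, which is the
-- number the memory-pressure alert is based on.
```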
Pattern¶
Four-signal triangulation workflow:
1. Cluster Metrics
→ identify the time window when RSS rose
↓
2. Query Insights (same window)
→ find expensive query patterns (high runtime / CPU / I/O / rows-read)
→ find OOM-adjacent activity (error markers, connection resets)
↓
3. EXPLAIN (ANALYZE, BUFFERS, MEMORY) on suspects
→ operator-level memory attribution at the plan-node level
↓
4. Connection count (same window)
→ rule out high-concurrency RSS contribution
(many moderate-memory backends can outweigh a few memory-hungry queries)
The four signals constrain the suspect space from different angles:
- Cluster Metrics gives the when.
- Query Insights gives the what (candidate query set).
- EXPLAIN (ANALYZE, BUFFERS, MEMORY) gives the per-node cost.
- Connection count gives the concurrency multiplier.
None of them alone is sufficient; together they typically isolate the root cause.
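Steps 1 and 2 live in the console (Cluster Metrics and Query Insights), so only steps 3 and 4 have a SQL form. A minimal sketch, assuming a Postgres version or platform where the MEMORY option of EXPLAIN is available; the suspect query is illustrative.

```sql
-- Step 3: plan-node-level memory attribution on a suspect statement.
-- The MEMORY option is assumed to be supported on the target version/platform.
EXPLAIN (ANALYZE, BUFFERS, MEMORY)
SELECT * FROM events ORDER BY date DESC, id DESC LIMIT 10000;  -- illustrative suspect

-- Step 4: how many client backends were active, to size the concurrency
-- multiplier during the rise window.
SELECT state, count(*) AS backends
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY backends DESC;
```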
Worked example¶
RSS rose from 4 GB to 18 GB over 10 minutes, and OOM kill events appeared at the peak.
- Cluster Metrics: narrow to the 10-minute window.
- Query Insights for that window: one analytics query pattern, SELECT ... ORDER BY date DESC, id DESC LIMIT N with a large N, appears with 50+ concurrent executions, each running for 30-60 seconds.
- EXPLAIN (ANALYZE, BUFFERS, MEMORY): the query produces a large on-disk sort because the combined index order doesn't match the ORDER BY; the sort node has a 150 MB work-memory footprint.
- Connection count: 50 concurrent backends × 150 MB sort = 7.5 GB of transient RSS from this one query shape alone, a large share of the observed 14 GB RSS jump.
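The multiplication in the last bullet can be reproduced live while the window is still open. A rough sketch: the 150 MB figure comes from the EXPLAIN run, and the query-text filter is only an illustrative way to pin down the suspect shape.

```sql
-- Estimate the aggregate transient footprint of the suspect shape right now.
-- 150 is the per-execution sort footprint (MB) observed via EXPLAIN; the
-- ILIKE filter is an illustrative way to match the query pattern.
SELECT count(*)        AS concurrent_executions,
       count(*) * 150  AS est_transient_mb
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE '%ORDER BY date DESC, id DESC%';
```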
Root cause: work_mem
multiplication across concurrent executions of an analytics
query without an index matching its ORDER BY.
Remediation:
- Short-term: throttle the analytics workload in Traffic Control.
- Long-term: add a matching index, or refactor the query to use index-order pagination.
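Sketches of the long-term fixes, against the same illustrative events table; the index name and the pagination cursor values are made up.

```sql
-- An index whose order matches the ORDER BY lets Postgres stream rows in
-- order instead of building a large sort. Name and table are illustrative.
CREATE INDEX CONCURRENTLY idx_events_date_id_desc
    ON events (date DESC, id DESC);

-- Index-order (keyset) pagination: resume from the last row of the previous
-- page rather than sorting the full set under a large LIMIT.
SELECT *
FROM events
WHERE (date, id) < ('2026-04-01', 123456)  -- last (date, id) seen on the previous page
ORDER BY date DESC, id DESC
LIMIT 100;
```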
Variations¶
- pg_stat_activity snapshot during the rise window to see per-backend query text; coarser than Query Insights but available without installing an extension.
- top/ps on the DB host for per-backend RSS; only accessible to operators with host-level access (not customers on managed services).
- Cumulative pg_stat_statements diff across the window to find queries with the biggest change in execution count.
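The first and third variations have direct SQL forms (the host-level top/ps one does not). A sketch; the snapshot table names are made up, and pg_stat_statements must already be installed for the diff.

```sql
-- Variation 1: point-in-time snapshot of per-backend query text during the
-- rise window. No extension needed; the snapshot table name is illustrative.
CREATE TABLE activity_snapshot AS
SELECT now() AS captured_at, pid, state, query
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Variation 3: pg_stat_statements diff across the window. Snapshot the
-- counters before (or early in) the window, then compare afterwards.
CREATE TABLE pss_before AS
SELECT userid, dbid, queryid, query, calls
FROM pg_stat_statements;

-- ...after the window has passed:
SELECT cur.query,
       cur.calls - prev.calls AS calls_delta
FROM pg_stat_statements cur
JOIN pss_before prev USING (userid, dbid, queryid)
ORDER BY calls_delta DESC
LIMIT 20;
```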
Relationship to other memory-attribution patterns¶
- patterns/read-the-code-for-partial-failure-bugs — when telemetry is insufficient, drop to source reading. For Postgres memory, source-reading means reading the extension or application code that issues the suspect queries.
- patterns/diagnose-via-heap-dump-lock-introspection — same shape (multi-signal triangulation) applied to lock contention rather than memory.
Seen in¶
- sources/2026-04-21-planetscale-high-memory-usage-in-postgres-is-good-actually — Simeon Griggs canonicalises the four-step workflow (Cluster Metrics → Query Insights →
EXPLAIN (ANALYZE, BUFFERS, MEMORY) → connection count) as the prescribed response to RSS growth when per-query memory is not directly observable.