
PATTERN

Triangulate RSS growth from metrics

Problem

RSS is a per-process metric, not a per-query metric. When Postgres RSS climbs toward memory-pressure limits, the operator needs to know which queries are responsible — but EXPLAIN (ANALYZE, BUFFERS) in isolation shows only I/O and runtime, not process-memory footprint, and Query Insights does not expose per-query memory.

Simeon Griggs (PlanetScale, 2026-03-30) names the gap explicitly:

"Query Insights helps you investigate query performance through CPU time, I/O, and latency, but it does not show per-query memory. You may see OOM markers and slow-query signals, but not query-specific RSS usage. RSS is a per-process metric, not a per-query metric. That means you cannot read 'RSS per query' directly from EXPLAIN or Query Insights."

(Source: sources/2026-04-21-planetscale-high-memory-usage-in-postgres-is-good-actually.)

Pattern

Four-signal triangulation workflow:

1. Cluster Metrics
   → identify the time window when RSS rose
2. Query Insights (same window)
   → find expensive query patterns (high runtime / CPU / I/O / rows-read)
   → find OOM-adjacent activity (error markers, connection resets)
3. EXPLAIN (ANALYZE, BUFFERS, MEMORY) on suspects
   → operator-level memory attribution at the plan-node level
      (see the SQL sketch after this list)
4. Connection count (same window)
   → rule out high-concurrency RSS contribution
      (many moderate-memory backends can outweigh a few memory-hungry queries)
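
A minimal sketch of steps 3 and 4, assuming a hypothetical suspect query against an illustrative events table (the table, columns, and LIMIT are placeholders, not from the source):

  -- Step 3: plan-node attribution. The MEMORY option requires Postgres 17+;
  -- on older versions, EXPLAIN (ANALYZE, BUFFERS) still reports each sort
  -- node's memory/disk usage in its output.
  EXPLAIN (ANALYZE, BUFFERS, MEMORY)
  SELECT *
  FROM events
  ORDER BY date DESC, id DESC
  LIMIT 100000;

  -- Step 4: backend count in the same window, to gauge the concurrency multiplier.
  SELECT count(*) FROM pg_stat_activity WHERE state <> 'idle';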

The four signals constrain the suspect space from different angles:

  • Cluster Metrics gives the when.
  • Query Insights gives the what (candidate query set).
  • EXPLAIN (ANALYZE, BUFFERS, MEMORY) gives the per-node memory cost.
  • Connection count gives the concurrency multiplier.

None of them alone is sufficient; together they typically isolate the root cause.

Worked example

RSS rose from 4 GB to 18 GB over 10 minutes, and OOM kill events appeared at the peak.

  1. Cluster Metrics: narrow to the 10-minute window.
  2. Query Insights for that window: one analytics query pattern — SELECT ... ORDER BY date DESC, id DESC LIMIT N with a large N — appears with 50+ concurrent executions, each running for 30-60 seconds.
  3. EXPLAIN (ANALYZE, BUFFERS, MEMORY): the query produces a large on-disk sort because the existing composite index's column order doesn't match the ORDER BY; the sort node shows a ~150 MB work-memory footprint per execution.
  4. Connection count: 50 concurrent backends × 150 MB per sort ≈ 7.5 GB of transient RSS from this one query shape alone, accounting for the bulk of the observed 14 GB RSS jump (the "50+" execution count and other per-backend allocations plausibly cover the rest).

Root cause: work_mem multiplication across concurrent executions of an analytics query without an index matching its ORDER BY.
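
A rough worst-case bound for the transient memory of one query shape, assuming each sort fills its work_mem budget and one sort node per query (assumptions, not from the source):

  -- Per-operation memory budget: each sort/hash node in each backend may
  -- claim up to this much before spilling to disk.
  SHOW work_mem;
  -- Bound: concurrent backends × sort nodes per query × work_mem.
  -- Worked example above: 50 × 1 × ~150 MB ≈ 7.5 GB of transient RSS.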

Remediation:

  • Short-term: throttle the analytics workload in Traffic Control.
  • Long-term: add an index matching the ORDER BY, or refactor the query to use index-order (keyset) pagination (both sketched below).
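
A minimal sketch of the long-term fixes, reusing the hypothetical events table from above (the index name and predicate values are illustrative):

  -- Matching index: lets ORDER BY ... LIMIT read rows in index order,
  -- avoiding the large sort entirely.
  CREATE INDEX CONCURRENTLY events_date_id_desc_idx
    ON events (date DESC, id DESC);

  -- Keyset (index-order) pagination: resume from the last-seen (date, id)
  -- instead of sorting and skipping a large window per request.
  SELECT *
  FROM events
  WHERE (date, id) < ('2026-03-01', 12345)  -- last row of the previous page
  ORDER BY date DESC, id DESC
  LIMIT 50;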

Variations

  • pg_stat_activity snapshot during the rise window to see per-backend query text — coarser than Query Insights but available without installing an extension (sketched below).
  • top / ps on the DB host for per-backend RSS — only accessible to operators with host-level access (not customers on managed services).
  • Cumulative pg_stat_statements diff across the window to find the queries with the biggest change in execution count (sketched below).
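
Hedged sketches of the first and third variations (the snapshot-table name is illustrative; the diff assumes the pg_stat_statements extension is installed):

  -- Variation 1: pg_stat_activity snapshot during the rise window.
  SELECT pid, state, query_start, left(query, 120) AS query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY query_start;

  -- Variation 3: snapshot pg_stat_statements at the window start...
  CREATE TEMP TABLE stmts_before AS
    SELECT queryid, query, calls FROM pg_stat_statements;
  -- ...then diff after the window to rank by execution-count change.
  SELECT s.queryid, left(s.query, 80) AS query,
         s.calls - b.calls AS calls_delta
  FROM pg_stat_statements s
  JOIN stmts_before b USING (queryid)
  ORDER BY calls_delta DESC
  LIMIT 10;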
