PlanetScale — High memory usage in Postgres is good, actually¶
Summary¶
Simeon Griggs (PlanetScale, 2026-03-30) opens with a houseplant analogy: a dashboard showing memory at 80% looks alarming the way a houseplant looks thirsty, but most operators project human discomfort onto the wrong signal. Sustained high CPU is always bad (work that cannot be skipped, queue depths grow, tail latency rises); sustained high memory is usually good (the system is keeping hot data close to the CPU). The structural reason is that the single "memory used" percentage conflates two fundamentally different populations: reclaimable cache (active cache, inactive cache, memory-mapped pages — dropped by the OS the moment something else needs RAM) and non-reclaimable process memory (RSS — stack, heap, catalog/relcache, sort and hash-table `work_mem` allocations, prepared-plan session memory). Only the second drives OOM risk. A healthy Postgres host should have most of its RAM consumed by `shared_buffers` plus the OS page cache; a memory number that is high because RSS is climbing is memory pressure, a different phenomenon with a different investigation playbook.
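The cache-versus-RSS split can be made concrete from inside Postgres. A minimal sketch (mine, not the post's), assuming Postgres 14 or later where the `pg_backend_memory_contexts` view exists: it lists one backend's private memory contexts, i.e. the non-reclaimable side of the split; the OS page cache side is not visible from SQL at all.

```sql
-- Inspect this backend's private, non-reclaimable memory contexts
-- (catalog caches, plan caches, executor state). These bytes count
-- toward the process's RSS, not toward the reclaimable page cache.
-- Requires PostgreSQL 14+ for pg_backend_memory_contexts.
SELECT name,
       level,
       pg_size_pretty(total_bytes) AS total,
       pg_size_pretty(used_bytes)  AS used
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 15;
```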
Key takeaways¶
- Memory is workspace, CPU is work. The 0–100% scale conceals the asymmetry: sustained high CPU means queries arrive faster than they can be processed (queueing, timeouts, cascading failures); sustained high memory usually means the system has amortised its disk reads into RAM. "Unlike sustained high CPU, high memory usage by itself does not mean performance is degraded."
- Reading from RAM is ~1000× faster than reading from an NVMe drive. This is the entire argument for why Postgres aggressively consumes RAM: "reading a page from RAM is roughly 1,000 times faster than reading it from even a fast NVMe drive. A database that keeps frequently accessed data in memory avoids that penalty on every query." PlanetScale Metal's locally-attached NVMe is fast, but still not as fast as cache. (See concepts/storage-latency-hierarchy.)
- Two caching layers coexist by design. `shared_buffers` holds 8 KB Postgres pages keyed by relation+block; the OS page cache holds raw filesystem pages. "Postgres was designed with this in mind, and its own documentation notes that the operating system's cache is expected to handle data beyond what fits in shared_buffers." The post explicitly discourages tuning `shared_buffers` as a first troubleshooting step.
- The dashboard number conflates four sub-categories. PlanetScale's Cluster Metrics breaks memory into active cache (OS recently touched, wants to keep), inactive cache (not accessed lately), memory-mapped (cached pages backed by real on-disk files), and RSS (Resident Set Size — Postgres process private memory). All three cache sub-categories are reclaimable by the OS. Only RSS is not.
- RSS is what drives OOM risk. Canonical framing: "If total memory is high because cache is high, good! Frequently accessed data stays near the CPU for faster access. … If total memory is high because RSS is high, that is referred to as memory pressure and is a problem." See concepts/memory-pressure.
- RSS grows for enumerated reasons, not just query volume. Canonical list (new concepts/postgres-rss-growth-drivers concept): (1) multiple `work_mem` allocations within a single query (per sort/hash node, per query, per backend — multiplicative); (2) catalog bloat, especially in multi-tenant schemas using a table-per-tenant pattern; (3) OS allocator inefficiency (memory returned to the heap but not to the OS); (4) misbehaving or misconfigured extensions; (5) cached plans and prepared statements accumulating per-session until deallocation.
- Tuning `work_mem` is not a safe default knob. "A single complex query can allocate work_mem multiple times, and that multiplies across every active connection. Setting it too low forces more disk I/O; setting it too high globally can cause total memory usage to spike unpredictably under load. Neither direction is a safe default change without first understanding your workload's concurrency and query complexity."
- Connection pooling is the most effective RSS lever in process-per-connection architectures. "Efficient connection pooling can be the best way to reduce RSS usage. Fewer active connections result in fewer copies of all that per-process overhead." PlanetScale ships PgBouncer in transaction mode — connections are returned to the pool at transaction boundaries. Canonical new patterns/connection-pooling-to-reduce-rss pattern.
- RSS is per-process, not per-query — so you can't ask `EXPLAIN` for "the RSS of this query." Canonical debugging playbook (new patterns/triangulate-rss-to-query pattern): (1) Cluster Metrics to identify when RSS rises; (2) Query Insights in that window for expensive patterns (high runtime / CPU / I/O / rows or blocks read) and OOM-adjacent activity; (3) re-run suspect queries with `EXPLAIN (ANALYZE, BUFFERS, MEMORY)` for operator-level memory — the concepts/explain-analyze-buffers-memory primitive; (4) check concurrent connection counts in the same window. A worked sketch of step (3) follows this list.
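A hedged sketch of step (3) above. The table and column names (`orders`, `customer_id`, `total`) are invented for illustration, and the `MEMORY` option is only accepted by recent Postgres releases, so the sketch leans on `EXPLAIN (ANALYZE, BUFFERS)`, whose Sort and Hash nodes already report per-operator memory in their output.

```sql
-- Hypothetical illustration of step (3): quantify per-operator memory for a
-- suspect query. Table/column names are invented for this sketch.
SET work_mem = '4MB';   -- session-local override; does not change the server default

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(total)
FROM orders
GROUP BY customer_id
ORDER BY sum(total) DESC;

-- In the plan output, look for:
--   "Sort Method: external merge  Disk: ... kB"  -> the node spilled; work_mem
--                                                   was too small for this sort
--   "Sort Method: quicksort  Memory: ... kB"     -> the node fit in memory
-- Remember the multiplication: each sort/hash node in each backend can claim
-- up to work_mem, so raising it globally scales with concurrency.
```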
Systems / concepts / patterns extracted¶
Systems¶
- PostgreSQL — the substrate; the post's entire framing is Postgres-process-model-specific. Process-per-connection architecture is load-bearing for the RSS-grows-with-connections argument.
- PlanetScale Postgres — the vendor surface where the dashboard metric appears; the breakdown into active cache / inactive cache / memory-mapped / RSS is PlanetScale's Cluster Metrics taxonomy.
- PlanetScale Metal — named for locally-attached NVMe; the 1000× RAM-vs-NVMe gap argument is the reason Postgres uses cache even on the fastest storage tier.
- PlanetScale Insights — Query Insights + Cluster Metrics named as the investigation surface; OOM markers are surfaced here.
- PgBouncer — PlanetScale's in-proxy connection pooler in transaction mode named as the primary RSS reduction lever.
- NVMe — named as the storage tier against which RAM is ~1000× faster.
Concepts¶
- concepts/reclaimable-vs-non-reclaimable-memory — NEW canonical wiki concept. The structural distinction the post is built around: active cache / inactive cache / memory-mapped are reclaimable by the OS; RSS is not. A dashboard number doesn't tell you which without drill-down.
- concepts/memory-pressure — NEW. Rising RSS toward limits, OOM kills, unexplained restarts, and tail latency spiking together with heavy disk I/O are the canonical signals. Distinguished from "high memory usage" as a diagnostic category.
- concepts/postgres-rss-growth-drivers — NEW. Canonical five-item enumeration (work_mem × backends, catalog bloat, allocator, extensions, cached plans).
- concepts/postgres-work-mem — NEW. Per sort/hash-node, per-query, per-backend — multiplicative memory knob that is not a safe default tune.
- concepts/explain-analyze-buffers-memory — NEW. The Postgres-specific planner invocation that gives operator-level memory; the bridge from cluster metrics to per-query investigation.
- concepts/linux-page-cache — the lower tier of the double-buffered stack; canonicalises active / inactive / memory-mapped as three reclaimable sub-categories of the page cache.
- concepts/postgres-shared-buffers-double-buffering — canonical prior wiki coverage; this post re-frames it under the reclaimability / OOM-risk axis. A buffer-inspection sketch follows this list.
- concepts/connection-pool-exhaustion — the post's "fewer active connections" argument connects to existing connection-pool wiki coverage at the RSS-reduction altitude.
- concepts/cpu-utilization-vs-saturation — CPU framed as "work" vs memory as "workspace" is the canonical utilisation-vs-saturation framing at the sustained-load axis.
- concepts/storage-latency-hierarchy — the 1000× RAM vs NVMe gap is the numeric grounding.
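As referenced from the double-buffering entry above, a small sketch of the upper cache layer, assuming the contrib extension `pg_buffercache` is installed: it shows which relations currently occupy `shared_buffers`; the OS page cache layer beneath it has no SQL-level view.

```sql
-- Which relations currently occupy shared_buffers (the Postgres-managed
-- half of the double-buffered stack). Requires the pg_buffercache extension.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```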
Patterns¶
- patterns/connection-pooling-to-reduce-rss — NEW. In process-per-connection database architectures, each backend carries baseline RSS (stack, heap, catalog/relcache, cached plans); reducing the number of concurrent backends via a pooler is the most structurally effective RSS lever. Transaction-mode PgBouncer is the canonical production implementation.
- patterns/triangulate-rss-to-query — NEW. Four-step investigation pipeline for per-process metrics that aren't directly attributable to queries: cluster metrics identify the window → query insights finds candidates → `EXPLAIN (ANALYZE, BUFFERS, MEMORY)` quantifies operator memory → connection counts disambiguate fan-out. A connection-count sketch follows this list.
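For the connection-count end of both patterns, a hedged sketch against the standard `pg_stat_activity` view; every row it returns is a backend process carrying baseline RSS whether it is busy or idle.

```sql
-- How many backends exist right now, and what they are doing. Each row of
-- pg_stat_activity with backend_type = 'client backend' is a process with
-- its own baseline RSS (stack, heap, catalog caches, cached plans).
SELECT state,
       count(*)                   AS backends,
       max(now() - backend_start) AS oldest_backend_age
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY backends DESC;
-- A large "idle" bucket is the classic sign that a transaction-mode pooler
-- (e.g. PgBouncer) would cut RSS: fewer long-lived backends, same throughput.
```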
Operational numbers / architectural data¶
- ~1000× latency gap between RAM access and NVMe access — the quantitative foundation for caching at all. "Reading a page from RAM is roughly 1,000 times faster than reading it from even a fast NVMe drive."
- 8 KB default Postgres page size (the fixed chunk of table/index data `shared_buffers` works with; a settings check follows this list).
- Four memory sub-categories in PlanetScale's Cluster Metrics: active cache, inactive cache, memory-mapped, RSS.
- Three reclaimable (all cache types); one non-reclaimable (RSS).
- Process-per-connection Postgres architecture — every connection spawns a backend process with its own baseline RSS footprint.
- Transaction mode PgBouncer — connections returned to the pool after each transaction; the default pooling mode on PlanetScale.
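A quick, hedged way to confirm the page-size and buffer-pool numbers on a given instance (the page size is a compile-time default, so `block_size` is read-only at runtime):

```sql
-- block_size is typically 8192 (the 8 KB page); shared_buffers is the size
-- of Postgres's own cache layer on this instance.
SELECT current_setting('block_size')     AS page_size_bytes,
       current_setting('shared_buffers') AS shared_buffers;
```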
Caveats¶
- Pedagogical / dashboard-explainer voice — no PlanetScale production retrospective, no customer war story, no numeric case (no "customer at X GB RSS hit OOM at Y connections"). Structure is dashboard-screenshot + narrative.
- 1000× RAM-vs-NVMe figure is a standard order-of-magnitude claim, not a PlanetScale measurement. Actual gap varies by workload (random vs sequential, queue depth, cache-line-size effects).
- RSS growth drivers enumerated but not quantified. Which driver dominates at PlanetScale scale is not disclosed. No distribution across customers, no "X% of OOM incidents trace to catalog bloat."
- `work_mem` tuning guidance is qualitative. The post declines to provide a concrete rule because "neither direction is a safe default change" — but this means operators are left to tune empirically without a starting point.
- The `EXPLAIN (ANALYZE, BUFFERS, MEMORY)` option is named but not worked through. A reader who doesn't already know `EXPLAIN` output won't know how to interpret the memory column.
- Catalog bloat from multi-tenant schemas is flagged but not operationalised (no threshold, no measurement, no mitigation strategy beyond the parenthetical). A hypothetical measurement sketch follows this list.
- Tail-latency + heavy-disk-I/O pairing is named as a memory-pressure signal, but the causal chain (working set doesn't fit → more misses → more disk → tail latency) is implicit; readers are expected to connect the dots.
- Extension-misbehaviour is named as a cause but not instrumented (how would an operator detect a leaking extension?).
- No cross-engine comparison. MySQL InnoDB's single-layer buffer pool (no OS page cache reliance in `O_DIRECT` configurations) behaves differently; the post is Postgres-specific.
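On the catalog-bloat caveat above: a hypothetical starting measurement, not guidance from the post, is to rank the system catalogs by on-disk size and watch the trend; in table-per-tenant schemas, `pg_class` and `pg_attribute` growth tracks the number of relations and columns.

```sql
-- Hypothetical catalog-size check (not from the post): rank system catalogs
-- by total on-disk size. The trend over time matters more than any single
-- snapshot; steady growth under a table-per-tenant pattern is the signal.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid::regclass)) AS total_size
FROM pg_class
WHERE relnamespace = 'pg_catalog'::regnamespace
  AND relkind = 'r'
ORDER BY pg_total_relation_size(oid::regclass) DESC
LIMIT 10;
```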
Source¶
- Original: https://planetscale.com/blog/high-memory-usage-in-postgres-is-good-actually
- Raw markdown:
raw/planetscale/2026-04-21-high-memory-usage-in-postgres-is-good-actually-1ad7d2ab.md
Related¶
- systems/postgresql
- systems/planetscale-for-postgres
- systems/planetscale-metal
- systems/planetscale-insights
- systems/pgbouncer
- concepts/reclaimable-vs-non-reclaimable-memory
- concepts/memory-pressure
- concepts/postgres-rss-growth-drivers
- concepts/postgres-work-mem
- concepts/explain-analyze-buffers-memory
- concepts/linux-page-cache
- concepts/postgres-shared-buffers-double-buffering
- concepts/connection-pool-exhaustion
- concepts/storage-latency-hierarchy
- concepts/cpu-utilization-vs-saturation
- patterns/connection-pooling-to-reduce-rss
- patterns/triangulate-rss-to-query
- companies/planetscale