CONCEPT Cited by 1 source

Memory overcommit risk

Definition

Memory overcommit risk is the failure mode where a standalone database's max_connections is set so high that the worst-case aggregate memory demand across all simultaneously active connections exceeds physical memory. Average utilisation looks healthy, but a correlated burst, in which every connection lands on a memory-heavy query at the same time, exhausts physical memory, and the database process is OOM-killed or crashes.

The risk is structural, not incidental: it arises from operators raising max_connections past the memory envelope under the intuition that "we're not using that much memory yet, so raising it is harmless."

Why it happens

MySQL, Postgres, and most other relational databases allocate per-connection memory on demand during query execution:

  • Session sort / join / read buffers grow with query shape.
  • Thread stacks are fixed but multiply with connection count.
  • Session-level statement state (prepared statements, open cursors, user variables) grows over the session's lifetime.
  • Large sorts / hashes can temporarily balloon per-query memory (MySQL sort_buffer_size, join_buffer_size, tmp_table_size knobs cap each individual allocation but don't cap the sum across connections).

If max_connections × worst_case_per_connection_memory > physical_memory, the database still runs fine under normal load (most connections are idle, and active ones touch only a fraction of their potential allocation). But a workload shift, such as a spike of identical large analytical queries, a reporting burst, or the ramp-up after a cold-cache restart, can land every connection on a worst-case allocation path simultaneously.
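The inequality in the preceding paragraph can be made concrete with a rough estimate using worst-case per-connection figures. This is a minimal sketch, not an exact model of MySQL's allocator: it assumes each connection allocates at most one of each session buffer at a time (a single query can in practice allocate more than one), and every number, including the shared-memory figure, is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Hypothetical per-connection worst-case allocations, for illustration only.
per_connection_worst_case = {
    "thread_stack": 1 * MIB,
    "sort_buffer_size": 4 * MIB,
    "join_buffer_size": 4 * MIB,
    "read_buffer_size": 1 * MIB,
    "read_rnd_buffer_size": 2 * MIB,
    "tmp_table_size": 64 * MIB,
}


def worst_case_demand(max_connections, shared_bytes, per_connection):
    """Shared memory plus every connection at its worst-case footprint."""
    return shared_bytes + max_connections * sum(per_connection.values())


physical_memory = 32 * GIB  # assumed instance size
shared = 20 * GIB           # assumed shared buffers (e.g. a buffer pool)

demand = worst_case_demand(500, shared, per_connection_worst_case)
overcommitted = demand > physical_memory
print(f"worst case: {demand / GIB:.1f} GiB, overcommitted: {overcommitted}")
```

With these assumed figures the worst case is well above physical memory even though an idle-heavy workload would never come close to it.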

Canonical framing

From Liz van Dijk (PlanetScale, 2022-11-01):

"While it may seem harmless to raise this variable at first (you may not be approaching the instance memory limits quite yet), making MySQL live outside its means (i.e. overcommitting memory) opens the door to dangerous crashes and potential downtime, so this is not recommended." (Source: sources/2026-04-21-planetscale-one-million-connections.)

The "you may not be approaching the instance memory limits quite yet" clause is load-bearing: the operator who raises the knob sees healthy utilisation today and infers headroom. The failure lies in future bursts that have not yet been seen, and no amount of inspecting today's metrics will warn about them.

Why it's a crash, not a slowdown

Unlike CPU or I/O contention — which usually manifest as elevated latency before producing failure — memory overcommit produces a step-function failure:

  1. Memory budget is healthy.
  2. A burst arrives; every connection allocates its worst-case per-query footprint.
  3. Physical memory is exceeded.
  4. Kernel OOM-killer terminates the MySQL process (or swap thrashing makes it effectively unresponsive).
  5. The database crashes; every in-flight connection is lost; application sees a connection reset.

No gradual backpressure mechanism intervenes — the database doesn't get "slow" first, it just crashes. This makes overcommit distinctly nastier than typical backpressure / load-shedding failure modes.
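The step-function shape can be illustrated with a toy model. This sketch only captures the binary outcome described above; it does not model swap, kernel behaviour, or any real allocator, and all constants are assumed values.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

PHYSICAL = 32 * GIB     # assumed instance memory
SHARED = 20 * GIB       # assumed shared buffers
IDLE_CONN = 1 * MIB     # assumed footprint of an idle connection
HEAVY_CONN = 76 * MIB   # assumed worst-case footprint of a busy connection
MAX_CONNECTIONS = 500


def resident_bytes(heavy_connections):
    """Total memory when `heavy_connections` run worst-case queries."""
    idle = MAX_CONNECTIONS - heavy_connections
    return SHARED + idle * IDLE_CONN + heavy_connections * HEAVY_CONN


def state(heavy_connections):
    """There is no 'slow' state in this model: healthy until memory runs out."""
    return "OOM" if resident_bytes(heavy_connections) > PHYSICAL else "healthy"


for n in (10, 50, 100, 150, 200, 500):
    print(f"{n:3d} heavy connections -> {state(n)}")
```

In this model the database reports "healthy" for every load level up to the threshold and "OOM" immediately after it, with nothing in between for an operator to observe.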

The operational antidote

Two defences:

  1. Don't raise max_connections past the memory envelope — treat the default as a calculated trade-off, not an artificial limit. Raising it should be accompanied by adding memory (vertical scale) or capping per-session allocations.

  2. Put a proxy tier in front that caps application-facing concurrency below the database's memory-safe ceiling. The proxy absorbs spikes and queues / rejects excess requests; the database sees a bounded, predictable connection count. Canonical proxy-tier implementations: Vitess VTTablet, PlanetScale Global Routing Infrastructure, PgBouncer, ProxySQL. See patterns/two-tier-connection-pooling.
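The queue-or-reject behaviour of the second defence can be sketched in a few lines. This is an illustrative in-process limiter, not how any of the proxies named above is implemented; `AdmissionLimiter`, `Rejected`, and the parameter names are hypothetical.

```python
import threading


class Rejected(Exception):
    """Raised when the concurrency cap is reached and the wait times out."""


class AdmissionLimiter:
    """Caps in-flight database work; excess callers wait briefly, then fail."""

    def __init__(self, max_in_flight, queue_timeout_s):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._queue_timeout_s = queue_timeout_s

    def run(self, query_fn):
        # Queue for a slot; give up after the timeout instead of piling on.
        if not self._slots.acquire(timeout=self._queue_timeout_s):
            raise Rejected("concurrency cap reached")
        try:
            return query_fn()
        finally:
            self._slots.release()


# Assumed cap, chosen to sit below the database's memory-safe ceiling.
limiter = AdmissionLimiter(max_in_flight=100, queue_timeout_s=0.5)
result = limiter.run(lambda: "row")  # stand-in for a real query call
```

The design choice is that the caller, not the database, absorbs the failure: a rejected request surfaces as an exception the application can retry or shed, while the database never sees more than `max_in_flight` concurrent queries.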

Contrast with pool exhaustion

  • Pool exhaustion is the desired failure mode: the pool is full, new requesters wait or get rejected, the database stays healthy. This is graceful degradation.
  • Memory overcommit is the undesired failure mode: max_connections is set above the memory-safe ceiling, the database accepts every connection, a burst lands, and the database crashes. This is catastrophic failure.

The max_connections ceiling is the boundary between these two regimes.
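One way to locate that boundary is to work backwards from the memory budget. The sketch below is a back-of-the-envelope calculation under assumed figures; `memory_safe_max_connections` and its parameters are hypothetical names, and the worst-case per-connection footprint has to be estimated for the actual workload.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def memory_safe_max_connections(physical_bytes, shared_bytes,
                                worst_case_per_connection_bytes,
                                headroom_bytes):
    """Largest connection count whose worst case still fits in memory."""
    budget = physical_bytes - headroom_bytes - shared_bytes
    if budget <= 0:
        return 0
    return budget // worst_case_per_connection_bytes


ceiling = memory_safe_max_connections(
    physical_bytes=32 * GIB,                    # assumed instance size
    shared_bytes=20 * GIB,                      # assumed shared buffers
    worst_case_per_connection_bytes=76 * MIB,   # assumed worst-case footprint
    headroom_bytes=3 * GIB,                     # assumed OS and safety margin
)
print(ceiling)
```

A max_connections value at or below the computed ceiling keeps the system in the pool-exhaustion regime; a value above it moves the system into the overcommit regime.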
