
Allocation limit (per-thread memory cap)

Definition

An allocation limit is a runtime-enforced cap on how much memory a single logical thread of execution (request, task, fiber) may allocate before the runtime safely terminates that thread. Other threads in the same process are unaffected; resources the terminated thread had open (sockets, files, held locks) are released via the runtime's exception / cleanup machinery.

It is a cooperative-scheduler-level resource cap, distinct from:

  • OS-level rlimits — process-scoped; on breach the allocation fails or the whole process is killed.
  • Cgroups — container-level, also typically process-scoped; breach invokes the OOM killer.
  • GC heap limits — whole-heap, not per-thread.
  • Application rate limits — throttle request rate per client; they do not bound memory per request.

Canonical wiki instance: GHC at Meta Sigma

Meta added allocation limits to GHC for its Sigma anti-abuse rule engine (sources/2015-06-26-meta-fighting-spam-with-haskell). The motivation is explicit:

In a latency-sensitive service, you don't want a single request using a lot of resources and slowing down other requests on the same machine. ... A request that uses a lot of resources is normally a bug that we want to fix. ... When this happens, we want Sigma to terminate the affected request with an error (that will subsequently result in the bug being fixed) and continue without any impact on the performance of other requests being served.

Mechanism

  • Per-thread allocation counter. The GHC runtime tracks how much memory each Haskell thread has allocated.
  • Cap checked during allocation. When a thread's allocation crosses its cap, the runtime throws an asynchronous AllocationLimitExceeded exception to that thread (GHC: asynchronous exceptions).
  • Resources cleaned up via exception handlers. Most Haskell code (IO actions wrapped in bracket, finally, etc.) releases resources correctly under asynchronous exceptions, so network connections are closed, file handles are released, etc.
  • Other threads unaffected. The termination is thread-local.
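GHC exposes this machinery directly: setAllocationCounter and enableAllocationLimit live in GHC.Conc (base ≥ 4.8), and the exception surfaces as Control.Exception.AllocationLimitExceeded. A minimal sketch of a per-request cap — withCap and allocList are illustrative names, not library functions:

```haskell
import Control.Exception (AllocationLimitExceeded (..), evaluate, try)
import Data.Int (Int64)
import GHC.Conc (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- Run an IO action under a per-thread allocation cap (in bytes). The counter
-- counts *down* as the thread allocates; once it crosses zero with the limit
-- enabled, the RTS throws AllocationLimitExceeded to this thread.
withCap :: Int64 -> IO a -> IO (Either AllocationLimitExceeded a)
withCap bytes action = do
  setAllocationCounter bytes
  enableAllocationLimit
  r <- try action
  disableAllocationLimit
  pure r

-- Hand-written allocator: plain recursion, so GHC's list fusion cannot
-- optimise the cons-cell allocation away.
allocList :: Int -> [Int]
allocList 0 = []
allocList n = n : allocList (n - 1)

main :: IO ()
main = do
  -- Forcing ~10M cons cells allocates far more than the 1 MiB cap.
  r <- withCap (1024 * 1024) (evaluate (length (allocList 10000000)))
  case r of
    Left AllocationLimitExceeded -> putStrLn "request aborted: allocation limit exceeded"
    Right n                      -> putStrLn ("completed: " ++ show n)
```

A request server would call something like withCap from each handler's forkIO'd thread, since the counter and the limit flag are per-Haskell-thread state.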

Operational evidence (Meta)

Meta's post includes a graph (image in the original post) showing "the maximum live memory across various groups of machines in the Sigma fleet" — large spikes appear when a request type with resource-intensive outliers is enabled; the spikes disappear after allocation limits are enabled on that request type. The limit does not fix the underlying bug, but it prevents a single pathological request from affecting its neighbours on the same machine.

Design implications

  • Requires a cooperative runtime. The mechanism is fundamentally based on the runtime's ability to (a) track allocation per thread and (b) interrupt a thread safely. A non-managed runtime (pure C, unrestricted C++) cannot do this without instrumentation.
  • Requires that aborting a request is safe. Code that holds cross-thread state, long-held locks, or persistent-state mutation must handle asynchronous exceptions correctly, or the abort can corrupt shared state. Meta's "we're careful to ensure that we don't change any code associated with persistent state in Sigma" fits this requirement.
  • Bug-discovery tool, not a reliability fix. Meta is explicit: "a request that uses a lot of resources is normally a bug that we want to fix." The limit is a blast-radius containment for the bug, not a substitute for fixing it.
  • JVM: no standard per-thread allocation cap. Closest is heap-size limits (process-scoped) + ThreadMXBean allocation telemetry (observation only, not enforcement).
  • Go: no per-goroutine allocation cap. debug.SetMemoryLimit is process-scoped.
  • Erlang / BEAM: per-process heap size can be bounded via the max_heap_size process flag (the VM kills the offending process on breach); BEAM's per-process isolation is a close conceptual match, rooted in the same cooperative-scheduler + lightweight-process design as Haskell.
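The "aborting a request must be safe" requirement is what bracket buys in Haskell: its release action runs with exceptions masked, so it fires even when the abort arrives as an asynchronous AllocationLimitExceeded. A sketch — runAbortedRequest and allocList are illustrative names, and an IORef counter stands in for a real connection:

```haskell
import Control.Exception (AllocationLimitExceeded (..), bracket, evaluate, try)
import Data.IORef (modifyIORef', newIORef, readIORef)
import GHC.Conc (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- Plain recursion so GHC's list fusion cannot optimise the allocation away.
allocList :: Int -> [Int]
allocList 0 = []
allocList n = n : allocList (n - 1)

-- Simulate an aborted request: "open a connection", blow the allocation cap
-- inside bracket, and report how many connections are still open afterwards.
-- bracket's release action runs with exceptions masked, so cleanup happens
-- even though the abort is an asynchronous exception.
runAbortedRequest :: IO Int
runAbortedRequest = do
  open <- newIORef (0 :: Int)
  _ <- (try
          (bracket
             (modifyIORef' open (+ 1))               -- acquire: "open connection"
             (\_ -> modifyIORef' open (subtract 1))  -- release: always runs
             (\_ -> do
                setAllocationCounter (256 * 1024)    -- ~256 KiB cap for this thread
                enableAllocationLimit
                evaluate (length (allocList 10000000))))
          :: IO (Either AllocationLimitExceeded Int))
  disableAllocationLimit
  readIORef open                                     -- 0 means nothing leaked

main :: IO ()
main = do
  leaked <- runAbortedRequest
  putStrLn ("connections still open after abort: " ++ show leaked)
```

Code that acquires resources outside bracket/finally, or that mutates shared state non-atomically, does not get this guarantee — which is exactly the discipline Meta describes around persistent state.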

When to use

  • Multi-tenant request-serving on a managed runtime where one pathological request should not affect peers on the same process.
  • Rule engines, query engines, and evaluators where input size / shape is adversarial or unpredictable.
  • Pairs naturally with concepts/blast-radius thinking: the process is the blast radius for OS-level RLIMITs; the thread is the blast radius for allocation limits.
