CONCEPT

NUMA awareness

Definition

NUMA (Non-Uniform Memory Access) is the dominant memory architecture on multi-socket x86 servers: each CPU socket has its own memory controller and attached DRAM. A core accessing its own socket's DRAM sees baseline latency; a core accessing DRAM attached to a remote socket sees 1.5–3× higher latency and lower bandwidth, plus cache-coherence traffic over the socket interconnect.

NUMA-aware process design pins a process (and its memory allocations) to a single NUMA node, so all memory accesses are local and latency is predictable.
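A minimal sketch of that pinning, assuming Linux: CPU affinity is set with `os.sched_setaffinity`, and a node's CPU list is read from sysfs (`/sys/devices/system/node/nodeN/cpulist`). The function names here are illustrative, not from the source.

```python
# Sketch: pin the current process to the CPUs of one NUMA node (Linux assumed).
# The node-to-CPU mapping comes from sysfs; a cpulist file holds a range
# string such as "0-7,16-23".
import os

def parse_cpulist(cpulist: str) -> set[int]:
    """Expand a sysfs cpulist string like "0-7,16-23" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def pin_to_node(node: int) -> set[int]:
    """Restrict this process to one node's CPUs; returns the CPU set used."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # 0 = the calling process
    return cpus
```

On Linux, the default first-touch policy then tends to place pages faulted in after pinning on the local node; strict memory binding requires libnuma (`mbind`/`numa_alloc_onnode`), which this sketch omits.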

Canonical wiki citation

From systems/azure-cosmos-db (disclosed in the High Scalability Dec-2022 roundup):

"We never span our process across NUMA nodes. When a process is across the NUMA node, memory access latency can increase if the cache misses. Not crossing the NUMA node gives our process a more predictable performance."

Why it matters for database engines

Most hyperscaler OLTP engines (Cosmos DB, FoundationDB, Honeycomb's storage layer) are latency-critical in the hot path — a cache miss to a remote NUMA node converts a nanosecond-scale operation into a hundreds-of-nanoseconds one. Multiplied by the hot-path call rate, this is the difference between a database that fits on one box and one that needs horizontal sharding.
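The cost arithmetic can be made concrete with illustrative numbers (the latencies and miss rate below are assumptions, not figures from the source):

```python
# Back-of-envelope cost of remote-node cache misses on a hot path.
# Assumed latencies: ~80 ns local DRAM, ~200 ns remote (a 2.5x penalty,
# within the 1.5-3x range cited above); the miss rate is hypothetical.
local_ns, remote_ns = 80, 200
misses_per_sec = 2_000_000          # hot-path cache misses per second per core

extra_ns = (remote_ns - local_ns) * misses_per_sec
stall_fraction = extra_ns / 1e9     # fraction of each second lost to added stalls

print(f"{stall_fraction:.0%} of a core lost to the remote-access penalty")
# prints "24% of a core lost to the remote-access penalty"
```

At these rates, roughly a quarter of a core's time disappears into remote-memory stalls, which is why keeping every access local matters at database call rates.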

The engine-level design convention that emerges:

  • One process per NUMA node, or
  • Pin each worker thread + its allocated memory to a single NUMA node, or
  • Use custom allocators that respect NUMA boundaries (not glibc malloc, which is not NUMA-aware by default).

Contrast with generic-tier workloads

Most application-tier workloads (web servers, stateless services) don't benefit from NUMA pinning because their hot working set is small enough to fit in L1/L2. NUMA awareness is specifically about latency-critical hot paths with working sets that exceed L3 — i.e. databases, in-memory caches, search engines, real-time analytics.
