

Transparent memory offloading

Definition

A kernel- or hypervisor-level technique that detects cold memory pages in process address spaces and migrates them to a cheaper, lower-power storage tier (typically an NVMe-attached SSD), freeing DRAM for hot working sets, all transparently to the application. The application continues to see a unified address space; accesses to offloaded pages fault the data back in through swap-like machinery, but at page-level rather than whole-process granularity.
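
For concreteness, here is a minimal sketch of the page-level mechanism, assuming a Linux 5.4+ kernel with swap (or zswap) configured. madvise(MADV_PAGEOUT) is the application-directed analogue of what a kernel offloader does automatically; the fault-back on access is exactly the swap-like machinery described above.

```c
/* Minimal sketch of page-level offload and transparent fault-back.
 * Assumes Linux >= 5.4 (MADV_PAGEOUT) with swap or zswap configured;
 * a kernel offloader like TMO triggers the same reclaim path
 * automatically for pages it judges cold. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 64 * 1024 * 1024; /* 64 MiB anonymous region */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    memset(buf, 0xAB, len); /* fault every page in: all DRAM-resident */

    /* Offload: ask the kernel to reclaim these pages to the swap tier.
     * The mapping stays valid; only the DRAM backing is released. */
    if (madvise(buf, len, MADV_PAGEOUT) != 0)
        perror("madvise(MADV_PAGEOUT)"); /* EINVAL on pre-5.4 kernels */

    /* Transparent fault-back: an ordinary load page-faults the data
     * back from the swap tier. The application sees no API change,
     * only latency. */
    printf("first byte after offload: 0x%02x\n", (unsigned char)buf[0]);

    munmap(buf, len);
    return 0;
}
```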

Meta TMO: 20–32% DRAM savings across millions of servers

From Meta's 2022-06-20 engineering post (via sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022):

"Transparent memory offloading (TMO) saves 20 percent to 32 percent of memory per server across millions of servers in our data center fleet."

Motivation:

"alternative technologies such as NVMe-connected solid state drives (SSDs) offer higher capacity than DRAM at a fraction of the cost and power. Transparently offloading colder memory to such cheaper memory technologies via kernel or hypervisor techniques offers a promising approach to curb the appetite for DRAM."

Applied at fleet scale, the savings compound. Intel's estimate (via @adrianco) is that "the CPU still represents about 32 percent of the IT equipment power budget, memory only burning 14 percent, peripheral costs around 20 percent, the motherboard around 10 percent, and disk drives 5 percent", which puts DRAM third among power line items, behind CPU and peripherals. Even so, if DRAM power scales roughly with capacity, saving 20–32% of a 14% share recovers about 3–4.5% of the total IT equipment power budget, which is material to the overall energy bill at fleet scale.
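
Mechanically, TMO drives reclaim with the kernel's pressure stall information (PSI) rather than a static threshold. Below is a toy sketch of that control-loop idea, not Meta's Senpai agent: read the memory-pressure average, and while pressure stays below a target, keep nudging pages out via cgroup v2's memory.reclaim file (Linux 5.19+). The cgroup path, threshold, and reclaim step are illustrative assumptions.

```c
/* Toy TMO-style control loop (illustrative, not Meta's Senpai code).
 * Idea: memory pressure below target => DRAM is over-provisioned, so
 * proactively reclaim; pressure above target => back off and let
 * faulted-back pages stay resident.
 * Assumes cgroup v2, Linux >= 5.19 (memory.reclaim), and permission
 * to write the target cgroup's files. */
#include <stdio.h>
#include <unistd.h>

#define PSI_PATH     "/proc/pressure/memory"
#define RECLAIM_PATH "/sys/fs/cgroup/workload/memory.reclaim" /* hypothetical cgroup */
#define TARGET_AVG10 0.10 /* % of wall time stalled on memory, 10 s avg */

/* First line of PSI: "some avg10=X avg60=Y avg300=Z total=N" */
static double read_some_avg10(void) {
    double avg10 = -1.0;
    FILE *f = fopen(PSI_PATH, "r");
    if (!f) return -1.0;
    if (fscanf(f, "some avg10=%lf", &avg10) != 1) avg10 = -1.0;
    fclose(f);
    return avg10;
}

/* Ask the kernel to reclaim roughly `amount` from the cgroup. */
static void reclaim(const char *amount) {
    FILE *f = fopen(RECLAIM_PATH, "w");
    if (!f) { perror("open memory.reclaim"); return; }
    if (fputs(amount, f) == EOF) perror("write memory.reclaim");
    fclose(f);
}

int main(void) {
    for (;;) {
        double p = read_some_avg10();
        if (p >= 0.0 && p < TARGET_AVG10)
            reclaim("16M"); /* pressure is cheap: offload a bit more */
        sleep(5);
    }
}
```

The design point worth noticing: the controller converges on a pressure target rather than a residency target, which is what lets the same loop adapt to very different workloads.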

Parallel: LinkedIn RoCE for datacenter block storage

LinkedIn reports using RDMA over Converged Ethernet (RoCE) to expose remote DRAM/SSD at near-local latency, a parallel effect achieved from the opposite direction: rather than offloading pages to a local device, memory and storage are disaggregated and accessed over the network from remote hosts:

"how we solved this challenge by providing a flexible, scalable, performant and reliable block storage service using Remote Direct Memory Access (RDMA)."

(LinkedIn Engineering, Scaling storage in the datacenter with RoCE.)

General pattern

DRAM is not the only tier worth treating as memory; transparent page-level migration across a tiered storage hierarchy recovers a sizable fraction of the most expensive capacity.

Emerging hardware support makes this pattern increasingly mainstream: CXL.mem (Compute Express Link), persistent memory, and higher-bandwidth NVMe all extend the "treat-as-memory" tier below DRAM.
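
To make the "treat-as-memory" point concrete: on recent Linux kernels, CXL-attached memory typically appears as a CPU-less NUMA node, so steering cold data onto it is just an allocation-policy decision. A sketch assuming libnuma is installed and that node 1 happens to be the CXL tier (both assumptions):

```c
/* Sketch: a CXL.mem device exposed as an ordinary NUMA node.
 * Assumes libnuma and that node 1 is the CXL tier (illustrative).
 * Build: cc cxl_tier.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int cxl_node = 1; /* assumption: CPU-less CXL memory node */
    if (cxl_node > numa_max_node()) {
        fprintf(stderr, "node %d not present\n", cxl_node);
        return 1;
    }

    /* Bind this allocation to the slower tier; loads and stores stay
     * ordinary memory accesses, just with higher latency. */
    size_t len = 16 * 1024 * 1024;
    char *cold = numa_alloc_onnode(len, cxl_node);
    if (!cold) { fprintf(stderr, "allocation failed\n"); return 1; }

    memset(cold, 0, len);
    printf("placed %zu MiB on node %d\n", len >> 20, cxl_node);

    numa_free(cold, len);
    return 0;
}
```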

Caveats

  • Requires cold-page detection heuristics that don't churn; an oversensitive offloader degrades P99 latency by repeatedly faulting hot pages back in (one detection primitive is sketched after this list).
  • Works best on workloads with clearly segmented hot/cold footprints (web-tier caches, analytics workers with large stale state); latency-sensitive trading or gaming loops won't tolerate the page-fault tail.
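
One primitive a non-churning detector can build on is the kernel's idle-page tracking ABI: mark a page idle, wait an observation window, and re-read the bit; if it is still set, nothing touched the page in the interval. A sketch assuming Linux with CONFIG_IDLE_PAGE_TRACKING, a 4 KiB page size, and root privileges (unprivileged pagemap reads hide PFNs):

```c
/* Sketch: cold-page detection via the kernel's idle-page tracking ABI.
 * Assumes Linux with CONFIG_IDLE_PAGE_TRACKING, a 4 KiB page size,
 * and root (pagemap hides PFNs from unprivileged readers). */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Translate a virtual address to its physical frame number (PFN)
 * via /proc/self/pagemap: 8 bytes per virtual page, bit 63 = present,
 * bits 0..54 = PFN. */
static uint64_t vaddr_to_pfn(void *addr) {
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("pagemap"); exit(1); }
    uint64_t entry = 0;
    off_t off = ((uintptr_t)addr / 4096) * 8;
    if (pread(fd, &entry, 8, off) != 8) { perror("pread"); exit(1); }
    close(fd);
    if (!(entry & (1ULL << 63))) { fprintf(stderr, "page not resident\n"); exit(1); }
    return entry & ((1ULL << 55) - 1);
}

int main(void) {
    /* One resident anonymous page to track. */
    char *page = aligned_alloc(4096, 4096);
    page[0] = 1;

    uint64_t pfn = vaddr_to_pfn(page);
    int fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
    if (fd < 0) { perror("page_idle (needs root)"); return 1; }

    /* The bitmap is an array of u64 words, one bit per PFN. */
    off_t off = (off_t)(pfn / 64) * 8;
    uint64_t word = 1ULL << (pfn % 64);

    /* Mark idle: writing a set bit clears the page's accessed state. */
    if (pwrite(fd, &word, 8, off) != 8) { perror("pwrite"); return 1; }

    sleep(1); /* observation window; any access would clear the bit */

    if (pread(fd, &word, 8, off) != 8) { perror("pread"); return 1; }
    printf("page is %s\n",
           (word & (1ULL << (pfn % 64))) ? "cold (still idle)" : "hot");
    close(fd);
    free(page);
    return 0;
}
```

A production offloader samples these bits across the working set on a slow cadence, which is one way to keep the detector itself from inducing the churn the first caveat warns about.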