CONCEPT Cited by 1 source

On-demand data hydration

Definition

On-demand data hydration is the practice of lazily fetching data into local cache tiers only when accessed (or predicted to be accessed soon via prefetch), rather than eagerly copying entire datasets to the compute location before use.

Motivation

In geo-distributed AI training, the traditional workflow requires researchers to explicitly ingest or snapshot datasets into the target region before starting jobs. That copy can take hours and directly slows iteration. On-demand hydration removes the step: jobs start reading right away, and data is pulled into the region as it is accessed.

How It Works (Meta's Instance)

Data lives in global BLOB storage (HDD-backed, the source of truth).
On first read, data is transparently fetched from global storage and cached in regional flash (the L3 tier).
Subsequent reads are served from L3, or from L1/L2 if the data has been promoted.
Deep prefetch proactively hydrates data that is likely to be needed in the next few minutes.
Eviction policies (TTL, LRU, capacity-aware) manage the cache lifecycle.
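The flow above can be sketched as a small read-through cache. This is a minimal illustration under stated assumptions, not Meta's implementation: the class name `HydratingCache`, the in-memory `origin` dict standing in for global storage, and the count-based capacity limit are all hypothetical. Only a single cache tier is modeled, so promotion between L1/L2/L3 is left out.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Iterable, Optional, Tuple


class HydratingCache:
    """Single-tier read-through cache in front of a slow origin store (illustrative)."""

    def __init__(self, origin: Dict[str, bytes], capacity: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin        # hypothetical stand-in for global storage
        self.capacity = capacity    # assumed limit: max number of cached objects
        self.ttl_s = ttl_s          # entries older than this are refetched
        self.clock = clock
        self._entries: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()
        self.hits = 0
        self.cold_misses = 0

    def _fresh(self, key: str) -> Optional[Tuple[float, bytes]]:
        """Return the cached entry if present and within its TTL, else None."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self.clock() - entry[0] > self.ttl_s:
            del self._entries[key]              # TTL eviction
            return None
        return entry

    def _hydrate(self, key: str) -> bytes:
        """Fetch from the origin, cache the result, and evict if over capacity."""
        data = self.origin[key]                 # the slow, cross-region fetch
        self._entries[key] = (self.clock(), data)
        self._entries.move_to_end(key)          # mark as most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # LRU eviction
        return data

    def read(self, key: str) -> bytes:
        entry = self._fresh(key)
        if entry is not None:
            self.hits += 1
            self._entries.move_to_end(key)
            return entry[1]
        self.cold_misses += 1                   # first access pays the fetch
        return self._hydrate(key)

    def prefetch(self, keys: Iterable[str]) -> None:
        """Hydrate keys expected to be read soon, so later reads are hits."""
        for key in keys:
            if self._fresh(key) is None:
                self._hydrate(key)


origin = {"shard-0": b"a", "shard-1": b"b", "shard-2": b"c"}
cache = HydratingCache(origin, capacity=2, ttl_s=300.0)
cache.read("shard-0")          # cold miss: fetched from the origin, then cached
cache.read("shard-0")          # hit: served from the cache
cache.prefetch(["shard-1"])    # hydrated ahead of use
cache.read("shard-1")          # hit, thanks to the prefetch
```

The clock is injected so TTL behavior can be exercised without sleeping. How the prefetcher decides which keys are "likely to be needed" is outside this sketch; here the caller simply passes them in.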

Trade-off vs. Eager Copy

Dimension	Eager copy	On-demand hydration
First-access latency	No added latency (data pre-copied)	Higher (cold miss fetches from global storage)
Setup time	Hours	Minutes
Storage waste	High (full copies)	Low (only accessed data)
Best for	Large long-running jobs	Smaller iterative research

Meta supports both paradigms in production.
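The setup-time and storage rows can be made concrete with a back-of-the-envelope model. This is a rough sketch under assumptions that do not come from the source: the `Job` fields, the single fixed transfer throughput, and the function names are hypothetical, and the model ignores cold-miss latency, prefetch, and eviction.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Job:
    dataset_bytes: int         # full dataset size in global storage
    accessed_fraction: float   # share of the dataset the job actually reads
    copy_bytes_per_s: float    # assumed cross-region transfer throughput


def eager_copy_cost(job: Job) -> Tuple[float, int]:
    """Return (seconds of upfront transfer, bytes stored in the region)."""
    return job.dataset_bytes / job.copy_bytes_per_s, job.dataset_bytes


def on_demand_cost(job: Job) -> Tuple[float, int]:
    """Return (seconds of upfront transfer, bytes stored in the region).

    No bulk copy happens before the job starts; only accessed data is hydrated.
    """
    return 0.0, int(job.dataset_bytes * job.accessed_fraction)
```

Under this model, the smaller `accessed_fraction` is, the more regional storage on-demand hydration saves. A job that reads the whole dataset repeatedly gains little and still pays cold misses, which is consistent with eager copy being listed as the better fit for large long-running jobs.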

(Source: sources/2026-07-01-meta-ai-storage-blueprint-at-scale)

Seen in

sources/2026-07-01-meta-ai-storage-blueprint-at-scale — canonical instance for AI research velocity