Skip to content

CONCEPT Cited by 1 source

On-demand data hydration

Definition

On-demand data hydration is the practice of lazily fetching data into local cache tiers only when accessed (or predicted to be accessed soon via prefetch), rather than eagerly copying entire datasets to the compute location before use.

Motivation

In geo-distributed AI training, the traditional workflow requires researchers to explicitly ingest/snapshot datasets to the target region before starting jobs โ€” a process that can take hours and directly impacts iteration speed. On-demand hydration eliminates this step.

How It Works (Meta's Instance)

  1. Data exists in global BLOB-storage (HDD-backed, source of truth)
  2. On first read, data is transparently fetched from global storage โ†’ cached in regional flash (L3)
  3. Subsequent reads served from L3 (or L1/L2 if promoted)
  4. Deep prefetch proactively hydrates data likely to be needed in the next few minutes
  5. Eviction policies (TTL, LRU, capacity-aware) manage cache lifecycle

Trade-off vs. Eager Copy

Eager copy On-demand hydration
First-access latency Zero (pre-copied) Higher (cold miss)
Setup time Hours Minutes
Storage waste High (full copies) Low (only accessed data)
Best for Large long-running jobs Smaller iterative research

Meta supports both paradigms in production.

(Source: sources/2026-07-01-meta-ai-storage-blueprint-at-scale)

Seen in

Last updated ยท 567 distilled / 1,685 read