CONCEPT Cited by 1 source
On-demand data hydration
Definition
On-demand data hydration is the practice of lazily fetching data into local cache tiers only when accessed (or predicted to be accessed soon via prefetch), rather than eagerly copying entire datasets to the compute location before use.
Motivation
In geo-distributed AI training, the traditional workflow requires researchers to explicitly ingest or snapshot datasets to the target region before starting jobs, a process that can take hours and directly impacts iteration speed. On-demand hydration eliminates this step.
How It Works (Meta's Instance)
- Data lives in global BLOB storage (HDD-backed, the source of truth)
- On first read, data is transparently fetched from global storage and cached in regional flash (L3)
- Subsequent reads are served from L3 (or from L1/L2 if promoted)
- Deep prefetch proactively hydrates data likely to be needed in the next few minutes
- Eviction policies (TTL, LRU, capacity-aware) manage cache lifecycle
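The read path above can be sketched as a small read-through cache. This is an illustrative sketch only, not Meta's implementation: `HydratingCache`, `fetch_from_global`, and the shard names are hypothetical, a single in-memory tier stands in for the regional flash (L3) layer, and `prefetch` takes an explicit list of keys instead of modeling how upcoming accesses are predicted.

```python
import time
from collections import OrderedDict


class HydratingCache:
    """Read-through cache: a key is fetched from the source of truth only
    when it is first read (or prefetched), never copied eagerly."""

    def __init__(self, fetch, capacity_bytes, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch              # hypothetical call into global storage
        self._capacity = capacity_bytes
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()    # key -> (data, hydrated_at), LRU order
        self._used = 0
        self.hits = 0
        self.misses = 0

    def _is_fresh(self, key):
        entry = self._entries.get(key)
        return entry is not None and self._clock() - entry[1] < self._ttl

    def read(self, key):
        if self._is_fresh(key):
            self._entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._entries[key][0]
        self.misses += 1                     # cold miss or TTL expiry
        return self._hydrate(key)

    def prefetch(self, keys):
        """Hydrate keys expected to be read soon; does not count as a read."""
        for key in keys:
            if not self._is_fresh(key):
                self._hydrate(key)

    def _hydrate(self, key):
        self._drop(key)                      # discard any expired copy
        data = self._fetch(key)
        if len(data) > self._capacity:
            return data                      # too large to cache; serve uncached
        # Capacity-aware LRU eviction: drop oldest entries until the data fits.
        while self._used + len(data) > self._capacity:
            self._drop(next(iter(self._entries)))
        self._entries[key] = (data, self._clock())
        self._used += len(data)
        return data

    def _drop(self, key):
        entry = self._entries.pop(key, None)
        if entry is not None:
            self._used -= len(entry[0])

    def __contains__(self, key):
        return key in self._entries


# Usage with a dict standing in for global BLOB storage.
blob_store = {"shard-0": b"a" * 4, "shard-1": b"b" * 4, "shard-2": b"c" * 4}
fetch_log = []


def fetch_from_global(key):
    fetch_log.append(key)
    return blob_store[key]


cache = HydratingCache(fetch_from_global, capacity_bytes=8, ttl_seconds=60.0)
cache.read("shard-0")          # cold miss: hydrated from global storage
cache.read("shard-0")          # hit: served from the cache
cache.prefetch(["shard-1"])    # hydrated ahead of the read
cache.read("shard-1")          # hit, thanks to the prefetch
cache.read("shard-2")          # miss: cache is full, so LRU evicts shard-0
```

In this sketch the TTL is measured from hydration time and is not refreshed on a hit; a production cache might choose either behavior.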
Trade-off vs. Eager Copy
| | Eager copy | On-demand hydration |
|---|---|---|
| First-access latency | Zero (pre-copied) | Higher (cold miss) |
| Setup time | Hours | Minutes |
| Storage waste | High (full copies) | Low (only accessed data) |
| Best for | Large long-running jobs | Smaller iterative research |
Meta supports both paradigms in production.
(Source: sources/2026-07-01-meta-ai-storage-blueprint-at-scale)
Seen in
- sources/2026-07-01-meta-ai-storage-blueprint-at-scale: canonical instance for AI research velocity