
CONCEPT

Background hydration

Definition

A cache-warming / replica-fill technique in which a system serves reads against a remote authoritative source while a background process downloads the full data set to local storage. Once local materialisation completes, reads cut over to the local copy — latency drops, round-trips to the remote source stop.

The defining property: reads never block on hydration. Cold-open serves queries immediately; hydration is a latency-optimisation, not a correctness prerequisite.
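The defining property can be sketched in a few lines, assuming simple dict-like page stores (nothing here is Litestream code; `HydratingStore` is a hypothetical name):

```python
import threading

class HydratingStore:
    """Sketch of background hydration: serve reads from a remote
    authoritative source while a background thread copies the full
    data set locally, then cut reads over to the local copy."""

    def __init__(self, remote_pages):
        self.remote = dict(remote_pages)   # authoritative source
        self.local = {}                    # destination being filled
        self.hydrated = threading.Event()  # set once the copy completes
        threading.Thread(target=self._hydrate, daemon=True).start()

    def _hydrate(self):
        # Background loop: pull every page, then flip the switch.
        for key, page in self.remote.items():
            self.local[key] = page
        self.hydrated.set()

    def read(self, key):
        # Reads never block on hydration: go remote until cutover.
        if self.hydrated.is_set():
            return self.local[key]
        return self.remote[key]

store = HydratingStore({1: b"page-1", 2: b"page-2"})
assert store.read(1) == b"page-1"   # served immediately, possibly remote
store.hydrated.wait(timeout=5)
assert store.read(2) == b"page-2"   # after cutover, reads are local
```

The only synchronisation is the one-way `hydrated` flag: a reader that arrives mid-hydration simply takes the remote path rather than waiting.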

Etymology — dm-clone and block-level ancestry

The term and the architectural shape come from Linux's dm-clone device-mapper target:

"The dm-clone target allows cloning of arbitrary block devices … while the source device stays read-only … The hydration process runs in the background, cloning the source device's data onto the destination device."

dm-clone serves reads from the source until a block has been cloned and from the destination afterwards, with a per-block hydration bitmap tracking progress. Variations of the idea appear across storage systems under names like lazy replication, demand-paged fetch, and copy-on-read warming.
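The source-until-cloned / destination-after-cloned routing can be sketched with an explicit bitmap (an illustrative model, not the kernel implementation):

```python
class CloneTarget:
    """Per-block read routing in the style of dm-clone: a hydration
    bitmap decides whether each block is read from the (read-only)
    source or the already-cloned destination."""

    def __init__(self, source_blocks):
        self.source = list(source_blocks)
        self.dest = [None] * len(source_blocks)
        self.hydrated = [False] * len(source_blocks)  # the bitmap

    def hydrate_block(self, i):
        # Background copier: clone one block, then flip its bit.
        self.dest[i] = self.source[i]
        self.hydrated[i] = True

    def read(self, i):
        # Source-until-cloned, destination-after-cloned.
        return self.dest[i] if self.hydrated[i] else self.source[i]

t = CloneTarget([b"a", b"b", b"c"])
assert t.read(1) == b"b"            # not yet cloned: read the source
t.hydrate_block(1)
assert t.read(1) == b"b"            # cloned: now read the destination
assert t.hydrated == [False, True, False]
```

Note the bitmap makes cutover per-block rather than all-at-once, which is what lets dm-clone serve a consistent view throughout hydration.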

Canonical wiki instance — Litestream VFS

Ben Johnson's 2026-01-29 shipping post:

"To solve this problem, we shoplifted a trick from systems like dm-clone: background hydration. In hydration designs, we serve queries remotely while running a loop to pull the whole database. … Reads don't block on hydration; we serve them from object storage immediately, and switch over to the hydration file when it's ready."

(Source: sources/2026-01-29-flyio-litestream-writable-vfs)

Litestream VFS's specialisation:

  • Source: LTX files in object storage (S3-compatible).
  • Destination: a local SQLite database file at the operator-specified LITESTREAM_HYDRATION_PATH.
  • Hydrator: a background thread reading LTX files and writing the destination using LTX compaction so the destination file contains "only the latest versions of each page" — not the full LTX history.
  • Cutover: once hydration is complete, VFS reads that were previously Range-GETs against object storage transition to local file I/O.
  • Lifetime: the hydration file is a temp file discarded on process exit (see below).
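The compaction step in the hydrator bullet above amounts to a last-writer-wins merge over page versions. A minimal sketch, where the `file -> {page: bytes}` shape is a simplifying assumption and not the real LTX format:

```python
def hydrate_latest_pages(ltx_files):
    """Compaction-driven hydration sketch: walk LTX-like files from
    oldest to newest, keeping only the latest version of each page,
    so the destination holds current state rather than full history."""
    db = {}
    for updates in ltx_files:   # oldest first
        db.update(updates)      # newer page versions overwrite older ones
    return db

history = [
    {1: b"v1-p1", 2: b"v1-p2"},   # initial snapshot
    {2: b"v2-p2"},                # later write to page 2
    {1: b"v3-p1", 3: b"v1-p3"},   # later writes
]
assert hydrate_latest_pages(history) == {
    1: b"v3-p1", 2: b"v2-p2", 3: b"v1-p3",
}
```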

Hydration ≠ persistent cache

A crucial design distinction: the hydration file is not a persistent cache across process restarts:

"Because this is designed for environments like Sprites, which bounce a lot, we write the database to a temporary file. We can't trust that the database is using the latest state every time we start up, not without doing a full restore, so we just chuck the hydration file when we exit the VFS."

Rationale: a remote writer may have advanced the database between this process's previous shutdown and its next startup; the old hydration file is unsafe without a verification pass equivalent in cost to a fresh hydration. Discard-on-exit is the safe default.

This is the opposite choice from a persistent NVMe cache (concepts/read-through-nvme-cache), which can be trusted across restarts because its cache keys are content-addressed against immutable chunks in object storage. Hydration copies specific current page versions into a file that is not content-addressed against those versions, so it ages into staleness as soon as new writes land upstream.
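Why content addressing makes a cache restart-safe can be shown in a few lines; `chunk_key` here is a hypothetical helper, not the actual cache's key scheme:

```python
import hashlib

def chunk_key(content: bytes) -> str:
    """Content-addressed cache key: derived from the chunk bytes
    themselves, so a cached entry can never be silently stale.
    A changed chunk has a different key and simply misses."""
    return hashlib.sha256(content).hexdigest()

old = chunk_key(b"page-7 at txn 100")
new = chunk_key(b"page-7 at txn 101")
assert old != new                               # upstream write: new key, no stale hit
assert chunk_key(b"page-7 at txn 100") == old   # immutable chunk: safe re-hit
```

The hydration file has no such property: it is named by a path, not by its contents, so nothing detects that the upstream database moved on while the process was down.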

| Technique | Cold read path | Local storage | Persists across restarts |
| --- | --- | --- | --- |
| Background hydration | remote GET | full DB copy | ❌ discarded |
| Read-through cache (concepts/read-through-nvme-cache) | remote GET | per-chunk lazy | ✅ (content-addressed) |
| Restore-before-serve | blocks until DB fully local | full DB copy | |
| Pure remote-VFS (no local file) | remote GET | none | n/a |

Background hydration combines the cold-open speed of a pure remote VFS with the steady-state speed of a local database, giving up persistence across restarts. It is ideal for ephemeral server substrates (Sprites, FaaS, short-lived sandbox VMs) where cold opens are frequent and process lifetimes are bounded; it is suboptimal for long-lived processes, where amortising one litestream restore across hours of uptime costs less than re-hydrating on every start.

Why it exists on Litestream VFS

The 2025-12-11 read-only VFS shipped a remote-read-only surface: "a godsend in a cold start where we have no other alternative besides downloading the whole database, but it's not fast enough for steady state" (quoting the 2026-01-29 post). Background hydration is the answer to the steady-state problem — cold reads still go remote; hot reads eventually land on local disk as the hydrator catches up.

Motivating consumer: the Fly.io Sprite block-map, "low tens of megabytes" of metadata that must be queryable milliseconds after Sprite boot but should not pay S3 round-trips for the rest of the Sprite's lifetime.

Operational parameters not typically disclosed

  • Hydration throughput (MB/s from remote source).
  • Concurrency (how many Range GETs in flight during hydration).
  • Back-pressure against query traffic (does hydration throttle if reads are saturating the network?).
  • Progress reporting surface (can the application observe "N% hydrated"?).
  • Cutover semantics (how mid-query cuts are handled — serializable cutover point, or per-connection flip?).
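None of these surfaces is documented, so as a purely hypothetical illustration, a progress surface of the kind the fourth bullet asks about might look like:

```python
class HydrationProgress:
    """Hypothetical 'N% hydrated' observability surface. This is not
    Litestream API; it only shows the shape such a surface could take."""

    def __init__(self, total_pages: int):
        self.total = total_pages
        self.done = 0

    def mark(self, n: int = 1):
        # Called by the hydrator as pages land on local disk.
        self.done = min(self.total, self.done + n)

    @property
    def percent(self) -> float:
        return 100.0 * self.done / self.total

p = HydrationProgress(200)
p.mark(50)
assert p.percent == 25.0
```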

Seen in

  • sources/2026-01-29-flyio-litestream-writable-vfs — canonical wiki instance. "Shoplifted a trick from systems like dm-clone." LITESTREAM_HYDRATION_PATH=... activates the feature; compaction-driven write of latest page versions; reads served remotely until the file is ready; hydration file discarded on exit.