CONCEPT Cited by 3 sources

Async clone + background hydration¶

Async clone + background hydration is the repository-materialisation shape where a clone-equivalent operation returns as soon as the file tree + refs are present and proceeds to download file contents concurrently in the background, with reads on yet-unhydrated files blocking until their blob has arrived. Introduced to the wiki by Cloudflare's 2026-04-16 ArtifactFS — "git clone but async".

Problem it solves¶

Vanilla git clone is synchronous: the clone command blocks until every reachable object (all history + all blobs) is on disk. For small repos that's fine (sub-second), but:

Multi-GB repos with long history take minutes. Cloudflare cite a 2.4 GB web-framework repo at "close to 2 minutes" clone time.
Agent / sandbox / CI startup latency is directly gated on clone latency.
--depth=1 shallow clones help but discard history agents sometimes want, and still bring down every current-commit blob up front.

Any agent harness that clones on each session pays this cost per session; multiplied across millions of sessions, it becomes a material fleet-level cost.
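The per-session cost scales linearly with fleet size, which a short calculation makes concrete. This is a minimal sketch with hypothetical inputs: the function name `sandbox_hours` and the example figures (100 s per clone, 10,000 sessions a month) are assumptions chosen for illustration, not measurements.

```python
def sandbox_hours(seconds_per_clone: float, clones_per_month: int) -> float:
    """Total sandbox time spent waiting on clones, in hours per month."""
    return seconds_per_clone * clones_per_month / 3600.0


if __name__ == "__main__":
    # Hypothetical fleet: 100 s of clone wait per session, 10,000 sessions a month.
    print(round(sandbox_hours(100, 10_000), 1))
```

Because the relationship is a plain product, a 10x increase in either clone time or session count gives a 10x increase in wasted sandbox hours.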

Mechanism¶

Two pieces collaborate:

Blobless clone — built on Git's partial-clone machinery (--filter=blob:none). Fetches commits, tree objects (the file tree), and refs, but omits blob objects (file contents). File names and paths are present; file contents are not. Clone time is dominated by protocol overhead and tree size, not blob volume.
Background-hydration daemon — a lightweight process that, after the blobless clone returns, starts fetching individual blobs in priority order. Reads on not-yet-hydrated files are intercepted by the filesystem and block until that file's blob arrives (on-demand fetch as a fallback, background fetch as the fast path).
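The first piece can be approximated with stock Git. The sketch below is an assumption about how one might drive it from Python, not ArtifactFS's actual code: `blobless_clone_argv`, `TreeEntry`, `parse_ls_tree`, and `list_tree` are hypothetical helper names. It adds `--no-checkout` because populating a working tree would otherwise fetch the blobs for the checked-out commit, and it enumerates paths with `git ls-tree -r`, which reads tree objects only.

```python
import subprocess
from typing import List, NamedTuple


def blobless_clone_argv(url: str, dest: str) -> List[str]:
    """Build a clone command that fetches commits and trees but no blobs."""
    # --no-checkout avoids populating the working tree, which would pull
    # the current commit's blobs on demand.
    return ["git", "clone", "--filter=blob:none", "--no-checkout", url, dest]


class TreeEntry(NamedTuple):
    mode: str
    kind: str
    oid: str
    path: str


def parse_ls_tree(output: str) -> List[TreeEntry]:
    """Parse `git ls-tree -r` lines: '<mode> <type> <oid>\\t<path>'."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, kind, oid = meta.split(" ")
        entries.append(TreeEntry(mode, kind, oid, path))
    return entries


def list_tree(repo_dir: str, rev: str = "HEAD") -> List[TreeEntry]:
    """List every file path at `rev` without needing any blob contents."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "ls-tree", "-r", rev],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_tree(result.stdout)
```

Running `subprocess.run(blobless_clone_argv(url, dest), check=True)` followed by `list_tree(dest)` would give a hydration daemon the full list of paths and blob IDs to fetch. Paths containing unusual characters are quoted by Git in this output; a more careful version would use `ls-tree -z`.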

Together, the agent harness sees a complete-looking directory tree almost immediately: it can enumerate files, grep paths, and read any configs that happen to be already hydrated. The "clone is done" boundary shifts from "all blobs local" to "all trees and refs local".
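The interaction between background hydration and blocking reads can be modelled in a few lines. This is a toy in-memory sketch under stated assumptions, not a description of ArtifactFS internals: `HydratingTree` and `fetch_blob` are hypothetical names, and a per-path lock stands in for the filesystem-level interception.

```python
import threading
from typing import Callable, Dict, Iterable, List


class HydratingTree:
    """Toy model of a tree whose paths are known before contents arrive."""

    def __init__(self, paths: Iterable[str], fetch_blob: Callable[[str], bytes]):
        self._fetch_blob = fetch_blob  # hypothetical stand-in for a remote fetch
        self._contents: Dict[str, bytes] = {}
        # One lock per path, so a read waits only on the file it asked for.
        self._locks: Dict[str, threading.Lock] = {p: threading.Lock() for p in paths}

    def list_paths(self) -> List[str]:
        # Available immediately: needs the tree, not the blobs.
        return sorted(self._locks)

    def is_hydrated(self, path: str) -> bool:
        return path in self._contents

    def _hydrate(self, path: str) -> bytes:
        # Whoever holds the lock fetches; anyone else blocks, then reuses it.
        with self._locks[path]:
            if path not in self._contents:
                self._contents[path] = self._fetch_blob(path)
            return self._contents[path]

    def start_background(self, order: Iterable[str]) -> threading.Thread:
        """Fast path: hydrate every path in the given order on a daemon thread."""
        def worker() -> None:
            for path in order:
                self._hydrate(path)

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def read(self, path: str) -> bytes:
        """Fallback path: block until this file's contents are local."""
        return self._hydrate(path)
```

A read of a file the background thread is currently fetching waits on that file's lock and then returns the already-fetched contents, so each blob is fetched at most once.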

Priority ordering¶

Hydration order is not arbitrary — it's tuned for the typical opening actions of an agent workload. ArtifactFS's order:

Package manifests (package.json, go.mod, pyproject.toml, Cargo.toml, ...).
Configuration files (.yaml, .toml, .json).
Source code.
Binaries, images, executables.

The ordering itself is a specialisation — the generalisation is "any FS driver with background-hydration should let the calling workload hint its access pattern so hot files aren't blocked waiting for cold blobs."
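The four tiers and the hinting idea can be expressed as a sort key. The sketch below is illustrative only: the manifest names and config extensions come from the list above, while the binary-extension set, the rule that unmatched files fall into the source-code tier, and the `hints` parameter are assumptions introduced here.

```python
import posixpath
from typing import Iterable, List, Optional

MANIFEST_NAMES = {"package.json", "go.mod", "pyproject.toml", "Cargo.toml"}
CONFIG_EXTS = {".yaml", ".toml", ".json"}
# Assumption: a small illustrative set; a real driver would need a fuller list.
BINARY_EXTS = {".png", ".jpg", ".gif", ".ico", ".exe", ".so", ".dll", ".bin"}


def hydration_tier(path: str) -> int:
    """Return 0 (manifest), 1 (config), 2 (source), or 3 (binary or image)."""
    name = posixpath.basename(path)
    ext = posixpath.splitext(name)[1].lower()
    if name in MANIFEST_NAMES:  # checked first: package.json is also .json
        return 0
    if ext in CONFIG_EXTS:
        return 1
    if ext in BINARY_EXTS:
        return 3
    return 2  # assumption: anything unrecognised is treated as source


def hydration_order(paths: Iterable[str],
                    hints: Optional[Iterable[str]] = None) -> List[str]:
    """Order paths for background fetch; hinted paths jump the queue."""
    hot = set(hints or ())
    return sorted(paths, key=lambda p: (p not in hot, hydration_tier(p), p))
```

For example, `hydration_order(["src/main.go", "logo.png", "go.mod"])` puts `go.mod` first and `logo.png` last, while passing `hints={"logo.png"}` moves the image to the front. Sorting by path within a tier is an arbitrary tie-break.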

Trade-offs¶

Axis	Synchronous clone	Async clone + hydration
Startup latency	O(repo size)	O(tree size) + protocol RTT
Peak bandwidth	Bursty at start	Spread across hydration window
Read latency (cold file)	Always local	First-read may block on fetch
Read latency (hot file)	Always local	Likely local (priority-fetched)
Offline work	Full repo available	Only-hydrated files readable
Sync-back to remote	`git push`	`git push` (same — no FS-level sync)

Named trade-off from the Cloudflare post: "the filesystem does not attempt to 'sync' files back to the remote repository" — edits are pushed via ordinary Git, not via the FS driver. This is a deliberate simplification.

Not new — but freshly mainstreamed¶

The underlying Git partial-clone machinery is several years old (Git 2.19+, 2018). What ArtifactFS adds is packaging it as an FS driver with agent-aware priority and sandbox startup as the named use case — raising blobless-clone from a power-user flag to a first-class workload primitive. Similar shape appears in git-lfs --smudge=delayed, Facebook's EdenFS, Microsoft's GVFS / VFS-for-Git (discontinued), and various build-farm fetch accelerators; ArtifactFS is the 2026-era agent-sandbox restatement.

Seen in¶

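sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — canonical wiki instance via ArtifactFS. Savings claim: ~90–100 s per 2.4 GB repo × 10 k sandbox jobs/month, quoted as ~2,778 sandbox hours/month (illustrative, not measured). The stated inputs do not multiply out to that total: 100 s × 10 k jobs is ~278 hours, and ~2,778 hours corresponds to ~100 k jobs/month at 100 s each.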
sources/2024-07-30-flyio-making-machines-move — block-level sibling instance at the Linux device-mapper tier (dm-clone). Fly.io's fleet-drain migration for stateful Fly Machines uses the same async-clone-with-background-hydration shape, just on raw block devices rather than Git trees: reads of un-hydrated blocks fall through to the source over iSCSI, writes go to the clone, kcopyd rehydrates in background. Cross-tier confirmation that the pattern isn't Git-specific.
sources/2026-02-04-flyio-litestream-writable-vfs — SQLite-database-level instance via Litestream VFS hydration mode. Ben Johnson explicitly credits dm-clone as the ancestor — "we shoplifted a trick from systems like dm-clone: background hydration." The VFS serves reads from object storage while a background loop pulls the whole database to a local temp file (via LTX compaction, writing only the latest version of each page); the read path switches over when hydration completes; the file is discarded on VFS exit. Canonical wiki instance of hydration applied at SQLite-database granularity (previous instances were block-level via dm-clone and Git-tree-level via ArtifactFS). Production consumer: the Fly Sprites "block map" (JuiceFS metadata tier on SQLite + Litestream VFS).

systems/artifact-fs — the canonical-instance system.
systems/cloudflare-artifacts — server-side sibling.
systems/git — protocol substrate (partial-clone).
concepts/git-pack-file — the blob-storage substrate being deferred.
concepts/agent-first-storage-primitive — family of agent-first primitives this belongs to.
patterns/blobless-clone-lazy-hydrate — the pattern-page treatment.
systems/litestream-vfs — 2026-02-04 SQLite-database-level hydration, dm-clone-lineage.
systems/fly-sprites — production consumer of the SQLite-database-level variant (Sprite "block map").
concepts/sqlite-vfs — the interception surface the database-level variant uses.
patterns/vfs-range-get-from-object-store — the read-side shape the database-level variant composes with.