CONCEPT Cited by 2 sources
Async clone + background hydration¶
Async clone + background hydration is the
repository-materialisation shape in which a clone-equivalent operation
returns as soon as the file tree and refs are present, then downloads
file contents concurrently in the background; reads of not-yet-hydrated
files block until their blob arrives.
Introduced to the wiki via ArtifactFS ("git clone but async"), from
Cloudflare's 2026-04-16 post.
Problem it solves¶
Vanilla git clone is synchronous: the clone command blocks until
every reachable object (all history + all blobs) is on disk. For
small repos that's fine (sub-second), but:
- Multi-GB repos with long history take minutes. Cloudflare cite a 2.4 GB web-framework repo at "close to 2 minutes" clone time.
- Agent / sandbox / CI startup latency is directly gated on clone latency.
- --depth=1 shallow clones help, but they discard history that agents sometimes want and still download every blob at the current commit up front.
Any agent harness that clones on each session pays this cost per session; multiplied across millions of sessions, it becomes a material fleet-level cost.
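For comparison, the clone shapes discussed on this page map onto standard git clone flags. The minimal Python sketch below only assembles command lines and never runs git. The clone_argv helper is hypothetical. Pairing --filter=blob:none with --no-checkout is our assumption about how to stop a blobless clone from fetching the tip's blobs up front; the post does not say ArtifactFS invokes git this way.

```python
# Sketch only: builds git argv lists for the clone shapes on this page without
# running git. clone_argv is a hypothetical helper; the flags are standard.
from typing import List


def clone_argv(mode: str, url: str, dest: str) -> List[str]:
    if mode == "full":
        # Every reachable commit, tree and blob arrives before the command returns.
        extra: List[str] = []
    elif mode == "shallow":
        # Tip commit only: history is dropped, but every tip blob is still fetched.
        extra = ["--depth=1"]
    elif mode == "blobless":
        # Partial clone: commits, trees and refs now; blobs lazily on demand.
        # --no-checkout skips populating the working tree, which would otherwise
        # fetch every blob at the tip commit before clone returns.
        extra = ["--filter=blob:none", "--no-checkout"]
    else:
        raise ValueError(f"unknown clone mode: {mode!r}")
    return ["git", "clone", *extra, url, dest]


for mode in ("full", "shallow", "blobless"):
    print(" ".join(clone_argv(mode, "https://example.com/repo.git", "repo")))
```

Only the blobless shape moves the "clone is done" point ahead of the blob download; the shallow shape trades away history without avoiding it.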
Mechanism¶
Two pieces collaborate:
- Blobless clone: built on Git's partial-clone machinery (--filter=blob:none). Fetches refs plus commit and tree objects (history and directory structure) but omits blob objects (file contents). File names and paths are present; file contents are not. Clone time is dominated by protocol overhead and tree size, not blob volume.
- Background-hydration daemon: a lightweight process that, after the blobless clone returns, starts fetching individual blobs in priority order. Reads on not-yet-hydrated files are intercepted by the filesystem and block until that file's blob arrives (on-demand fetch as the fallback, background fetch as the fast path).
Together, these give the agent harness a complete-looking directory tree almost immediately: it can enumerate files, grep paths, and read any configs that have already been hydrated. The "clone is done" boundary shifts from "all blobs local" to "all trees + refs local".
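The interplay of the two pieces can be sketched as an in-memory toy model. Everything here is hypothetical: HydratingTree, fetch_blob and the read() method are illustrative names, not ArtifactFS APIs, and a real driver would intercept reads at the filesystem layer rather than through a method call. The sketch shows the three properties described above: listing is immediate, a daemon thread hydrates in a caller-given order, and a read of a cold file falls back to an on-demand fetch and blocks until the blob is local.

```python
# Toy in-memory model of async clone + background hydration. HydratingTree and
# fetch_blob are hypothetical names, not ArtifactFS APIs.
import threading
from typing import Callable, Dict, Iterable, List


class HydratingTree:
    def __init__(self, paths: Iterable[str], fetch_blob: Callable[[str], bytes]):
        self._fetch = fetch_blob  # stands in for a network fetch of one blob
        self._blobs: Dict[str, bytes] = {}
        # The tree is complete at "clone" time: every path is known up front.
        self._ready = {p: threading.Event() for p in paths}
        self._locks = {p: threading.Lock() for p in self._ready}

    def list_files(self) -> List[str]:
        # Enumeration needs only the tree, so it never waits on blobs.
        return sorted(self._ready)

    def _hydrate(self, path: str) -> None:
        # Per-path lock: background and on-demand fetches never duplicate work.
        with self._locks[path]:
            if not self._ready[path].is_set():
                self._blobs[path] = self._fetch(path)
                self._ready[path].set()

    def _run(self, order: Iterable[str]) -> None:
        for path in order:
            self._hydrate(path)

    def start_background(self, order: Iterable[str]) -> threading.Thread:
        # Fast path: a daemon thread hydrates blobs in the caller's priority order.
        worker = threading.Thread(target=self._run, args=(list(order),), daemon=True)
        worker.start()
        return worker

    def read(self, path: str, wait: float = 0.0) -> bytes:
        if path not in self._ready:
            raise FileNotFoundError(path)
        # Give the background fetch up to `wait` seconds, then fall back to an
        # on-demand fetch; either way the caller blocks until the blob is local.
        if not self._ready[path].wait(wait):
            self._hydrate(path)
        return self._blobs[path]
```

The per-path lock is a design choice of this sketch: a cold read only waits on the blob it needs, never on whichever unrelated blob the background thread happens to be fetching.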
Priority ordering¶
Hydration order is not arbitrary — it's tuned for the typical opening actions of an agent workload. ArtifactFS's order:
- Package manifests (package.json, go.mod, pyproject.toml, Cargo.toml, ...).
- Configuration files (.yaml, .toml, .json).
- Source code.
- Binaries, images, executables.
The ordering itself is a specialisation — the generalisation is "any FS driver with background-hydration should let the calling workload hint its access pattern so hot files aren't blocked waiting for cold blobs."
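The tiering above, plus the generalised "let the workload hint its access pattern" idea, can be sketched as a sort key. The tier order follows the list on this page; the exact suffix sets (including .yml and the source-file extensions), the catch-all last tier for unrecognised files, and the hints parameter are assumptions for illustration, not ArtifactFS's implementation.

```python
# Sketch of the priority ordering above. Suffix sets, the catch-all tier and
# the `hints` parameter are illustrative assumptions, not ArtifactFS code.
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

MANIFESTS = {"package.json", "go.mod", "pyproject.toml", "Cargo.toml"}
CONFIG_SUFFIXES = {".yaml", ".yml", ".toml", ".json"}
SOURCE_SUFFIXES = {".py", ".go", ".rs", ".ts", ".js", ".java", ".c", ".h"}


def tier(path: str) -> int:
    p = PurePosixPath(path)
    if p.name in MANIFESTS:
        return 0  # package manifests first (checked before the .json/.toml rule)
    if p.suffix in CONFIG_SUFFIXES:
        return 1  # then configuration files
    if p.suffix in SOURCE_SUFFIXES:
        return 2  # then source code
    return 3      # binaries, images, executables and anything unrecognised


def hydration_order(paths: Iterable[str], hints: Sequence[str] = ()) -> List[str]:
    # Caller-supplied hints (exact paths) jump ahead of every tier; ties keep
    # the original listing order because sorted() is stable.
    hinted = set(hints)
    return sorted(paths, key=lambda p: (p not in hinted, tier(p)))


files = ["logo.png", "src/main.rs", "Cargo.toml", "config/app.yaml", "README.md"]
print(hydration_order(files))
print(hydration_order(files, hints=["README.md"]))
```

Without hints this yields Cargo.toml, config/app.yaml, src/main.rs, logo.png, README.md; hinting README.md moves it to the front while the rest keep their tier order.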
Trade-offs¶
| Axis | Synchronous clone | Async clone + hydration |
|---|---|---|
| Startup latency | O(repo size) | O(tree size) + protocol RTT |
| Peak bandwidth | Bursty at start | Spread across hydration window |
| Read latency (cold file) | Always local | First-read may block on fetch |
| Read latency (hot file) | Always local | Likely local (priority-fetched) |
| Offline work | Full repo available | Only hydrated files readable |
| Sync-back to remote | git push | git push (same; no FS-level sync) |
Named trade-off from the Cloudflare post: "the filesystem does not attempt to 'sync' files back to the remote repository" — edits are pushed via ordinary Git, not via the FS driver. This is a deliberate simplification.
Not new — but freshly mainstreamed¶
The underlying Git partial-clone machinery is several years old
(Git 2.19+, 2018). What ArtifactFS adds is
packaging it as an FS driver with agent-aware priority and
sandbox startup as the named use case — raising blobless-clone from
a power-user flag to a first-class workload primitive. Similar shape
appears in git-lfs --smudge=delayed, Facebook's EdenFS, Microsoft's
GVFS / VFS-for-Git (discontinued), and various build-farm fetch
accelerators; ArtifactFS is the 2026-era agent-sandbox restatement.
Seen in¶
- sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git: canonical wiki instance via ArtifactFS. Savings claim: ~90–100 s per 2.4 GB repo × 10 k sandbox jobs/month, quoted as ~2,778 sandbox hours/month (illustrative, not measured). The stated inputs multiply to ~250–280 hours; the ~2,778-hour figure corresponds to ~100 k jobs at ~100 s each.
- sources/2024-07-30-flyio-making-machines-move: block-level sibling instance at the Linux device-mapper tier (dm-clone). Fly.io's fleet-drain migration for stateful Fly Machines uses the same async-clone-with-background-hydration shape, on raw block devices rather than Git trees: reads of un-hydrated blocks fall through to the source over iSCSI, writes go to the clone, and kcopyd hydrates the remaining blocks in the background. Cross-tier confirmation that the pattern isn't Git-specific.
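The block-level variant can be sketched with a per-region bitmap. This is a toy model under stated assumptions: CloneDevice, hydrate_step and the list-of-bytes "devices" are hypothetical names, not kernel interfaces, and the sketch only models whole-region writes (dm-clone hydrates a region before applying a partial write to it). It shows the three rules described above: un-hydrated reads go to the source, writes land on the clone, and a background copier fills in the rest without clobbering regions the workload has already written.

```python
# Toy model of the block-level sibling (dm-clone style). Names are ours, not
# the kernel's; a "region" is a fixed-size run of blocks tracked by a bitmap.
from typing import List


class CloneDevice:
    def __init__(self, source: List[bytes]):
        self.source = source                       # remote device (e.g. over iSCSI)
        self.dest: List[bytes] = [b""] * len(source)
        self.hydrated = [False] * len(source)      # one bit per region

    def read(self, region: int) -> bytes:
        # Reads of un-hydrated regions fall through to the source device.
        return self.dest[region] if self.hydrated[region] else self.source[region]

    def write(self, region: int, data: bytes) -> None:
        # Whole-region writes only in this toy: the clone becomes authoritative.
        self.dest[region] = data
        self.hydrated[region] = True

    def hydrate_step(self) -> bool:
        # Background copier (kcopyd's role): copy one un-hydrated region.
        for i, done in enumerate(self.hydrated):
            if not done:
                self.dest[i] = self.source[i]
                self.hydrated[i] = True
                return True
        return False                               # fully hydrated


dev = CloneDevice([b"A", b"B", b"C"])
dev.write(1, b"b2")
while dev.hydrate_step():
    pass
print(dev.dest)
```

Because the copier skips regions whose bit is already set, the write to region 1 survives hydration and the final clone holds [b"A", b"b2", b"C"].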
Related¶
- systems/artifact-fs — the canonical-instance system.
- systems/cloudflare-artifacts — server-side sibling.
- systems/git — protocol substrate (partial-clone).
- concepts/git-pack-file — the blob-storage substrate being deferred.
- concepts/agent-first-storage-primitive — family of agent-first primitives this belongs to.
- patterns/blobless-clone-lazy-hydrate — the pattern-page treatment.