CONCEPT Cited by 2 sources

Async clone + background hydration

Async clone + background hydration is the repository-materialisation shape where a clone-equivalent operation returns as soon as the file tree + refs are present and proceeds to download file contents concurrently in the background, with reads on yet-unhydrated files blocking until their blob has arrived. Introduced to the wiki by Cloudflare's 2026-04-16 ArtifactFS ("git clone but async").

Problem it solves

Vanilla git clone is synchronous: the clone command blocks until every reachable object (all history + all blobs) is on disk. For small repos that's fine (sub-second), but:

  • Multi-GB repos with long history take minutes. Cloudflare cite a 2.4 GB web-framework repo at "close to 2 minutes" clone time.
  • Agent / sandbox / CI startup latency is directly gated on clone latency.
  • --depth=1 shallow clones help but discard history agents sometimes want, and still bring down every current-commit blob up front.

Any agent harness that clones on each session pays this cost per session; multiplied across millions of sessions, it becomes a material fleet-level cost.

Mechanism

Two pieces collaborate:

  1. Blobless clone: built on Git's partial-clone machinery (--filter=blob:none). It fetches the file tree (tree objects) and refs but omits blob objects (file contents), so file names and paths are present while file contents are not. Clone time is dominated by protocol overhead and tree size, not blob volume.
  2. Background-hydration daemon: a lightweight process that, after the blobless clone returns, starts fetching individual blobs in priority order. Reads on not-yet-hydrated files are intercepted by the filesystem and block until that file's blob arrives (background fetch is the fast path, on-demand fetch the fallback).
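
For orientation, the clone variants named so far map onto stock Git flags roughly as follows. This is a sketch that only builds the command lines and runs nothing. The helper name `clone_argv`, the mode labels, and the example URL are illustrative, and pairing `--filter=blob:none` with `--no-checkout` is an assumption made here so that Git does not immediately fetch the blobs needed to populate a working tree. Nothing in the source says ArtifactFS invokes Git this way.

```python
import shlex


def clone_argv(url, dest, mode="full"):
    """Build a `git clone` command line for one of three clone shapes.

    mode is a hypothetical label used only in this sketch:
      "full"     - every reachable object (all history, all blobs)
      "shallow"  - only the tip commit, but all of its blobs
      "blobless" - commits and trees, no blobs, no working-tree checkout
    """
    argv = ["git", "clone"]
    if mode == "shallow":
        argv.append("--depth=1")
    elif mode == "blobless":
        # --no-checkout is an assumption of this sketch: without it, Git
        # fetches the blobs needed to populate the working tree right away.
        argv += ["--filter=blob:none", "--no-checkout"]
    elif mode != "full":
        raise ValueError(f"unknown clone mode: {mode!r}")
    return argv + [url, dest]


# Print the three command lines side by side; nothing is executed.
for mode in ("full", "shallow", "blobless"):
    print(shlex.join(clone_argv("https://example.com/repo.git", "repo", mode)))
```

Returning an argument list rather than a shell string keeps the sketch safe to hand to `subprocess.run` without quoting concerns.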

Together, the agent harness sees a complete-looking directory tree almost immediately: it can enumerate files, grep paths, and read configs that happen to be already hydrated. The "clone is done" boundary shifts from "all blobs local" to "all tree + refs local".
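
The interplay between background hydration and blocking reads can be modelled in a few lines. The following is a toy in-memory sketch, not ArtifactFS's implementation: the class `LazyBlobStore`, the `fake_fetch` function, and the file names are all hypothetical, and a real driver would intercept reads at the filesystem layer rather than through a Python method. It assumes one lock per path so that a reader either waits for an in-flight background fetch or performs the fetch itself.

```python
import threading
import time


class LazyBlobStore:
    """Toy model: paths are known up front, contents arrive later."""

    def __init__(self, paths, fetch_blob):
        self._fetch_blob = fetch_blob  # callable(path) -> bytes, stands in for the network
        self._blobs = {}               # path -> contents, filled as blobs arrive
        self._locks = {p: threading.Lock() for p in paths}

    def list_paths(self):
        # The tree is complete from the start, even with zero blobs local.
        return sorted(self._locks)

    def is_hydrated(self, path):
        return path in self._blobs

    def _ensure(self, path):
        # The per-path lock guarantees a single fetch per blob: a second
        # caller waits here until the first caller's fetch has finished.
        with self._locks[path]:
            if path not in self._blobs:
                self._blobs[path] = self._fetch_blob(path)
            return self._blobs[path]

    def read(self, path):
        # Blocks until the blob is local (on-demand fetch as the fallback).
        return self._ensure(path)

    def start_background_hydration(self, order):
        # Fast path: a worker thread walks the paths in the given order.
        def worker():
            for path in order:
                self._ensure(path)

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread


fetch_log = []


def fake_fetch(path):
    # Hypothetical stand-in for a network round trip.
    time.sleep(0.01)
    fetch_log.append(path)
    return f"contents of {path}".encode()


store = LazyBlobStore(["go.mod", "main.go", "logo.png"], fake_fetch)
visible = store.list_paths()  # enumerable before any blob is local
worker = store.start_background_hydration(["go.mod", "main.go", "logo.png"])
first_read = store.read("logo.png")  # lowest priority, so likely fetched on demand
worker.join()
```

After `worker.join()` every path is hydrated, and each blob has been fetched exactly once regardless of whether the reader or the background worker got to it first.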

Priority ordering

Hydration order is not arbitrary — it's tuned for the typical opening actions of an agent workload. ArtifactFS's order:

  1. Package manifests (package.json, go.mod, pyproject.toml, Cargo.toml, ...).
  2. Configuration files (.yaml, .toml, .json).
  3. Source code.
  4. Binaries, images, executables.

The ordering itself is a specialisation — the generalisation is "any FS driver with background-hydration should let the calling workload hint its access pattern so hot files aren't blocked waiting for cold blobs."
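
A minimal sketch of how such a tiered ordering could be computed from path names alone. The four tiers mirror the list above; everything else is an assumption for illustration: the function names `hydration_tier` and `hydration_order`, the use of only the four manifest names the list spells out, the particular binary and image suffixes, and the alphabetical tie-break within a tier. The source does not describe how ArtifactFS classifies files.

```python
from pathlib import PurePosixPath

# Tier 0: only the manifest names spelled out in the list above.
MANIFEST_NAMES = {"package.json", "go.mod", "pyproject.toml", "Cargo.toml"}
# Tier 1: configuration suffixes from the list above.
CONFIG_SUFFIXES = {".yaml", ".toml", ".json"}
# Tier 3: an assumed, non-exhaustive set of binary and image suffixes.
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".exe", ".so", ".bin"}


def hydration_tier(path):
    """Return 0 (fetch first) through 3 (fetch last) for a repo-relative path."""
    p = PurePosixPath(path)
    if p.name in MANIFEST_NAMES:
        return 0
    suffix = p.suffix.lower()
    if suffix in CONFIG_SUFFIXES:
        return 1
    if suffix in BINARY_SUFFIXES:
        return 3
    return 2  # everything else is treated as source code


def hydration_order(paths):
    # Sort by tier, then alphabetically so the order is deterministic.
    return sorted(paths, key=lambda p: (hydration_tier(p), p))


paths = ["assets/logo.png", "src/main.go", "config/app.yaml", "go.mod", "package.json"]
order = hydration_order(paths)
print(order)
```

The manifest check runs before the suffix check so that package.json lands in the first tier rather than being treated as a generic .json config file.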

Trade-offs

| Axis | Synchronous clone | Async clone + hydration |
|---|---|---|
| Startup latency | O(repo size) | O(tree size) + protocol RTT |
| Peak bandwidth | Bursty at start | Spread across hydration window |
| Read latency (cold file) | Always local | First read may block on fetch |
| Read latency (hot file) | Always local | Likely local (priority-fetched) |
| Offline work | Full repo available | Only hydrated files readable |
| Sync-back to remote | git push | git push (same; no FS-level sync) |

Named trade-off from the Cloudflare post: "the filesystem does not attempt to 'sync' files back to the remote repository" — edits are pushed via ordinary Git, not via the FS driver. This is a deliberate simplification.

Not new — but freshly mainstreamed

The underlying Git partial-clone machinery is several years old (Git 2.19+, 2018). What ArtifactFS adds is packaging it as an FS driver with agent-aware priority and sandbox startup as the named use case, raising blobless clone from a power-user flag to a first-class workload primitive. A similar shape appears in git-lfs --smudge=delayed, Facebook's EdenFS, Microsoft's GVFS / VFS-for-Git (discontinued), and various build-farm fetch accelerators; ArtifactFS is the 2026-era agent-sandbox restatement.

Seen in
