Artifacts: versioned storage that speaks Git

Summary

Cloudflare's 2026-04-16 post launches Artifacts (private beta, public beta by early May 2026) — a distributed versioned filesystem, built for agents, that speaks the Git wire protocol. Repositories are created programmatically via a REST API or native Workers binding (env.AGENT_REPOS.create(name)), then cloned/fetched/pushed with any regular Git client over an authenticated HTTPS remote URL. Each Artifacts repo is backed by a Durable Object (one DO instance per repo — same (storage + compute) locality premise as Project Think's agent-actor substrate), with large objects chunked into the DO's embedded SQLite, pack-file snapshots persisted to R2, and auth tokens tracked in KV. The Git server itself is a ~100 KB pure-Zig WASM module — zero dependencies beyond std, compiled to Wasm, running inside each Worker invocation — implementing SHA-1, zlib, delta encoding/decoding, pack parsing, and the full Git smart-HTTP protocol v1+v2 from scratch. Agent-first conveniences include .import() from any existing Git remote, .fork(name, {readOnly}) for copy-on-write snapshots, native git-notes for agent-written metadata, and — alongside Artifacts — an open-source ArtifactFS filesystem driver that performs a blobless clone and hydrates file contents asynchronously in the background (prioritising manifests and code, deprioritising binaries), collapsing a 2-minute multi-GB clone into ~10–15 s of usable startup. Non-Git consumers are explicitly in scope: per-session agent state, per-customer config with rollback, per-sandbox workspaces. Pricing: $0.15 per 1,000 operations (10 k/month free), $0.50/GB-month (first 1 GB free).

Key takeaways

  1. Versioned storage is an agent-first primitive, and Git is the right wire protocol for it. "We think there's a need for a new primitive: a distributed, versioned filesystem that's built for agents first and foremost." Cloudflare's thesis: LLM-generated code volume is driving an order-of-magnitude change in source-control load, and existing platforms were built for humans, not agents that "never sleep, can work on several issues at once, and never tire." Rather than invent a new protocol ("then you have the bootstrap problem. AI models don't know it, so you have to distribute skills, or a CLI, or hope that users are plugged into your docs MCP"), pick a protocol already deep in the training data of every code model. See concepts/agent-first-storage-primitive, patterns/git-protocol-as-api. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  2. Git's data model is an appropriate versioned-storage primitive far beyond source control. "Git's data model is not only good for source control, but for anything where you need to track state, time travel, and persist large amounts of small data. Code, config, session prompts and agent history: all of these are things ('objects') that you often want to store in small chunks ('commits') and be able to revert or otherwise roll back to ('history')." Cloudflare uses Artifacts internally to persist "the current state of the filesystem and the session history in a per-session Artifacts repo" — getting fork / time-travel / diff semantics on arbitrary state for free. See concepts/repo-per-agent-session. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  3. One Durable Object per repo — same "agents are one-to-one" economics bet extended to the storage tier. "Artifacts are built on top of Durable Objects. The ability to create millions (or tens of millions+) of instances of stateful, isolated compute is inherent to how Durable Objects work today, and that's exactly what we needed for supporting millions of Git repos per namespace." Cloudflare explicitly frames the DO substrate as production-proven (MLB live fan-out, Confluence Whiteboards, their own Agents SDK) before pitching it as the storage primitive for Artifacts. This is the third 2026-04 Cloudflare launch that rests on DO-as-single-writer-actor-with-embedded-storage as its load-bearing primitive (after Agent Lee and Project Think). See concepts/one-to-one-agent-instance, patterns/do-backed-git-server. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  4. A full Git server in ~100 KB of Wasm (pure Zig, zero deps beyond std). "The entire git protocol engine is written in pure Zig (no libc), compiled to a ~100KB WASM binary (with room for optimization!). It implements SHA-1, zlib inflate/deflate, delta encoding/decoding, pack parsing, and the full git smart HTTP protocol — all from scratch, with zero external dependencies other than the standard library." Zig chosen specifically for (a) manual memory control in the ~128 MB DO budget, (b) Zig Build System lets the same source compile both to the production WASM runtime and to native binaries that link against libgit2 for cross-verification, (c) tight WASM↔JS interface — "11 host-imported functions for storage operations (host_get_object, host_put_object, etc.) and one for streaming output (host_emit_bytes). The WASM side is fully testable in isolation." Supports both v1 and v2 of the Git protocol including ls-refs, shallow clones (deepen, deepen-since, deepen-relative), and incremental fetch with have/want negotiation. See concepts/wasm-git-server. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  5. Storage layering: DO SQLite → R2 → KV, each with a reason. "Files are stored in the underlying Durable Object's SQLite database. Durable Object storage has a 2MB max row size, so large Git objects are chunked and stored across multiple rows." R2 holds pack-file snapshots for durability / efficient transfer; KV tracks auth tokens (the front-of-Worker auth path). The Worker in front handles authn/authz, key metrics (errors, latency), and DO lookup per request. This is the canonical deep example of "the DO's embedded storage is primary; R2 is blob durability; KV is edge-replicated lookup" stratification across the Cloudflare stack. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  6. Memory-budget-aware streaming + delta reuse — not ceremonial engineering. "DOs have ~128MB memory limits: this means we can spawn tens of millions of them (they're fast and light) but have to work within those limits. We make heavy use of streaming in both the fetch and push paths, directly returning a ReadableStream<Uint8Array> built from the raw WASM output chunks. We avoid calculating our own git deltas, instead, the raw deltas and base hashes are persisted alongside the resolved object. On fetch, if the requesting client already has the base object, Zig emits the delta instead of the full object, which saves bandwidth and memory." First-class wiki instance of storing delta-form alongside resolved object to trade storage for runtime CPU/memory on a tight-budget edge host — a subtler variant of how vanilla Git stores only deltas in pack files. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  7. .import() + .fork() are the agent-ergonomic primitives, not afterthoughts. Worked example from the post: "Import from GitHub" (env.ARTIFACTS.import({ source: { url: "https://github.com/cloudflare/workers-sdk", branch: "main" }, target: { name: "workers-sdk" } })) returns a { remote, token } pair the agent can hand off to any Git client. repo.fork("workers-sdk-review", { readOnly: true }) produces an isolated, copy-on-write fork ("Want to create 10,000 forks from a known-good starting point? You guessed it: Artifacts again."). Fork-a-session is positioned as a first-class debugging/collaboration primitive: "Debugging something and want another set of eyes? Send a URL and fork it. Want to riff on an API? Have a co-worker fork it and pick up from where you left off." Canonical wiki instance of Git-data-model-for-non-source-control use cases. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  8. git-notes as native metadata for agents — prompts, attribution, session state without mutating objects. "On top of this, we have native support for git-notes. Artifacts is designed to be agent-first, and notes enable agents to add notes (metadata) to Git objects. This includes prompts, agent attribution and other metadata that can be read/written from the repo without mutating the objects themselves." Notes sit beside the tree without breaking commit hashes — so LLM-generated annotations (the prompt that produced the diff, which model, what judge verdict) don't invalidate downstream signatures or CI attestations. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  9. ArtifactFS: "git clone but async" — blobless clone + background hydration for large-repo agent startup. "A popular web framework (at 2.4GB and with a long history!) takes close to 2 minutes to clone. A shallow clone is faster, but not enough to get down to single digit seconds." ArtifactFS (open-sourced at github.com/cloudflare/artifact-fs) mounts a Git repo as a filesystem: runs a blobless clone (fetches the file tree + refs, not file contents) during sandbox startup so the agent harness can start immediately, then hydrates file contents concurrently in the background via a lightweight daemon. Priority ordering is agent-aware: manifests (package.json, go.mod), config files, and code first; binary blobs (images, executables, non-text) deprioritised. "If a file isn't fully hydrated when the agent tries to read it, the read will block until it has." Claimed savings: ~90–100 s off sandbox startup per large-repo job × 10,000 jobs/month = 2,778 sandbox hours saved. No sync-back: the filesystem doesn't try to reconcile edits to the remote — the agent just commits and pushes as with any Git repo. Works with any Git remote (GitHub, GitLab, self-hosted) — not Artifacts-exclusive. See systems/artifact-fs, concepts/async-clone-hydration, patterns/blobless-clone-lazy-hydrate. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)

  10. Pricing built for "agent-scale massively-single-tenant" economics. "We're still early with Artifacts, but want our pricing to work at agent-scale: it needs to be cost effective to have millions of repos, unused (or rarely used) repos shouldn't be a drag, and our pricing should match the massively-single-tenant nature of agents. You also shouldn't have to think about whether a repo is going to be used or not, whether it's hot or cold, and/or whether an agent is going to wake it up." The shape — pay only for storage + operations, no per-repo hot/cold tier, no idle fee — is the same "zero idle cost" economics Durable Objects + AI Search ship, now extended to versioned storage. $0.15 per 1 000 operations (first 10 k/month free) + $0.50/GB-month (first 1 GB free). This is the wiki's canonical instance of concepts/scale-to-zero economics applied to versioned-storage-at-agent-scale. (Source: sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git)
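
The fetch-path choice in takeaway 6 (emit the stored delta when the client already holds the base, otherwise emit the full object) can be sketched roughly as below. This is a minimal illustration, not the Zig implementation: the StoredObject shape, the Emission type, and the chooseEmission name are all hypothetical, and the post gives no schema for how deltas and base hashes are persisted.

```typescript
// Hypothetical record shape. The post says raw deltas and base hashes are
// persisted alongside the resolved object, but does not publish a schema.
interface StoredObject {
  hash: string;
  resolved: Uint8Array; // full object bytes
  delta?: Uint8Array;   // raw delta as received in a pack, if any
  baseHash?: string;    // hash of the object the delta applies to
}

type Emission =
  | { kind: "delta"; baseHash: string; bytes: Uint8Array }
  | { kind: "full"; bytes: Uint8Array };

// Reuse the stored delta only when the client is known to hold its base;
// otherwise fall back to the full object. No delta is ever computed here.
function chooseEmission(obj: StoredObject, clientHas: Set<string>): Emission {
  if (
    obj.delta !== undefined &&
    obj.baseHash !== undefined &&
    clientHas.has(obj.baseHash)
  ) {
    return { kind: "delta", baseHash: obj.baseHash, bytes: obj.delta };
  }
  return { kind: "full", bytes: obj.resolved };
}
```

The trade is visible in the record: each delta-bearing object costs extra storage, but the fetch path never has to compute a delta inside the ~128 MB budget.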

Architecture

Request path

git clone/fetch/push          REST / Workers-binding call
       │                                │
       ▼                                ▼
┌────────────────────────────────────────────────┐
│  Worker (front-end)                            │
│  • authn/authz (token lookup in KV)            │
│  • metrics (errors, latency)                   │
│  • DO lookup (repo_id → DO instance)           │
└────────────────────────────────────────────────┘
┌────────────────────────────────────────────────┐
│  Durable Object (one per repo, ~128 MB)        │
│  ┌───────────────────────────────────────────┐ │
│  │  Zig WASM git server (~100 KB)            │ │
│  │  • SHA-1, zlib, delta encode/decode       │ │
│  │  • pack parse, smart-HTTP v1+v2           │ │
│  │  • ls-refs, shallow/deep, have/want       │ │
│  │  • streams ReadableStream<Uint8Array>     │ │
│  └───────────────────────────────────────────┘ │
│  ┌───────────────────────────────────────────┐ │
│  │  embedded SQLite (state.storage.kv)       │ │
│  │  • git objects, chunked across rows       │ │
│  │    (2 MB row limit → multi-row large      │ │
│  │    objects)                               │ │
│  │  • raw deltas + base hashes alongside     │ │
│  │    resolved objects (emit delta if        │ │
│  │    client has base)                       │ │
│  └───────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
       │                     │
       ▼                     ▼
┌──────────────────┐   ┌──────────────────┐
│  R2              │   │  KV              │
│  pack-file       │   │  auth token      │
│  snapshots       │   │  tracking        │
└──────────────────┘   └──────────────────┘
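
The multi-row chunking shown in the diagram can be sketched as follows. This is an illustration under stated assumptions: the ChunkRow layout and the chunkObject / joinChunks names are hypothetical, and only the 2 MB row limit comes from the post. A real implementation would likely leave headroom below the limit for keys and metadata.

```typescript
// Row-size limit quoted in the post for Durable Object storage.
const MAX_ROW_BYTES = 2 * 1024 * 1024;

// Hypothetical row layout: (object hash, chunk index) -> bytes.
interface ChunkRow {
  hash: string;
  index: number;
  bytes: Uint8Array;
}

// Split one object into rows that each fit under the row-size limit.
function chunkObject(
  hash: string,
  data: Uint8Array,
  maxBytes: number = MAX_ROW_BYTES,
): ChunkRow[] {
  const rows: ChunkRow[] = [];
  for (let offset = 0, index = 0; offset < data.length; offset += maxBytes, index++) {
    rows.push({ hash, index, bytes: data.subarray(offset, offset + maxBytes) });
  }
  // Store an empty object as a single empty row so it still has an entry.
  if (rows.length === 0) rows.push({ hash, index: 0, bytes: new Uint8Array(0) });
  return rows;
}

// Reassemble rows (supplied in any order) into the original bytes.
function joinChunks(rows: ChunkRow[]): Uint8Array {
  const sorted = [...rows].sort((a, b) => a.index - b.index);
  const total = sorted.reduce((n, r) => n + r.bytes.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const r of sorted) {
    out.set(r.bytes, offset);
    offset += r.bytes.length;
  }
  return out;
}
```

With the default limit, a 5 MB object becomes three rows (2 MB, 2 MB, 1 MB).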

ArtifactFS (agent-sandbox startup)

 t=0   mount request
 t=0ms blobless clone starts (tree + refs only)
       │      ┌─────────── agent harness runs here ──────────┐
       ▼      │                                              │
 t~1s  tree + refs present; agent can ls, find, read configs │
       │      │                                              │
       ▼      │                                              │
 t~1s+ daemon begins concurrent hydration                    │
       │      │                                              │
       ▼      │                                              │
       hydration priority:                                   │
         1. package.json / go.mod / pyproject.toml (manifests)
         2. source files (*.ts, *.py, *.rs, *.go, ...)
         3. configs (*.yaml, *.toml, *.json)
         4. text (*.md, *.txt)
         5. binaries (*.png, *.so, *.exe) deprioritised
       │      │
       ▼      ▼
 reads block only on yet-to-be-hydrated files
 no sync-back; agent commits/pushes normally
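
The priority ordering in the timeline above could be expressed as a simple tiering function. This is a sketch only: the tier numbers mirror the list in the diagram, but the file-name and extension sets, and the hydrationTier / hydrationOrder names, are assumptions made for illustration and are not taken from the ArtifactFS source.

```typescript
// Illustrative pattern sets; ArtifactFS's real rules are not given in the post.
const MANIFESTS = new Set(["package.json", "go.mod", "pyproject.toml"]);
const SOURCE_EXT = new Set([".ts", ".py", ".rs", ".go"]);
const CONFIG_EXT = new Set([".yaml", ".toml", ".json"]);
const TEXT_EXT = new Set([".md", ".txt"]);

// Lower tier = hydrated earlier. Manifests are checked by file name first,
// so package.json lands in tier 1 rather than with other .json configs.
function hydrationTier(path: string): number {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot).toLowerCase() : "";
  if (MANIFESTS.has(name)) return 1;
  if (SOURCE_EXT.has(ext)) return 2;
  if (CONFIG_EXT.has(ext)) return 3;
  if (TEXT_EXT.has(ext)) return 4;
  return 5; // binaries and anything unrecognised go last
}

// Order a file listing for background hydration; ties are broken by path.
function hydrationOrder(paths: string[]): string[] {
  return [...paths].sort(
    (a, b) => hydrationTier(a) - hydrationTier(b) || a.localeCompare(b),
  );
}
```

A read of a file that has not been reached yet would still block until it is hydrated, as the post describes; the ordering only changes which files are likely to be ready first.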

Code sample from post

// Create a repo
const repo = await env.AGENT_REPOS.create(name)
return { remote: repo.remote, token: repo.token }

// Git clone against it as any regular Git remote
// $ git clone https://x:${TOKEN}@123def456abc.artifacts.cloudflare.net/git/repo-13194.git

// Import from an existing Git remote and fork for isolated review
const { remote, token } = await env.ARTIFACTS.import({
  source: { url: "https://github.com/cloudflare/workers-sdk", branch: "main" },
  target: { name: "workers-sdk" },
})
const repo = await env.ARTIFACTS.get("workers-sdk")
const fork = await repo.fork("workers-sdk-review", { readOnly: true })
return Response.json({ remote: fork.remote, token: fork.token })
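
To turn a { remote, token } pair into the https://x:TOKEN@host/... form used in the clone command above, something like the helper below might work. It is a sketch under assumptions: the authenticatedRemote name is hypothetical, the "x" username is copied from the clone example, and the post does not say whether the returned remote already embeds credentials.

```typescript
// Embed a token into an HTTPS remote as basic-auth credentials, matching the
// https://x:TOKEN@host/path shape of the clone example.
function authenticatedRemote(remote: string, token: string): string {
  const url = new URL(remote);
  if (url.protocol !== "https:") {
    throw new Error("expected an HTTPS remote");
  }
  url.username = "x";   // placeholder username, as in the clone example
  url.password = token; // the URL setter percent-encodes reserved characters
  return url.toString();
}
```

A URL with an embedded token can end up in shell history and process listings, so a Git credential helper may be the safer choice outside short-lived sandboxes.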

Operational numbers

  • Git server binary: ~100 KB WASM (pure Zig, zero external deps beyond std).
  • Host-imported functions: 11 storage ops (e.g. host_get_object, host_put_object) + 1 output op (host_emit_bytes).
  • Durable Object memory budget: ~128 MB per instance.
  • Durable Object storage row limit: 2 MB max row size → multi-row chunking for large Git objects.
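  • ArtifactFS quantified savings example: a 2.4 GB web-framework repo takes "close to 2 minutes" to clone via standard git clone; ~90–100 s of that is recoverable per clone with blobless-plus-async-hydration. The post scales this to 10 k large-repo jobs/month and states ~2,778 sandbox hours saved/month. Note that 10,000 × 90–100 s works out to roughly 250–278 hours, so the stated figure corresponds to about 100,000 jobs/month.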
  • Pricing: $0.15 per 1,000 operations (first 10 k/month free); $0.50/GB-month storage (first 1 GB free). Workers Free-plan access planned during the beta.
  • Beta milestones: private beta at launch (2026-04-16); public beta by early May 2026.
  • Near-term roadmap: expanded per-namespace / per-repo metrics; Event Subscriptions for push/pull/clone/fork events; native TypeScript / Go / Python SDKs; repo-level + namespace-wide search APIs ("find all the repos with a package.json file"); Workers Builds API for agent-driven CI/CD.
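
As a worked example of the pricing bullet above, a monthly bill could be estimated as follows. This is a rough sketch: the rates are the beta-launch figures from the post, the estimateMonthlyCost name is hypothetical, and linear pro-rata billing (no rounding up to whole blocks of 1,000 operations or whole GB) is an assumption the post does not confirm.

```typescript
// Rates as listed in the post at beta launch; the post says they may change.
const PRICE_PER_1K_OPS = 0.15;  // USD per 1,000 operations
const FREE_OPS = 10_000;        // free operations per month
const PRICE_PER_GB_MONTH = 0.5; // USD per GB-month of storage
const FREE_GB = 1;              // free storage in GB

// Estimate one month's cost, assuming linear pro-rata billing above the
// free allowances (an assumption, not a documented billing rule).
function estimateMonthlyCost(operations: number, storedGb: number): number {
  const billableOps = Math.max(0, operations - FREE_OPS);
  const billableGb = Math.max(0, storedGb - FREE_GB);
  const cost =
    (billableOps / 1000) * PRICE_PER_1K_OPS + billableGb * PRICE_PER_GB_MONTH;
  return Math.round(cost * 100) / 100; // round to cents
}
```

Under these assumptions, 1,010,000 operations and 21 GB stored come to $150 for operations plus $10 for storage, so estimateMonthlyCost(1_010_000, 21) returns 160. Usage inside both free allowances returns 0.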

Caveats

  • Beta launch post: no production-scale numbers disclosed (no QPS, no p99 latency, no clone/fetch throughput distributions, no number of repos in the private beta, no fleet-health metrics).
  • 2-minute 2.4 GB clone baseline is an illustrative worked example in the post, not a benchmarked fleet baseline; the "~90–100 s savings" and "2,778 sandbox hours/month" numbers are hypothetical scaling of that example, not measured production outcomes.
  • Hash algorithm: the post lists SHA-1 among the primitives the Zig engine implements; it does not state whether SHA-256 (Git's planned hash transition) is supported, or what the upgrade path is.
  • Extensibility story — git-notes is supported natively; LFS is flagged as an extensibility target without a specific ship timeline.
  • Pricing evolution: "we'll provide updates throughout the beta should this pricing change and ahead of billing any usage", so prices may move before GA.
  • Non-Git consumer story is enthusiastic but thinly worked — post mentions per-customer config rollback as a use-case but doesn't ship an SDK or data model for non-Git access; non-Git access today is via isomorphic-git or the forthcoming native SDKs.
  • ArtifactFS is a separate open-source tool usable against any Git remote (GitHub / GitLab / self-hosted) — not an Artifacts-exclusive dependency. The Artifacts ↔ ArtifactFS split is genuine: one is the server, the other is the client-side FS driver.
  • No detail on durability / replication model of the DO↔R2↔KV storage stratification (e.g. snapshot cadence from DO SQLite → R2, KV eventual-consistency implications for token revocation).
  • No detail on multi-tenant isolation model at the namespace level beyond namespaces being the scoping unit for repos + metrics + forthcoming events.

Source
