
CONCEPT Cited by 1 source

Snapshot repository as agent corpus

Snapshot repository as agent corpus names the architectural choice of giving an agent a dedicated, versioned repository as its view of the knowledge base. That repository is distinct both from the live source-of-truth database that holds the source configuration and from any vector index over the same data.

Canonical Vercel framing

From the Knowledge Agent Template pipeline:

"You add sources through the admin interface, and they're stored in Postgres. Content syncs to a snapshot repository via Vercel Workflow. When the agent needs to search, a Vercel Sandbox loads the snapshot."

(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)

Structural role

Three tiers in the pipeline:

  1. Live source-of-truth. Postgres (admin UI source configuration: which GitHub repos, which YouTube channels, which markdown doc paths, which APIs).
  2. Snapshot repository. A derived, content-materialised filesystem view of the corpus, written asynchronously by Vercel Workflow from (1). This is what the agent sees.
  3. Sandbox-loaded working copy. A per-request checkout of (2) in the Vercel Sandbox the agent runs bash against.

The snapshot is derived from (1) and consumed by (3). It's the stable, versioned, agent-facing view.
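The three tiers can be modelled in plain Python to make the data flow concrete. This is a minimal, hypothetical sketch, not the Knowledge Agent Template's code: `SourceConfig`, `Snapshot`, `sync_snapshot`, `load_working_copy`, and the injected `fetch` callable are illustrative names. A list of config objects stands in for the Postgres rows, a read-only mapping stands in for the snapshot repository, and a temporary directory stands in for the Vercel Sandbox checkout. It assumes Python 3.9 or later.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from types import MappingProxyType
from typing import Callable, Mapping


# Tier 1 (hypothetical shape): one admin-configured source, as a Postgres row
# might describe it. Only configuration lives here, not the content itself.
@dataclass(frozen=True)
class SourceConfig:
    kind: str     # e.g. "github", "youtube", "docs"
    locator: str  # e.g. a repo slug or a channel id


# Tier 2: an immutable snapshot of the materialised corpus.
@dataclass(frozen=True)
class Snapshot:
    revision: str             # hash over every path and file body
    files: Mapping[str, str]  # read-only view: relative path -> text


def sync_snapshot(configs: list[SourceConfig],
                  fetch: Callable[[SourceConfig], dict[str, str]]) -> Snapshot:
    """Materialise every configured source into files (the sync step's job)."""
    files: dict[str, str] = {}
    for cfg in configs:
        for rel_path, text in fetch(cfg).items():
            files[f"{cfg.kind}/{cfg.locator}/{rel_path}"] = text
    # Same content always yields the same revision id.
    payload = json.dumps(sorted(files.items())).encode()
    revision = hashlib.sha256(payload).hexdigest()[:12]
    return Snapshot(revision, MappingProxyType(files))


# Tier 3: a per-request working copy that the agent's commands run against.
def load_working_copy(snapshot: Snapshot) -> Path:
    root = Path(tempfile.mkdtemp(prefix=f"snapshot-{snapshot.revision}-"))
    for rel_path, text in snapshot.files.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
    return root
```

In this sketch the sync step is the only code that talks to sources; everything downstream sees files keyed by a revision id.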

Why three tiers, not two

A naïve design would give the agent direct access to Postgres. The snapshot tier exists for four reasons:

  • Format adaptation. The agent wants a filesystem, but the source material is spread across Postgres, GitHub, YouTube, and API responses.
  • Load isolation. Agent retrieval doesn't hit the source APIs or the admin DB.
  • Versioning. The snapshot repository has a history, so bad content can be inspected as a diff and reverted.
  • Corpus-scope clarity. The snapshot is exactly what the agent sees, so adjacent admin data cannot be exposed by accident.
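Load isolation and corpus-scope clarity both reduce to the agent's tools touching only the working copy. In the described design the Sandbox boundary provides this; the following is a minimal in-process approximation under stated assumptions, with hypothetical tool functions `grep_corpus` and `read_corpus_file` standing in for the bash commands the agent actually runs. Path resolution rejects `../` escapes, absolute paths, and symlinks that point outside the root. It assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


class OutsideCorpusError(PermissionError):
    """Raised when a tool call tries to reach beyond the snapshot root."""


def read_corpus_file(corpus_root: Path, requested: str) -> str:
    """Read one file, refusing paths that resolve outside the corpus root."""
    root = corpus_root.resolve()
    # resolve() follows symlinks and collapses "..", so escapes are caught here.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):
        raise OutsideCorpusError(f"{requested!r} is outside the agent corpus")
    return target.read_text()


def grep_corpus(corpus_root: Path, needle: str) -> list[str]:
    """Return relative paths of corpus files whose text contains `needle`.

    Only the local working copy is read: no database or source API is queried.
    """
    root = corpus_root.resolve()
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and any symlink that points outside the corpus.
        if path.is_file() and path.resolve().is_relative_to(root):
            if needle in path.read_text():
                hits.append(path.relative_to(root).as_posix())
    return hits
```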

Why a repository, not just a folder

The post uses the word "repository" specifically and links to Vercel's Sandbox snapshot concept. The repository shape gives:

  • Immutability. A snapshot at revision R doesn't change; you reload R to reproduce a retrieval.
  • Auditability. Git-shaped history of what changed when.
  • Reproducibility. An agent's wrong answer at timestamp T can be reproduced by loading the snapshot as of T.
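Immutability and "as of T" reproduction can be modelled with an append-only list of revisions. This is a hypothetical sketch, not Vercel's snapshot format: `SnapshotHistory`, `Revision`, and the 12-character SHA-256 revision ids are assumptions made for illustration. Each id hashes the parent id together with the file contents (loosely like a Git commit), and `as_of(T)` returns the latest revision committed at or before T.

```python
import bisect
import hashlib
import json
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: str
    parent: Optional[str]
    committed_at: float       # e.g. a Unix timestamp
    files: Mapping[str, str]  # read-only view: path -> content


@dataclass
class SnapshotHistory:
    """Hypothetical append-only history of snapshot revisions."""
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, files: dict[str, str], committed_at: float) -> Revision:
        parent = self.revisions[-1] if self.revisions else None
        if parent is not None and committed_at < parent.committed_at:
            raise ValueError("revisions must be committed in time order")
        parent_id = parent.rev_id if parent is not None else None
        # Hashing the parent id with the content pins each id to its history.
        payload = json.dumps({"parent": parent_id, "files": sorted(files.items())},
                             sort_keys=True).encode()
        # Copy before freezing so later edits to the caller's dict cannot leak in.
        rev = Revision(hashlib.sha256(payload).hexdigest()[:12], parent_id,
                       committed_at, MappingProxyType(dict(files)))
        self.revisions.append(rev)
        return rev

    def get(self, rev_id: str) -> Revision:
        """Reload a specific revision to reproduce a retrieval exactly."""
        for rev in self.revisions:
            if rev.rev_id == rev_id:
                return rev
        raise KeyError(rev_id)

    def as_of(self, t: float) -> Revision:
        """Return the latest revision committed at or before time t."""
        times = [rev.committed_at for rev in self.revisions]
        i = bisect.bisect_right(times, t)
        if i == 0:
            raise LookupError(f"no snapshot existed at t={t}")
        return self.revisions[i - 1]
```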

The repository shape composes with concepts/traceability-of-retrieval: you can see not only the shell commands the agent ran but also the exact filesystem it ran them on.
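One way to keep enough information to replay a retrieval is to log the snapshot revision next to every command and its output. A minimal sketch with hypothetical names (`RetrievalTrace`, `ToolCall`, `replay`); the `run_at_revision` callable stands in for "load revision R into a sandbox and run this command", which the sketch does not implement.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    command: str  # the shell command the agent ran, e.g. "grep -rl snapshot ."
    output: str   # what the command printed at the time


@dataclass
class RetrievalTrace:
    """Hypothetical log record: which snapshot revision, which commands, what output."""
    snapshot_revision: str
    calls: list[ToolCall] = field(default_factory=list)

    def record(self, command: str, output: str) -> None:
        self.calls.append(ToolCall(command, output))

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "RetrievalTrace":
        data = json.loads(raw)
        return cls(data["snapshot_revision"], [ToolCall(**c) for c in data["calls"]])


def replay(trace: RetrievalTrace,
           run_at_revision: Callable[[str, str], str]) -> list[str]:
    """Re-run each command on the same revision; return commands whose output changed."""
    return [c.command for c in trace.calls
            if run_at_revision(trace.snapshot_revision, c.command) != c.output]
```

Because the revision is pinned, an empty result from `replay` means the logged retrieval was reproduced on the same filesystem.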

Contrast with vector index

A vector index is an opaque derived artifact: re-embedding changes it globally, and there is no readable diff between index versions V1 and V2. A snapshot repository has the opposite property: every change is a diff.
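Because a snapshot is just paths and file contents, comparing two revisions is an ordinary file diff. A minimal sketch, assuming each revision is available as a mapping from path to text (a hypothetical representation); it uses Python's standard `difflib` for the line-level view. The sketch has no counterpart for a vector index, whose versions differ as numeric vectors rather than readable lines.

```python
import difflib
from typing import Mapping


def diff_snapshots(old: Mapping[str, str], new: Mapping[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified between two revisions."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "modified": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
    }


def file_diff(old: Mapping[str, str], new: Mapping[str, str], path: str) -> str:
    """Unified diff of one file, the view a reviewer reads before reverting."""
    return "".join(difflib.unified_diff(
        old.get(path, "").splitlines(keepends=True),
        new.get(path, "").splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}"))
```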

Seen in
