CONCEPT Cited by 1 source

Snapshot repository as agent corpus¶

Snapshot repository as agent corpus names the architectural choice of giving an agent a dedicated, versioned repository as its knowledge-base view. That view is distinct both from the live source-of-truth DB that holds the source configuration and from any vector index over the same data.

Canonical Vercel framing¶

From the Knowledge Agent Template pipeline:

"You add sources through the admin interface, and they're stored in Postgres. Content syncs to a snapshot repository via Vercel Workflow. When the agent needs to search, a Vercel Sandbox loads the snapshot."

(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)

Structural role¶

Three tiers in the pipeline:

1. Live source-of-truth. Postgres (admin UI source configuration: which GitHub repos, which YouTube channels, which markdown doc paths, which APIs).
2. Snapshot repository. A derived, content-materialised filesystem view of the corpus, written asynchronously by Vercel Workflow from (1). This is what the agent sees.
3. Sandbox-loaded working copy. A per-request checkout of (2) in the Vercel Sandbox the agent runs bash against.

The snapshot is derived from (1) and consumed by (3). It's the stable, versioned, agent-facing view.
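
The flow between the three tiers can be sketched in a few lines of Python. This is an illustrative stand-in, not Vercel's implementation: `SOURCES`, `sync_snapshot`, and `load_working_copy` are hypothetical names, the row schema is invented, and the content is inlined in each row for brevity (in the described pipeline, Postgres holds source configuration and the content is fetched from the configured sources).

```python
import shutil
import tempfile
from pathlib import Path

# Tier 1 stand-in: source rows (hypothetical schema, content inlined for brevity).
SOURCES = [
    {"id": "docs-intro", "path": "docs/intro.md", "body": "# Intro\nWelcome.\n"},
    {"id": "api-notes", "path": "api/notes.md", "body": "GET /notes returns a list.\n"},
]


def sync_snapshot(sources, snapshot_dir: Path) -> None:
    """Tier 2: materialise each source row as a plain file under snapshot_dir."""
    for row in sources:
        target = snapshot_dir / row["path"]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(row["body"], encoding="utf-8")


def load_working_copy(snapshot_dir: Path, sandbox_root: Path) -> Path:
    """Tier 3: copy the snapshot into a fresh per-request directory."""
    workdir = Path(tempfile.mkdtemp(dir=sandbox_root))
    corpus = workdir / "corpus"
    shutil.copytree(snapshot_dir, corpus)  # destination must not exist yet
    return corpus
```

Because each request gets its own copy, anything the agent writes in its working directory leaves the snapshot untouched.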

Why three tiers, not two¶

A naïve design would give the agent direct access to Postgres. Four reasons the snapshot tier exists:

Format adaptation. Agent wants filesystem; source lives in Postgres + GitHub + YouTube + API responses.
Load isolation. Agent retrieval doesn't hit the source APIs or the admin DB.
Versioning. Snapshot repo has a history; bad content is diff-inspectable and revertable.
Corpus-scope clarity. The snapshot is exactly what the agent sees; no accidental exposure of adjacent admin data.
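
Load isolation and corpus-scope clarity follow from retrieval being a purely local file scan. The sketch below is a pure-Python stand-in for the shell search the agent would run in its sandbox; `search_corpus` is a hypothetical helper, and the case-insensitive substring match is an assumption made for the example.

```python
from pathlib import Path


def search_corpus(corpus_dir: Path, needle: str):
    """Yield (relative path, line number, line) for each matching line."""
    wanted = needle.lower()
    for path in sorted(corpus_dir.rglob("*")):
        if not path.is_file():
            continue
        # Only files under corpus_dir are read: no database or API calls.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if wanted in line.lower():
                yield path.relative_to(corpus_dir).as_posix(), number, line
```

Nothing outside `corpus_dir` is reachable from this function, which is the scope guarantee in miniature.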

Why a repository, not just a folder¶

The post uses the word "repository" specifically and links to Vercel's Sandbox snapshot concept. The repository shape gives:

Immutability. A snapshot at revision R doesn't change; you reload R to reproduce a retrieval.
Auditability. Git-shaped history of what changed when.
Reproducibility. An agent's wrong answer at timestamp T can be reproduced by loading the snapshot as of T.

This composes with concepts/traceability-of-retrieval — not only can you see the shell commands the agent ran, you can see the exact filesystem it ran them on.
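
A toy model of the three properties, assuming an in-memory store rather than a real repository: `SnapshotStore` and its methods are hypothetical, revision ids are derived from a content hash, and timestamps are passed in explicitly so the example stays deterministic.

```python
import hashlib
import json
from bisect import bisect_right
from types import MappingProxyType


class SnapshotStore:
    """Toy in-memory stand-in for a versioned snapshot repository."""

    def __init__(self):
        self._revisions = {}  # revision id -> read-only {path: content}
        self._timeline = []   # (timestamp, revision id), in commit order

    def commit(self, files: dict, timestamp: float) -> str:
        """Record a new revision and return its content-derived id."""
        if self._timeline and timestamp < self._timeline[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        payload = json.dumps(files, sort_keys=True).encode("utf-8")
        rev = hashlib.sha256(payload).hexdigest()[:12]
        self._revisions[rev] = MappingProxyType(dict(files))
        self._timeline.append((timestamp, rev))
        return rev

    def load(self, rev: str) -> dict:
        """Return a private copy of revision rev (immutability)."""
        return dict(self._revisions[rev])

    def revision_as_of(self, timestamp: float) -> str:
        """Return the latest revision committed at or before timestamp."""
        times = [t for t, _ in self._timeline]
        index = bisect_right(times, timestamp)
        if index == 0:
            raise LookupError("no snapshot exists at or before that time")
        return self._timeline[index - 1][1]
```

Reproducing an answer given at time T then amounts to `store.load(store.revision_as_of(T))`. If the snapshot happens to be a Git repository, `git rev-list -n 1 --before=<date> <branch>` plays the role of `revision_as_of`.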

Contrast with vector index¶

A vector index is an opaque derived artifact: re-embedding changes it globally; there's no readable diff between index version V1 and V2. A snapshot repository has the opposite property — every change is a diff.
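
To make "every change is a diff" concrete, here is a minimal sketch using Python's standard `difflib`. Modelling a snapshot as a `{path: text}` dict and the helper name `snapshot_diff` are assumptions for illustration; a real repository would use its own diff tooling.

```python
import difflib


def snapshot_diff(old: dict, new: dict) -> str:
    """Return a unified diff across two {path: text} snapshots."""
    chunks = []
    for path in sorted(set(old) | set(new)):
        if old.get(path) == new.get(path):
            continue  # unchanged files are skipped
        before = old.get(path, "").splitlines()
        after = new.get(path, "").splitlines()
        chunks.extend(difflib.unified_diff(
            before,
            after,
            fromfile=f"a/{path}" if path in old else "/dev/null",
            tofile=f"b/{path}" if path in new else "/dev/null",
            lineterm="",
        ))
    return "\n".join(chunks)
```

Calling `snapshot_diff(v1, v2)` on two corpus versions yields line-level additions and removals per file, which is the readable change record that two versions of an embedding index do not offer.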

Seen in¶

sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings — canonical three-tier architecture; snapshot repo as the middle tier between admin Postgres and the sandbox-loaded working copy.

concepts/filesystem-as-retrieval-substrate — the retrieval-interface axis; snapshot repository is what that interface points at.
concepts/traceability-of-retrieval — the success-property axis; snapshot versioning is what makes retrieval traces reproducible.
patterns/snapshot-sync-from-postgres-to-repo — the orchestration pattern that keeps the snapshot fresh.
patterns/bash-in-sandbox-as-retrieval-tool — the retrieval pattern that consumes the snapshot.
systems/vercel-workflow — producer of the snapshot.
systems/vercel-sandbox — consumer of the snapshot.
systems/vercel-knowledge-agent-template — canonical instantiation.