CONCEPT Cited by 1 source
Snapshot repository as agent corpus
Snapshot repository as agent corpus names the architectural choice of giving an agent a dedicated, versioned repository as its knowledge-base view. That repository is distinct both from the live source-of-truth DB that holds the source configuration and from any vector index over the same data.
Canonical Vercel framing
From the Knowledge Agent Template pipeline:
"You add sources through the admin interface, and they're stored in Postgres. Content syncs to a snapshot repository via Vercel Workflow. When the agent needs to search, a Vercel Sandbox loads the snapshot."
(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)
Structural role
Three tiers in the pipeline:
1. Live source-of-truth. Postgres (admin UI source configuration: which GitHub repos, which YouTube channels, which markdown doc paths, which APIs).
2. Snapshot repository. A derived, content-materialised filesystem view of the corpus, written asynchronously by Vercel Workflow from (1). This is what the agent sees.
3. Sandbox-loaded working copy. A per-request checkout of (2) in the Vercel Sandbox the agent runs bash against.
The snapshot is derived from (1) and consumed by (3). It's the stable, versioned, agent-facing view.
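The flow between the three tiers can be sketched in a few lines of Python. This is a minimal local illustration, not Vercel's implementation: `SourceConfig`, `sync_snapshot`, `checkout`, and the `fetch` callback are all hypothetical names, the directory layout is an assumption, and a plain directory stands in for both the snapshot repository and the Sandbox.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class SourceConfig:
    """Tier 1: one row of source configuration (hypothetical schema)."""
    kind: str        # e.g. "github", "youtube", "docs"
    identifier: str  # e.g. a repository or channel name


def sync_snapshot(
    sources: list,
    fetch: Callable[[SourceConfig], str],
    snapshot_dir: Path,
) -> None:
    """Tier 2: materialise each configured source as a file in the snapshot."""
    for src in sources:
        target = snapshot_dir / src.kind / f"{src.identifier}.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(fetch(src), encoding="utf-8")


def checkout(snapshot_dir: Path) -> Path:
    """Tier 3: copy the snapshot into a fresh per-request working directory."""
    workdir = Path(tempfile.mkdtemp(prefix="agent-"))
    corpus = workdir / "corpus"
    shutil.copytree(snapshot_dir, corpus)  # the agent only ever touches this copy
    return corpus
```

Because each request gets its own copy, anything the agent writes while exploring stays in the working copy and never reaches the snapshot or the source configuration.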
Why three tiers, not two
A naïve design would give the agent direct access to Postgres. Four reasons the snapshot tier exists:
- Format adaptation. The agent wants a filesystem; the sources are spread across Postgres, GitHub, YouTube, and API responses.
- Load isolation. Agent retrieval doesn't hit the source APIs or the admin DB.
- Versioning. Snapshot repo has a history; bad content is diff-inspectable and revertable.
- Corpus-scope clarity. The snapshot is exactly what the agent sees; no accidental exposure of adjacent admin data.
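The load-isolation point can be made concrete with a small sketch. Everything here is hypothetical: `CountingSource` stands in for an upstream API, `sync` for the Workflow step, and `search` for the agent's file search (written in plain Python rather than bash to keep the example self-contained). The point is only that upstream calls happen at sync time, never at retrieval time.

```python
from pathlib import Path


class CountingSource:
    """Hypothetical upstream API that counts how often it is called."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        return self.docs[key]


def sync(source, keys, snapshot_dir):
    """Write each upstream document into the snapshot, one file per key."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for key in keys:
        path = snapshot_dir / f"{key}.md"
        path.write_text(source.fetch(key), encoding="utf-8")


def search(root, needle):
    """Grep-like search that reads only files under root."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.name, lineno))
    return hits
```

After one `sync`, any number of `search` calls leaves `source.calls` unchanged, which is the isolation property the list above describes.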
Why a repository, not just a folder
The post uses the word "repository" specifically and links to Vercel's Sandbox snapshot concept. The repository shape gives:
- Immutability. A snapshot at revision R doesn't change; you reload R to reproduce a retrieval.
- Auditability. Git-shaped history of what changed when.
- Reproducibility. An agent's wrong answer at timestamp T can be reproduced by loading the snapshot as of T.
This composes with concepts/traceability-of-retrieval — not only can you see the shell commands the agent ran, you can see the exact filesystem it ran them on.
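As a rough illustration of "load the snapshot as of T", the sketch below keeps an append-only, in-memory history of revisions. It is a stand-in written for this page, not Git and not Vercel's snapshot API; `SnapshotHistory` and its methods are hypothetical names, and the 12-character content hash used as a revision id is an arbitrary choice.

```python
import hashlib
from bisect import bisect_right


class SnapshotHistory:
    """Append-only history of snapshot revisions (illustrative, in-memory)."""

    def __init__(self):
        self._revisions = {}  # revision id -> {path: content}
        self._log = []        # (timestamp, revision id), timestamps increasing

    def commit(self, timestamp, files):
        """Record a new revision; the id is derived from the content."""
        if self._log and timestamp <= self._log[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        digest = hashlib.sha256()
        for path in sorted(files):
            digest.update(path.encode("utf-8") + b"\0")
            digest.update(files[path].encode("utf-8") + b"\0")
        rev = digest.hexdigest()[:12]
        self._revisions[rev] = dict(files)  # private copy, never mutated
        self._log.append((timestamp, rev))
        return rev

    def revision_as_of(self, timestamp):
        """Return the id of the latest revision committed at or before timestamp."""
        idx = bisect_right([t for t, _ in self._log], timestamp)
        if idx == 0:
            raise LookupError("no snapshot exists at that time")
        return self._log[idx - 1][1]

    def load(self, rev):
        """Return a copy of the files at a revision; callers cannot alter history."""
        return dict(self._revisions[rev])
```

To reproduce an answer given at time T, one would call `revision_as_of(T)` and then `load` that revision, which yields the same files the agent searched, regardless of later commits.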
Contrast with vector index
A vector index is an opaque derived artifact: re-embedding changes it globally; there's no readable diff between index version V1 and V2. A snapshot repository has the opposite property — every change is a diff.
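To show what "every change is a diff" looks like in practice, here is a small sketch that compares two snapshot revisions using Python's standard `difflib`. Representing a revision as a `{path: content}` dict and the name `diff_snapshots` are assumptions made for the example; a real snapshot repository would get this from its version-control tooling.

```python
import difflib


def diff_snapshots(old, new):
    """Return a unified diff covering every file that differs between revisions."""
    out = []
    for path in sorted(set(old) | set(new)):
        # A file missing on one side is treated as empty, so it shows up
        # as fully added or fully removed.
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before == after:
            continue
        out.extend(
            difflib.unified_diff(
                before, after, fromfile=f"a/{path}", tofile=f"b/{path}"
            )
        )
    return "".join(out)
```

A changed sentence in one document produces a few `-` and `+` lines a reviewer can read directly. Two versions of an embedding index offer no comparable human-readable view.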
Seen in
- sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings — canonical three-tier architecture; snapshot repo as the middle tier between admin Postgres and the sandbox-loaded working copy.
Related
- concepts/filesystem-as-retrieval-substrate — the retrieval-interface axis; snapshot repository is what that interface points at.
- concepts/traceability-of-retrieval — the success-property axis; snapshot versioning is what makes retrieval traces reproducible.
- patterns/snapshot-sync-from-postgres-to-repo — the orchestration pattern that keeps the snapshot fresh.
- patterns/bash-in-sandbox-as-retrieval-tool — the retrieval pattern that consumes the snapshot.
- systems/vercel-workflow — producer of the snapshot.
- systems/vercel-sandbox — consumer of the snapshot.
- systems/vercel-knowledge-agent-template — canonical instantiation.