Vercel — Build knowledge agents without embeddings
Summary
Vercel's 2026-04-21 launch post for the open-source
Knowledge Agent Template
— a production-ready knowledge-agent architecture that
replaces the vector-database / chunking / embedding-
model retrieval stack with a filesystem and bash.
Sources (GitHub repos, YouTube transcripts, markdown
docs, custom APIs) are stored in Postgres, synced to a
snapshot repository via Vercel
Workflow, and served to the agent as a
Vercel Sandbox-loaded filesystem
where the agent runs grep, find, cat, and ls via
bash / bash_batch tools. Production data point from the
internal sales-call summarisation prototype that motivated
the template: ~$1.00 → ~$0.25 per call (a 4× cost
reduction), with output quality improving after the
vector pipeline was replaced.
The architectural thesis is that retrieval opacity is
the production problem the embedding stack can't solve: a
wrong answer from a vector-DB-backed agent requires
debugging "why did chunk X score 0.82 and the correct
chunk score 0.79" — a debugging loop across the chunking
boundary, the embedding model, and the similarity
threshold. Filesystem retrieval makes the trace readable:
the agent ran grep -r "pricing" docs/, read
docs/plans/enterprise.md, extracted the wrong section,
and you fix the file or the search strategy. "You're
debugging a question, not a pipeline."
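The debugging story above can be made concrete with a minimal sketch: retrieval is one shell command over a snapshot directory, so the "trace" is just the command plus the human-readable paths it returned. The directory layout and `search` helper below are illustrative, not the template's actual code.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative snapshot layout (not the template's real corpus).
const snapshot = mkdtempSync(join(tmpdir(), "snapshot-"));
mkdirSync(join(snapshot, "docs", "plans"), { recursive: true });
writeFileSync(
  join(snapshot, "docs", "plans", "enterprise.md"),
  "# Enterprise plan\nCustom pricing: contact sales.\n",
);

// Retrieval is a single grep invocation; the trace is the command and
// the file paths it matched. No scores, no chunk ids, nothing to tune.
function search(pattern: string, dir: string): string[] {
  const out = execFileSync("grep", ["-rl", pattern, dir], {
    encoding: "utf8",
  });
  return out.trim().split("\n").filter(Boolean);
}

const hits = search("pricing", join(snapshot, "docs"));
console.log(hits);
```

When the answer is wrong, the fix is visible in the same terms: either the file content is wrong, or the pattern the agent chose was.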
The template also ships a Chat
SDK-based multi-platform adapter layer (Slack, Discord,
Microsoft Teams, Google Chat, GitHub) with one agent
pipeline shared across platforms; a complexity router
that classifies each incoming question and dispatches to
fast/cheap or slow/powerful models (routed via
Vercel AI Gateway); a
@savoir/sdk package that lets other AI-SDK-powered apps
query the same knowledge base as tools; and an
AI-powered admin agent with internal tools
(query_stats, query_errors, run_sql, chart) so
operators can debug the knowledge agent with another
agent.
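The complexity router named above can be sketched in TypeScript. The post does not disclose the classifier mechanism (see Caveats), so the heuristic below (length plus keyword signals) and the AI-Gateway-style model slugs are stand-ins, not the template's implementation.

```typescript
// Two-tier routing: classify each question, then pick a model id that an
// AI-Gateway-style transport could dispatch to. Model slugs are illustrative.
type Tier = "fast" | "powerful";

const MODEL_BY_TIER: Record<Tier, string> = {
  fast: "openai/gpt-4o-mini",
  powerful: "anthropic/claude-sonnet-4",
};

// Stand-in classifier: the real one could be a heuristic, a prompt, or a
// fine-tuned model; the post does not say.
function classify(question: string): Tier {
  const hardSignals = ["compare", "why", "architecture", "trade-off"];
  const looksHard =
    question.length > 200 ||
    hardSignals.some((s) => question.toLowerCase().includes(s));
  return looksHard ? "powerful" : "fast";
}

function route(question: string): string {
  return MODEL_BY_TIER[classify(question)];
}

console.log(route("What is the enterprise price?")); // fast tier
console.log(route("Why does the architecture trade latency for cost?")); // powerful tier
```

The point of the shape is that cost policy lives in one function, while the rest of the pipeline only ever sees a model id.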
Key takeaways
- Vector-DB retrieval is opaque; filesystem retrieval is transparent. "With filesystem search, there is no guessing why it picked that chunk and no tuning retrieval scores in the dark. You're debugging a question, not a pipeline." Canonicalises the embedding-black-box-debugging failure mode as a first-class reason to reject the vector stack for structured or citeable corpora (Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings).
- Replace the vector DB with a filesystem, then give the agent bash. Verbatim: "We replaced our vector pipeline with a filesystem and gave the agent bash. Our sales call summarization agent went from ~\$1.00 to ~\$0.25 per call, and the output quality improved. The agent was doing what it already knew how to do: read files, run grep, and navigate directories." Canonicalises the filesystem as retrieval substrate and the bash-in-sandbox-as-retrieval-tool pattern. Operational datum: 4× cost reduction + quality up.
- LLMs were trained on filesystems. Central architectural reframing: "LLMs already understand filesystems. They've been trained on massive amounts of code: navigating directories, grepping through files, managing state across complex codebases. If agents excel at filesystem operations for code, they excel at them for anything. That's the insight behind the filesystem and bash approach. You're not teaching the model a new skill; you're using the one it's best at." This is a skill-alignment argument against teaching the model a bespoke retrieval DSL — use the interface it already has.
- Sandbox provides isolation; snapshot repository provides the corpus. The production mechanism: "(1) You add sources through the admin interface, and they're stored in Postgres. (2) Content syncs to a snapshot repository via Vercel Workflow. (3) When the agent needs to search, a Vercel Sandbox loads the snapshot. (4) The agent's bash and bash_batch tools execute file-system commands. (5) The agent returns an answer with optional references." Canonicalises the snapshot-repository-as-agent-corpus concept and the snapshot-sync-from-postgres-to-repo pattern.
- One agent, every platform. "Your agent has one knowledge base, one codebase, and one source of truth. Yet your engineers are scattered across Slack, your community spread across Discord, your bug reports buried in GitHub." Chat SDK's adapter pattern: each adapter handles platform-specific concerns (auth, event formats, messaging) while the agent pipeline stays unchanged. onNewMention fires regardless of platform source. Template ships GitHub + Discord; Chat SDK officially supports Slack, Microsoft Teams, Google Chat. Canonicalises the multi-platform-chat-adapter-single-agent pattern.
- Complexity router + AI Gateway = automatic cost optimisation. "Every incoming question is classified by complexity and routed to the right model. Simple questions go to fast, cheap models. Hard questions go to powerful ones. Cost optimization happens automatically, with no manual rules." This is the canonical-on-the-wiki complexity-tiered model selection pattern, instantiated with the AI Gateway as transport so any AI-SDK-compatible model provider can slot into either tier.
- Results are deterministic, explainable, fast. Contrast with vectors is explicit: "When the agent gives a wrong answer, you open the trace and see: it ran grep -r \"pricing\" docs/, read docs/plans/enterprise.md, and pulled the wrong section. You fix the file or adjust the agent's search strategy. The whole debugging loop takes minutes." Canonicalises traceability of retrieval as the success-criterion axis.
- Debug your agent with another agent. "There's also an AI-powered admin agent. You can ask it questions like: 'what errors occurred in the last 24 hours', or 'what are the common questions users ask'. It will use internal tools (query_stats, query_errors, run_sql, and chart) to provide answers directly. You debug your agent with an agent." Canonicalises the AI-powered admin agent pattern — reuse the same agent pipeline for operational introspection, with scoped read-only tools on the telemetry surface.
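The admin-agent idea above can be sketched as a small read-only tool registry. The telemetry shape and the SELECT-only guard on run_sql are assumptions (the post discloses the four tool names but not their signatures or permissions model; see Caveats).

```typescript
// Assumed tool surface for the admin agent. In the template these tools run
// against the deployment's telemetry; here they read an in-memory stand-in.
type ToolName = "query_stats" | "query_errors" | "run_sql" | "chart";

const telemetry = {
  errors: [{ at: "2026-04-20T12:00:00Z", message: "sandbox timeout" }],
  stats: { questions_24h: 412, p50_ms: 1800 },
};

const tools: Record<ToolName, (arg: string) => unknown> = {
  query_stats: () => telemetry.stats,
  query_errors: () => telemetry.errors,
  // Hypothetical read-only guard: reject anything that is not a SELECT.
  // The real permissions model is not disclosed in the post.
  run_sql: (sql) => {
    if (!/^\s*select\b/i.test(sql)) throw new Error("read-only: SELECT only");
    return `would execute: ${sql}`;
  },
  chart: (spec) => ({ rendered: spec }),
};

console.log(tools.query_errors(""));
```

The agent dispatches natural-language questions to these tools; the registry is what keeps the introspection surface scoped.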
Systems named
- Knowledge Agent Template — the open-source agent template itself: filesystem-based retrieval + multi-platform adapters + complexity router + admin agent. Shipped under Vercel's templates directory; deploy-to-Vercel one-click flow. Companion post on filesystems + bash documents the upstream prototype (sales-call summariser).
- Vercel Sandbox — isolated compute substrate that loads the snapshot repository and executes the agent's bash / bash_batch tool calls.
- Vercel Workflow — the orchestrator that syncs content from Postgres into the snapshot repository.
- Chat SDK — Vercel's adapter framework for multi-platform bots; shipped adapters for Slack, Discord, GitHub, Microsoft Teams, Google Chat, plus community adapters. Redis-backed state (createRedisState) for cross-platform sessions.
- Vercel AI Gateway — the model-provider abstraction over which the complexity router dispatches. Any AI-SDK-compatible provider slots in.
- Vercel AI SDK — the underlying TypeScript toolkit. @savoir/sdk ships as tools an AI-SDK agent can import to query the knowledge base (renamed per-deployment).
Concepts canonicalised
- concepts/filesystem-as-retrieval-substrate — filesystem + bash as an alternative to a vector DB for agent retrieval; rides on LLMs' code-training skill distribution.
- concepts/embedding-black-box-debugging — the production-debuggability gap in vector retrieval; the post's three-axis failure taxonomy (chunking boundary / embedding model / similarity threshold) is canonicalised here.
- concepts/snapshot-repository-as-agent-corpus — a dedicated versioned repository as the agent's corpus view; distinct from the live DB source-of-truth and from the vector index.
- concepts/traceability-of-retrieval — the success property that distinguishes filesystem search from vector search: the trace of search commands and file reads is human-readable.
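The snapshot-repository concept above can be sketched as one sync pass: materialise rows from the source of truth into a fresh directory the sandbox can load as a complete, immutable corpus view. The row schema and paths are illustrative; the template's real sync runs through Vercel Workflow and is not shown in the post.

```typescript
import { mkdtempSync, mkdirSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, dirname } from "node:path";

// Stand-in for the Postgres source table: one row per synced document.
// Column names are illustrative; the template's schema is not disclosed.
const rows = [
  { path: "docs/plans/enterprise.md", content: "# Enterprise\n" },
  { path: "transcripts/2026-04-01.md", content: "# Sales call\n" },
];

// One sync pass writes every row into a fresh snapshot directory, so the
// corpus the agent sees is a point-in-time view, distinct from the live DB.
function syncSnapshot(rootParent: string): string {
  const root = mkdtempSync(join(rootParent, "snap-"));
  for (const row of rows) {
    const target = join(root, row.path);
    mkdirSync(dirname(target), { recursive: true });
    writeFileSync(target, row.content);
  }
  return root;
}

const root = syncSnapshot(tmpdir());
console.log(readFileSync(join(root, "docs/plans/enterprise.md"), "utf8"));
```

Writing a fresh directory per pass is what makes the snapshot immutable from the sandbox's point of view: a sync never mutates a corpus an agent is currently reading.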
Patterns canonicalised
- patterns/bash-in-sandbox-as-retrieval-tool — give the agent bash / bash_batch tools in an isolated sandbox pointed at the corpus filesystem; let it run grep, find, cat, ls directly.
- patterns/multi-platform-chat-adapter-single-agent — per-platform adapter layer (Slack / Discord / Teams / Google Chat / GitHub / web) with one shared agent pipeline; adapter handles auth + event format + messaging; agent pipeline untouched.
- patterns/snapshot-sync-from-postgres-to-repo — async background job that transforms the live source configuration (Postgres) into an immutable snapshot repository the sandbox can load.
- patterns/ai-powered-admin-agent-self-debug — a second agent with read-only tools over the primary agent's telemetry (query_stats, query_errors, run_sql, chart); operators ask questions in natural language.
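The multi-platform-chat-adapter-single-agent pattern above can be sketched with a toy adapter interface. onNewMention is the name the post uses; the Mention shape and makeAdapter helper are hypothetical, standing in for Chat SDK's real adapters.

```typescript
// Each adapter normalises its platform's event into one Mention shape;
// a single handler (the agent pipeline) consumes it, whatever the source.
interface Mention {
  platform: "slack" | "discord" | "github";
  text: string;
  reply: (answer: string) => void;
}

type MentionHandler = (m: Mention) => void;

function makeAdapter(platform: Mention["platform"], replies: string[]) {
  return {
    // Platform-specific auth and event parsing would live here; elided.
    emit(text: string, handler: MentionHandler) {
      handler({ platform, text, reply: (a) => void replies.push(a) });
    },
  };
}

// One pipeline, shared across every platform.
const onNewMention: MentionHandler = (m) =>
  m.reply(`[${m.platform}] answered: ${m.text}`);

const replies: string[] = [];
makeAdapter("slack", replies).emit("pricing?", onNewMention);
makeAdapter("discord", replies).emit("pricing?", onNewMention);
console.log(replies);
```

The design choice the pattern encodes: platform churn (new chat surface, changed event format) touches only an adapter, never the agent.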
Extended (existing pages)
- patterns/complexity-tiered-model-selection — cross-wiki canonicalisation of the router shape; the Vercel instantiation adds the AI Gateway transport layer and the always-on per-question classification (vs per-input-heuristic Instacart variant).
- patterns/read-only-curated-example-filesystem — sibling pattern. v0's 2026-01-08 post described the library-API-examples instance; this post extends the same substrate class to generic enterprise knowledge corpora at a different altitude (text docs + APIs + transcripts, not API-surface examples).
- concepts/grep-loop — Cloudflare's 2026-04-17 llms.txt post named the grep loop as an anti-pattern when the corpus doesn't fit one context window; Vercel's 2026-04-21 post names the inverse: a sandbox-scoped snapshot repo plus intentional bash tools turns agentic grep into a desirable retrieval primitive. Both framings coexist — the distinguishing axis is whether the agent can iterate inside a scoped filesystem vs against an unbounded web doc corpus.
- concepts/web-search-telephone-game — the 2026-01-08 v0 post framed web-search RAG as a telephone game where a summariser model corrupts the path from question to answer; this post extends the critique by identifying the same opacity in vector retrieval (chunking + embedding + threshold are three summarisation-like transformations).
Operational numbers disclosed
- ~\$1.00 → ~\$0.25 per call (4× cost reduction) on Vercel's internal sales-call summarisation agent after replacing the vector pipeline with a filesystem + bash. "The output quality improved."
- Pipeline layers: 5 (admin Postgres → Workflow → snapshot repo → Sandbox load → bash tool calls).
- Chat SDK adapters named: Slack, Microsoft Teams, Google Chat, Discord, GitHub, plus "official and community adapters" (adapter directory linked).
- Admin agent tools: query_stats, query_errors, run_sql, chart (four tools disclosed).
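The bash / bash_batch split in the pipeline above can be sketched as batching several shell commands into one tool call, so the agent spends one round-trip instead of N. The function signature is an assumption; the template's actual tool definition is not shown in the post.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical bash_batch-style helper: run several commands in one call
// and return each result labelled with the command that produced it.
function bashBatch(commands: string[]): { cmd: string; out: string }[] {
  return commands.map((cmd) => ({
    cmd,
    out: execFileSync("bash", ["-c", cmd], { encoding: "utf8" }).trim(),
  }));
}

const results = bashBatch(["echo docs", "printf 'a\\nb' | wc -l"]);
console.log(results);
```

In the template these commands run inside the Vercel Sandbox against the snapshot filesystem, not on the host as in this sketch.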
Caveats
- No production numbers beyond the prototype's 4× cost figure. No throughput, no p50/p99 retrieval latency, no fleet metrics, no accuracy / precision numbers, no before/after quality delta for the sales-call summariser beyond "quality improved".
- Corpus-size ceiling undisclosed. The argument for filesystem retrieval implicitly assumes the snapshot fits on a single sandbox disk; no guidance on multi-GB or multi-TB corpora, sharding, partitioning, or hot-cold tiering.
- Complexity-classifier mechanism opaque. "Every incoming question is classified by complexity" — no disclosure of whether the classifier is a heuristic, a fine-tuned model, a prompt, or an embedding.
- Snapshot-sync cadence undocumented. Workflow-orchestrated sync is named; refresh frequency, change detection, rollback semantics, and partial-failure handling are all elided.
- No accuracy benchmark vs vector DB baselines. The opacity argument is the pitch; no head-to-head retrieval-quality numbers against a Pinecone / Weaviate / pgvector baseline on the same corpus.
- @savoir/sdk package name is a placeholder. Post explicitly notes "customize the package name from @savoir/sdk to your own" — it's the template's rename-before-ship convention, not a shipped public package.
- Admin-agent permissions model undisclosed. run_sql is strong — scope of read/write, RBAC, injection surface, and audit trail are all elided.
- Small-file / many-file limits on sandbox filesystem undocumented. grep -r behaviour on 100k+ files is a real engineering concern not addressed.
- Launch-voice product-marketing post. CTAs to template + products throughout; the architectural content runs through the post's middle rather than being the framing.
Source
- Original: https://vercel.com/blog/build-knowledge-agents-without-embeddings
- Raw markdown: raw/vercel/2026-04-21-build-knowledge-agents-without-embeddings-25eb4193.md
Related
- companies/vercel
- systems/vercel-knowledge-agent-template
- systems/vercel-sandbox
- systems/vercel-chat-sdk
- systems/vercel-workflow
- systems/vercel-ai-gateway
- systems/ai-sdk
- concepts/filesystem-as-retrieval-substrate
- concepts/embedding-black-box-debugging
- concepts/snapshot-repository-as-agent-corpus
- concepts/traceability-of-retrieval
- concepts/grep-loop
- concepts/web-search-telephone-game
- patterns/bash-in-sandbox-as-retrieval-tool
- patterns/multi-platform-chat-adapter-single-agent
- patterns/snapshot-sync-from-postgres-to-repo
- patterns/ai-powered-admin-agent-self-debug
- patterns/complexity-tiered-model-selection
- patterns/read-only-curated-example-filesystem