PATTERN Cited by 1 source
Bash in sandbox as retrieval tool¶
Pattern¶
Expose bash (and a batched variant like bash_batch) as tools on the agent, backed by an isolated sandbox that has a curated filesystem loaded in, typically from a snapshot repository. The agent uses grep, find, cat, and ls to retrieve knowledge directly from the filesystem, instead of querying a vector DB.
This is the pattern that instantiates concepts/filesystem-as-retrieval-substrate and achieves concepts/traceability-of-retrieval.
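To make the tool surface concrete, here is a minimal sketch of what a bash tool handler could look like. It is not Vercel's implementation: the function names, the timeout, the output cap, and the demo corpus are all hypothetical, and a plain subprocess is only a stand-in for the isolated sandbox the pattern requires.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical limits; a real deployment would tune these.
TIMEOUT_SECONDS = 10
MAX_OUTPUT_CHARS = 20_000


def bash_tool(command: str, corpus_root: Path) -> dict:
    """Run one shell command with the corpus directory as working directory.

    NOTE: a plain subprocess is NOT a sandbox. It only stands in for the
    isolated environment that the pattern calls for.
    """
    shell = shutil.which("bash") or "sh"
    try:
        proc = subprocess.run(
            [shell, "-c", command],
            cwd=corpus_root,
            capture_output=True,
            text=True,
            timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    # Cap the output so a broad grep cannot flood the model's context.
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout[:MAX_OUTPUT_CHARS],
        "stderr": proc.stderr[:MAX_OUTPUT_CHARS],
    }


def build_demo_corpus() -> Path:
    """Create a tiny throwaway corpus so the sketch runs end to end."""
    root = Path(tempfile.mkdtemp(prefix="corpus-"))
    plans = root / "docs" / "plans"
    plans.mkdir(parents=True)
    (plans / "enterprise.md").write_text("# Enterprise\nCustom pricing, contact sales.\n")
    (root / "docs" / "faq.md").write_text("# FAQ\nHow do I reset my password?\n")
    return root
```

Calling `bash_tool('grep -rl "pricing" docs/', build_demo_corpus())` returns the matching file paths in `stdout`, which is what gives the agent a traceable retrieval step: the answer can cite the exact file and command that produced it.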
Canonical statement¶
From Vercel's 2026-04-21 Knowledge Agent Template launch:
"We replaced our vector pipeline with a filesystem and gave the agent bash. ... Your agent uses grep, find, and cat inside of isolated Vercel Sandboxes. ... The agent's bash and bash_batch tools execute file-system commands."
(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)
Why each piece matters¶
- bash, not a retrieval DSL. The interface is one the model was trained on at scale, so there is no bespoke tool schema for the model to learn.
- Sandbox, not host filesystem. The agent runs inside an isolated execution environment; the sandbox's filesystem is the snapshot, not the host. This bounds blast radius if the agent tries to write or exfiltrate. See concepts/server-side-sandboxing for the generic framing.
- Snapshot, not live source. The agent sees a versioned immutable corpus view; the live admin DB is isolated from retrieval load.
- Ephemeral sandbox per request. No cross-request state; a compromise doesn't persist.
Mechanism shape¶
user question
      │
      ▼
agent pipeline (AI SDK / Chat SDK)
      │
      ▼
┌─ Vercel Sandbox (per retrieval turn) ─────┐
│ filesystem: snapshot repository loaded    │
│                                           │
│ tools: bash, bash_batch                   │
│   ├─ grep -r "pricing" docs/              │
│   ├─ find docs/ -name "*.md"              │
│   ├─ cat docs/plans/enterprise.md         │
│   └─ ls docs/                             │
└───────────────────────────────────────────┘
      │
      ▼
answer with optional references
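The per-turn lifecycle in the diagram can be approximated locally as follows. This is a sketch under stated assumptions: `ephemeral_corpus` and `retrieve` are hypothetical names, and copying the snapshot into a temporary directory on the host only imitates booting a sandbox from a snapshot; it provides none of the isolation.

```python
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_corpus(snapshot_dir: Path) -> Iterator[Path]:
    """Copy the snapshot into a fresh directory and delete it afterwards.

    Stand-in only: a real deployment would boot an isolated sandbox from
    the snapshot instead of copying files on the host.
    """
    workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
    try:
        corpus = workdir / "corpus"
        shutil.copytree(snapshot_dir, corpus)
        yield corpus
    finally:
        # Teardown: nothing written during the turn survives it.
        shutil.rmtree(workdir, ignore_errors=True)


def retrieve(snapshot_dir: Path, commands: list[str]) -> list[str]:
    """Run each retrieval command against a private, throwaway copy."""
    shell = shutil.which("bash") or "sh"
    outputs = []
    with ephemeral_corpus(snapshot_dir) as corpus:
        for command in commands:
            proc = subprocess.run(
                [shell, "-c", command],
                cwd=corpus,
                capture_output=True,
                text=True,
                timeout=10,
            )
            outputs.append(proc.stdout)
    return outputs
```

Because every call to `retrieve` starts from the snapshot and discards its working copy, even a destructive command inside one turn leaves the snapshot and all later turns unaffected.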
What bash_batch adds over bash¶
The post names bash and bash_batch as two
separate tools without defining the difference.
Plausible readings (not disclosed):
- Parallel fan-out. Run several independent commands concurrently and return all outputs together — amortises round-trip cost when the agent plans multiple searches.
- Single-script multi-command. Execute a short shell script with multiple commands sequentially and return the combined output.
Either reduces tool-call-round-trip cost when the agent's retrieval plan has multiple steps.
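The two readings can be sketched side by side. Both functions are hypothetical illustrations of the possible semantics; the source does not say which one (if either) bash_batch implements, and the worker cap and timeout are arbitrary choices.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SHELL = shutil.which("bash") or "sh"


def _run(command: str, cwd: Path) -> str:
    """Run one shell command (or script) and return its stdout."""
    proc = subprocess.run(
        [SHELL, "-c", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=10,
    )
    return proc.stdout


def bash_batch_parallel(commands: list[str], cwd: Path) -> list[str]:
    """Reading 1: independent commands run concurrently.

    Outputs are returned in the same order as the input commands.
    """
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=min(8, len(commands))) as pool:
        return list(pool.map(lambda c: _run(c, cwd), commands))


def bash_batch_script(commands: list[str], cwd: Path) -> str:
    """Reading 2: commands run sequentially in one shell.

    The combined output comes back as a single string.
    """
    return _run("\n".join(commands), cwd)
```

One practical difference between the readings: in the script variant, shell state such as `cd` or variables carries over from one command to the next, whereas the parallel variant runs each command in its own shell.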
Trade-offs¶
- Corpus size. grep -r on a 100-GB corpus is too slow; the pattern implicitly assumes the snapshot fits on the Sandbox disk and can be walked in seconds. Sharding or partitioning is the escape hatch when corpora outgrow this.
- Semantic similarity gap. grep cannot answer "find docs that talk about X but don't use the word X" unless the agent expands the keywords itself. Hybrid retrieval (semantic pre-filter, keyword confirm) is a natural composition.
- Sandbox boot cost. Each retrieval turn potentially spawns a Sandbox, so boot and teardown latency count toward end-to-end retrieval p50. (Undisclosed in the post.)
- Tool-prompt quality matters. The agent's choice of what to grep for is still the load-bearing step. A weak search strategy produces wrong retrievals; the fix is in the tool description or the agent's system prompt, not in a similarity threshold.
- Write operations. This pattern only needs read tools (grep, find, cat, ls). Exposing write commands (echo >, rm, mv) is outside the pattern: the corpus is the snapshot repo, modified out-of-band by Workflow.
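The read-only constraint in the last point could be enforced with a command filter in front of the shell. The sketch below is an assumption, not something the source describes: the allowlist, the forbidden-token list, and the function name are hypothetical, and a string filter like this complements sandbox isolation rather than replacing it.

```python
import shlex

# Assumption: only the four read tools named in the pattern are allowed.
READ_ONLY_COMMANDS = {"grep", "find", "cat", "ls"}

# Shell syntax that could chain, redirect, or substitute commands.
FORBIDDEN_TOKENS = (";", "&", "|", ">", "<", "`", "$(", "\n")

# find can modify files or run programs through these actions.
FORBIDDEN_FIND_ACTIONS = {
    "-delete", "-exec", "-execdir", "-ok", "-okdir",
    "-fprint", "-fprint0", "-fprintf", "-fls",
}


def is_read_only(command: str) -> bool:
    """Return True only for a single allowlisted read command."""
    if any(token in command for token in FORBIDDEN_TOKENS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess.
        return False
    if not argv or argv[0] not in READ_ONLY_COMMANDS:
        return False
    if argv[0] == "find" and FORBIDDEN_FIND_ACTIONS.intersection(argv):
        return False
    return True
```

The filter is deliberately conservative: it also rejects harmless uses such as a quoted `|` in a regular expression, trading some expressiveness for a rule that is easy to audit.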
Adjacent / complementary patterns¶
- patterns/read-only-curated-example-filesystem — v0's library-API-examples directory is the same architectural class at a smaller, API-surface altitude. The Knowledge Agent Template generalises to arbitrary enterprise corpora.
- patterns/agent-sandbox-with-gateway-only-egress — Redpanda's Openclaw-era pattern focuses on egress control: the agent has a sandbox, but outbound network goes through a gateway. The Vercel pattern doesn't need egress control for retrieval because retrieval is filesystem-local.
- concepts/server-side-sandboxing — the generic VM/container/seccomp framing; Vercel Sandbox is the product that instantiates one of these under the hood (not disclosed which).
Seen in¶
- sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings — canonical production pattern; Vercel's internal sales-call summarisation agent went from ~\$1.00 to ~\$0.25 per call after adopting this shape.
Related¶
- concepts/filesystem-as-retrieval-substrate
- concepts/snapshot-repository-as-agent-corpus
- concepts/traceability-of-retrieval
- concepts/embedding-black-box-debugging
- concepts/server-side-sandboxing
- patterns/agent-sandbox-with-gateway-only-egress
- patterns/read-only-curated-example-filesystem
- systems/vercel-sandbox
- systems/vercel-knowledge-agent-template