PATTERN Cited by 1 source

Snapshot sync from Postgres to repo¶

Pattern¶

Run a durable background orchestrator (e.g. Vercel Workflow) that transforms live source-configuration state (in Postgres) into a derived, versioned, immutable snapshot repository the agent's retrieval sandbox can load. The admin interface writes to Postgres; the orchestrator syncs; the agent reads the snapshot.

Canonical Vercel framing¶

From the Knowledge Agent Template pipeline:

"You add sources through the admin interface, and they're stored in Postgres. Content syncs to a snapshot repository via Vercel Workflow. When the agent needs to search, a Vercel Sandbox loads the snapshot."

(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)

Why this separation exists¶

The deployment has a canonical three-tier shape:

Live source-of-truth DB (Postgres).
Derived snapshot repo — versioned, materialised.
Per-request sandbox checkout of the snapshot.

The pattern names the producer that bridges the first two. The consumer side is patterns/bash-in-sandbox-as-retrieval-tool.

Four reasons the producer is a separate orchestrator, not inline with retrieval:

Content materialisation is expensive. A GitHub repo clone, a YouTube transcript pull, an API pagination walk — none should happen on the agent's critical path.
Source API rate limits. Load on the admin DB scales with users, whereas the snapshot sync is bounded by the rate limits of the source APIs (GitHub, YouTube, etc.), which are typically lower.
Failure decoupling. A source API outage shouldn't take the agent down; the agent keeps serving the last good snapshot.
Version discipline. The snapshot repo has a history; the admin DB holds only current state. You can't roll back the DB to debug a week-old agent answer, but you can reload a week-old snapshot.
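
The failure-decoupling and version-discipline points can be illustrated with a minimal sketch. Everything here is hypothetical: `SnapshotStore` is an in-memory stand-in for a versioned snapshot repository, and `sync` is not a Vercel Workflow API. Python is used only to keep the example short.

```python
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Optional


class SnapshotStore:
    """Hypothetical in-memory stand-in for a versioned snapshot repository."""

    def __init__(self) -> None:
        self._versions: List[Mapping[str, str]] = []

    def publish(self, files: Dict[str, str]) -> int:
        # Copy and freeze so a published version can never be mutated later.
        self._versions.append(MappingProxyType(dict(files)))
        return len(self._versions) - 1  # version id

    def latest(self) -> Mapping[str, str]:
        return self._versions[-1]

    def at(self, version: int) -> Mapping[str, str]:
        # Reload any earlier version, e.g. to debug an old agent answer.
        return self._versions[version]


def sync(store: SnapshotStore, pull: Callable[[], Dict[str, str]]) -> Optional[int]:
    """Publish a new version if the pull succeeds; otherwise change nothing."""
    try:
        files = pull()
    except Exception:
        return None  # source outage: readers keep seeing store.latest()
    return store.publish(files)
```

A failed `sync` leaves `store.latest()` untouched, so the read path never observes the outage, and `store.at(v)` gives the rollback view that the live database cannot.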

Mechanism shape¶

admin UI → writes → Postgres (live config)
                        │
                        │ (event or schedule)
                        ▼
                 Vercel Workflow
                        │
                        │ (fan out per source)
                        ▼
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
  GitHub clone    YouTube pull    Markdown sync
         └──────────────┼──────────────┘
                        │
                        │ (materialise into repo)
                        ▼
               snapshot repository
                        │
                        │ (pull at retrieval time)
                        ▼
                 Vercel Sandbox
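
The fan-out and materialise steps in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the template's implementation: the source records, the `kind` and `name` fields, the puller functions, and the path layout are all hypothetical, and the pullers return placeholder content instead of calling GitHub or YouTube.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Source = Dict[str, str]  # hypothetical row shape read from Postgres
Files = Dict[str, str]   # path -> content


def pull_github(source: Source) -> Files:
    # Placeholder: a real puller would clone or fetch the repository.
    return {f"github/{source['name']}/README.md": "# readme"}


def pull_youtube(source: Source) -> Files:
    # Placeholder: a real puller would download a transcript.
    return {f"youtube/{source['name']}.txt": "transcript text"}


PULLERS: Dict[str, Callable[[Source], Files]] = {
    "github": pull_github,
    "youtube": pull_youtube,
}


def build_snapshot(sources: List[Source], max_workers: int = 4) -> Files:
    """Fan out one pull per source, then merge the results into one file tree."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda s: PULLERS[s["kind"]](s), sources))
    snapshot: Files = {}
    for files in results:
        snapshot.update(files)
    return snapshot
```

Namespacing each source under its own path prefix keeps the merge step free of collisions, which is one simple way to let pulls run independently.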

What this pattern solves vs vector-index refresh¶

A vector-index refresh is:

All-or-nothing. Re-embedding happens as a batch; partial updates are awkward.
Opaque. A diff between index version N and N+1 isn't human-readable.
Lossy. An index is a lossy transformation of the source content; the content itself, as held in a repo-level snapshot, remains the ground truth.

A snapshot repo sync is:

Incremental. Changed sources get re-pulled; unchanged sources stay.
Diff-inspectable. The repo history shows what changed.
The ground truth. The agent reads it directly; no transformation layer.
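
The "incremental" and "diff-inspectable" properties can be made concrete with a small sketch. It assumes each snapshot version carries a manifest mapping paths to content hashes; that manifest, and the function names below, are illustrative assumptions and are not described in the source post.

```python
import hashlib
from typing import Dict, List

Files = Dict[str, str]     # path -> content
Manifest = Dict[str, str]  # path -> SHA-256 hex digest


def manifest(files: Files) -> Manifest:
    """Summarise a snapshot as one content hash per path."""
    return {
        path: hashlib.sha256(content.encode("utf-8")).hexdigest()
        for path, content in files.items()
    }


def diff_manifests(old: Manifest, new: Manifest) -> Dict[str, List[str]]:
    """List paths added, removed, or changed between two snapshot versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

The output is a plain list of paths that a person can read, and only the `added` and `changed` entries would need re-materialising.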

What's undisclosed (Vercel 2026-04-21 post)¶

Trigger model. Event-driven on admin writes? Scheduled? Both?
Change detection. ETags, last-modified, content-hash, full re-pull?
Partial-failure handling. One source fails; does the whole sync abort, or does the snapshot advance with the failed source marked stale?
Parallelism. Sync fans out per source; how many concurrent source pulls?
Rollback. The post implies versioning is available; the rollback UX / API is not named.
Large-source strategy. A 10-GB GitHub repository — does it clone in full every time, use sparse checkout, or differentially fetch?
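
To make the partial-failure question concrete, here is a sketch of one of the two options named above: the snapshot advances, and failed sources carry forward their previous content while being reported as stale. The post does not say which behaviour Vercel implements, so this is purely an illustration; `advance_snapshot` and its data shapes are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Files = Dict[str, str]  # path -> content


def advance_snapshot(
    previous: Dict[str, Files],
    pullers: Dict[str, Callable[[], Files]],
) -> Tuple[Dict[str, Files], Set[str]]:
    """Build the next snapshot per source; failed sources reuse prior content."""
    next_snapshot: Dict[str, Files] = {}
    stale: Set[str] = set()
    for name, pull in pullers.items():
        try:
            next_snapshot[name] = pull()
        except Exception:
            # Assumed policy: keep serving the last good copy, flag it stale.
            stale.add(name)
            if name in previous:
                next_snapshot[name] = previous[name]
    return next_snapshot, stale
```

The alternative policy, aborting the whole sync on any failure, would simply re-raise inside the `except` branch and leave the previous snapshot as the latest version.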

Composition with retrieval¶

The snapshot sync is the write path; retrieval is the read path. They share the repo. This is the classical materialised derived-view shape from database engineering, applied to agent infrastructure: Postgres is the OLTP source of truth, and the snapshot repo is the derived view consumed by the agent's retrieval tools.

Seen in¶

sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings — canonical Vercel-stack instance; Vercel Workflow as orchestrator, Postgres as source of truth, snapshot repo as the agent-facing derived view.

concepts/snapshot-repository-as-agent-corpus — the consumer-side framing of the snapshot.
concepts/filesystem-as-retrieval-substrate — the retrieval choice that consumes the snapshot.
patterns/bash-in-sandbox-as-retrieval-tool — the read-path complement.
systems/vercel-workflow — canonical orchestrator.
systems/vercel-sandbox — canonical consumer.
systems/vercel-knowledge-agent-template — canonical composition.