PATTERN Cited by 1 source

Snapshot sync from Postgres to repo

Pattern

Run a durable background orchestrator (e.g. Vercel Workflow) that transforms live source-configuration state (in Postgres) into a derived, versioned, immutable snapshot repository the agent's retrieval sandbox can load. The admin interface writes to Postgres; the orchestrator syncs; the agent reads the snapshot.

Canonical Vercel framing

From the Knowledge Agent Template pipeline:

"You add sources through the admin interface, and they're stored in Postgres. Content syncs to a snapshot repository via Vercel Workflow. When the agent needs to search, a Vercel Sandbox loads the snapshot."

(Source: sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings)

Why this separation exists

Deployments of this pattern share a canonical three-part shape:

  • Live source-of-truth DB (Postgres).
  • Derived snapshot repo — versioned, materialised.
  • Per-request sandbox checkout of the snapshot.

The pattern names the producer that bridges the first two. The consumer side is patterns/bash-in-sandbox-as-retrieval-tool.

Four reasons the producer is a separate orchestrator, not inline with retrieval:

  • Content materialisation is expensive. A GitHub repo clone, a YouTube transcript pull, an API pagination walk — none should happen on the agent's critical path.
  • Source API rate limits. The admin DB scales with users; the snapshot sync is bounded by the source APIs (GitHub, YouTube, etc.), whose rate limits are lower.
  • Failure decoupling. A source API outage shouldn't take the agent down; the agent keeps serving the last good snapshot.
  • Version discipline. The snapshot repo has a history; the admin DB has only current state. You can't roll back the DB to debug a week-old agent answer, but you can reload a week-old snapshot.
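
The failure-decoupling and version-discipline points can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the snapshot repository; `SnapshotStore`, `publish`, `at`, and `sync` are hypothetical names for illustration, not anything from the Vercel post.

```python
from types import MappingProxyType


class SnapshotStore:
    """In-memory stand-in for a versioned snapshot repository (hypothetical)."""

    def __init__(self):
        self._versions = []

    def publish(self, files):
        # Copy and freeze so a published version can never be mutated.
        self._versions.append(MappingProxyType(dict(files)))
        return len(self._versions)  # versions are numbered from 1

    def latest(self):
        return self._versions[-1]

    def at(self, version):
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no snapshot version {version}")
        return self._versions[version - 1]


def sync(store, pull):
    """Publish a new version only if the source pull succeeds."""
    try:
        files = pull()
    except ConnectionError:
        return False  # readers keep seeing the last good version
    store.publish(files)
    return True
```

In this sketch a failed pull leaves `store.latest()` untouched, and `store.at(n)` returns an older version for debugging without changing anything in the live database.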

Mechanism shape

admin UI → writes → Postgres (live config)
                        │ (event or schedule)
                 Vercel Workflow
                        │ (fan out per source)
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
  GitHub clone    YouTube pull    Markdown sync
         └──────────────┼──────────────┘
                        │ (materialise into repo)
               snapshot repository
                        │ (pull at retrieval time)
                 Vercel Sandbox
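
The fan-out and materialise steps in the diagram can be sketched roughly as follows. This is not the Vercel Workflow API: a thread pool stands in for the orchestrator, a list of dicts stands in for the Postgres rows, and the `fetch_*` functions, the `kind` and `name` fields, and the path layout are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical fetchers; real ones would clone a repo, pull transcripts, etc.
def fetch_github(src):
    return {f"github/{src['name']}/README.md": "# readme"}


def fetch_youtube(src):
    return {f"youtube/{src['name']}.txt": "transcript"}


def fetch_markdown(src):
    return {f"markdown/{src['name']}.md": "notes"}


FETCHERS = {
    "github": fetch_github,
    "youtube": fetch_youtube,
    "markdown": fetch_markdown,
}


def build_snapshot(sources, max_workers=4):
    """Fan out one fetch per source, then merge the results into one file tree."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda s: FETCHERS[s["kind"]](s), sources))
    snapshot = {}
    for files in results:
        snapshot.update(files)
    return snapshot


# Stand-in for the source rows the admin UI wrote to Postgres.
sources = [
    {"kind": "github", "name": "docs"},
    {"kind": "youtube", "name": "keynote"},
    {"kind": "markdown", "name": "faq"},
]
snapshot = build_snapshot(sources)
```

The resulting `snapshot` maps file paths to content; a real orchestrator would commit that tree to the snapshot repository rather than hold it in memory.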

What this pattern solves vs vector-index refresh

A vector-index refresh is:

  • All-or-nothing. Re-embedding happens as a batch; partial updates are awkward.
  • Opaque. A diff between index version N and N+1 isn't human-readable.
  • Lossy. An index is a lossy transformation layered on top of the content; the repo-level snapshot, not the index, is the ground truth.

A snapshot repo sync is:

  • Incremental. Changed sources get re-pulled; unchanged sources stay.
  • Diff-inspectable. The repo history shows what changed.
  • The ground truth. The agent reads it directly; no transformation layer.
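
To make "incremental" and "diff-inspectable" concrete, here is a small sketch that compares two snapshot versions by content hash. Hashing is an assumption chosen for illustration; the post does not disclose how changes are detected, and `manifest` and `diff` are hypothetical helpers.

```python
import hashlib


def manifest(files):
    """Map each path to a SHA-256 digest of its content."""
    return {
        path: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for path, body in files.items()
    }


def diff(old, new):
    """Return a human-readable summary of what changed between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


old = manifest({"a.md": "alpha", "b.md": "beta"})
new = manifest({"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"})
print(diff(old, new))
# {'added': ['c.md'], 'removed': [], 'changed': ['b.md']}
```

Only `b.md` and `c.md` would need re-materialising here, and the summary reads as plain text, which is the contrast with comparing two versions of an embedding index.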

What's undisclosed (Vercel 2026-04-21 post)

  • Trigger model. Event-driven on admin writes? Scheduled? Both?
  • Change detection. ETags, last-modified, content-hash, full re-pull?
  • Partial-failure handling. One source fails; does the whole sync abort, or does the snapshot advance with the failed source marked stale?
  • Parallelism. Sync fans out per source; how many concurrent source pulls?
  • Rollback. The post implies versioning is available; the rollback UX / API is not named.
  • Large-source strategy. A 10-GB GitHub repository — does it clone in full every time, use sparse checkout, or differentially fetch?
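
Since the post leaves partial-failure handling open, the following is only an illustration of one option named above: advance the snapshot and mark the failed source stale. The data shapes and the `advance_snapshot` name are assumptions, not Vercel's behaviour.

```python
def advance_snapshot(previous, results):
    """Build the next snapshot from per-source pull results.

    previous: {source_name: {"files": {...}, "stale": bool}}
    results:  {source_name: dict of files, or an Exception if the pull failed}
    """
    nxt = {}
    for name, outcome in results.items():
        if isinstance(outcome, Exception):
            if name in previous:
                # Carry the last good content forward, flagged as stale.
                nxt[name] = {"files": previous[name]["files"], "stale": True}
            # A brand-new source that failed has nothing to carry; skip it.
        else:
            nxt[name] = {"files": outcome, "stale": False}
    return nxt


previous = {"docs": {"files": {"README.md": "v1"}, "stale": False}}
results = {
    "docs": TimeoutError("GitHub API timed out"),
    "faq": {"faq.md": "fresh"},
}
snapshot = advance_snapshot(previous, results)
```

In this sketch `docs` keeps its previous files with `stale` set to `True`, while `faq` enters the snapshot fresh. The alternative policy, aborting the whole sync, would simply return `previous` unchanged whenever any outcome is an exception.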

Composition with retrieval

The snapshot sync is the write path; retrieval is the read path. They share the repo. This is the classical materialised-derived-view shape from database engineering applied to agent infrastructure: Postgres is the OLTP source of truth, and the snapshot repo is the derived view consumed by the agent's retrieval tools.

Seen in
