
PATTERN

Multi-stage extraction pipeline

Problem

Converting a raw conversation transcript into durable, searchable memories is not a single LLM call. A naive "summarise this to a list of facts" prompt produces:

  • Hallucinated facts not in the transcript.
  • Generalisations that lose load-bearing specifics (names, prices, version numbers).
  • No distinction between atomic facts, time-bounded events, procedural instructions, and ephemeral task state — all four need different lifecycles but collapse into one blob of bullets.
  • Duplicate memories on re-ingest because there's no deduplication hook.
  • A vector index polluted with short-lived items that shouldn't be in semantic retrieval.

A production memory service needs a pipeline of specialised stages, each answering one question.

The pattern

Sequence seven stages. Each has one job; boundaries are typed.

messages
(1) content-addressed ID
    stable-identity-fields → hash → [:128 bits]
    → idempotent re-ingest via INSERT OR IGNORE
(2) extractor (two parallel passes)
    ├── full pass: chunk at ~10K chars, N-message overlap,
    │    K chunks concurrent; structured transcript with role
    │    labels + relative→absolute dates + line indices
    └── detail pass (long conversations only): overlapping windows
         for names / prices / version numbers / entity attributes
    → merge two result sets
(3) verifier (N checks per memory)
    entity identity / object identity / location / temporal /
    organisational / completeness / relational / supported-by-
    transcript → pass, correct, or drop
(4) classifier (one of N types)
    different types get different lifecycles:
    some keyed + vector-indexed (facts, instructions)
    some timestamped + vector-indexed (events)
    some ephemeral, FTS-only (tasks)
(5) storage write
    INSERT OR IGNORE by content-addressed ID
    supersession chain for keyed types (old → new forward pointer)
(6) return response to harness   ─────────────────────┐
                                                      │ background
(7) async vectorisation                               │ non-blocking
    embed(prepend(generated search queries) ⊕ content)
    upsert new vector; delete superseded-memory vector
    in parallel
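Stages 1 and 5 lean on the same trick, so a minimal sketch can show both together. The field separator, table schema, and helper names below are illustrative assumptions, not the actual service's:

```python
import hashlib
import sqlite3

def memory_id(session_id: str, role: str, content: str) -> str:
    """Content-addressed ID: SHA-256 over stable identity fields, truncated to 128 bits."""
    digest = hashlib.sha256(f"{session_id}\x00{role}\x00{content}".encode()).hexdigest()
    return digest[:32]  # 32 hex chars = 128 bits

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT)")

def ingest(session_id: str, role: str, content: str) -> None:
    # Same message -> same ID -> INSERT OR IGNORE silently skips the duplicate row.
    db.execute(
        "INSERT OR IGNORE INTO memories (id, content) VALUES (?, ?)",
        (memory_id(session_id, role, content), content),
    )

ingest("s1", "user", "I chose pnpm")
ingest("s1", "user", "I chose pnpm")  # re-ingest: no second row
```

Because the ID derives purely from stable identity fields, re-running ingest over the same transcript is a no-op at the storage layer; no search or comparison pass is needed.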

The load-bearing properties:

  • Stage isolation. Verifier's job is "is this fact in the transcript?" — not extraction, not classification. Classifier's job is "what type is this and what's the topic key?" — not verification. Each stage can be iterated on independently.
  • Parallelism where safe. Two extraction passes run in parallel; multiple chunks within a pass run concurrently; vectorisation runs after the API has already returned to the harness.
  • Dedup via content addressing, not search. Stage 1 makes re-ingest free; Stage 5 leans on it with INSERT OR IGNORE.
  • Typed lifecycles. The classifier's output is not just a label — it drives different storage / indexing / retention behaviours downstream.
  • Write-time query synthesis. Stage 7 bridges the declarative-vs-interrogative asymmetry by prepending anticipated questions to the embedding text.
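The typed-lifecycle property can be made concrete as a routing table keyed by classifier output. The four types follow the canonical instance; the specific Lifecycle fields are an assumed simplification:

```python
from dataclasses import dataclass
from enum import Enum

class MemoryType(Enum):
    FACT = "fact"
    EVENT = "event"
    INSTRUCTION = "instruction"
    TASK = "task"

@dataclass(frozen=True)
class Lifecycle:
    keyed: bool            # participates in supersession chains
    timestamped: bool      # carries an event time
    vector_indexed: bool   # embedded for semantic retrieval (vs FTS-only)

LIFECYCLES = {
    MemoryType.FACT:        Lifecycle(keyed=True,  timestamped=False, vector_indexed=True),
    MemoryType.EVENT:       Lifecycle(keyed=False, timestamped=True,  vector_indexed=True),
    MemoryType.INSTRUCTION: Lifecycle(keyed=True,  timestamped=False, vector_indexed=True),
    MemoryType.TASK:        Lifecycle(keyed=False, timestamped=False, vector_indexed=False),
}
```

Downstream stages branch on these flags rather than on the label itself, which keeps the ephemeral task items out of the vector index by construction.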

Canonical wiki instance: Cloudflare Agent Memory

The Agent Memory ingest pipeline realises exactly this shape:

  1. Deterministic ID generation: SHA-256(sessionId + role + content)[:128 bits].
  2. Extractor with two parallel passes — full pass (~10K-char chunks, 2-message overlap, 4 concurrent) + detail pass (≥9-message conversations, overlapping windows for concrete values).
  3. Verifier with 8 checks — entity identity, object identity, location context, temporal accuracy, organizational context, completeness, relational context, whether inferred facts are supported by the conversation.
  4. Classifier into 4 types — facts (keyed, atomic, stable), events (timestamped), instructions (keyed, procedural), tasks (ephemeral, FTS-only).
  5. Storage via INSERT OR IGNORE + supersession chains for facts and instructions (forward pointer old → new).
  6. Response returned to harness.
  7. Background vectorisation — embedding text prepends the 3-5 search queries generated during classification; superseded-memory vectors deleted in parallel with new upserts.
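Step 5's supersession chain for keyed types can be sketched as one extra nullable column plus a forward-pointer update on write. The schema, key names, and helper are assumptions, not the actual implementation:

```python
import sqlite3

store = sqlite3.connect(":memory:")
store.execute("""
    CREATE TABLE memories (
        id            TEXT PRIMARY KEY,
        topic_key     TEXT NOT NULL,
        content       TEXT NOT NULL,
        superseded_by TEXT           -- forward pointer: old memory -> its replacement
    )
""")

def write_keyed(mem_id: str, topic_key: str, content: str) -> None:
    """INSERT OR IGNORE, then point the previous head of this key's chain at the new row."""
    prev = store.execute(
        "SELECT id FROM memories WHERE topic_key = ? AND superseded_by IS NULL",
        (topic_key,),
    ).fetchone()
    store.execute(
        "INSERT OR IGNORE INTO memories (id, topic_key, content) VALUES (?, ?, ?)",
        (mem_id, topic_key, content),
    )
    if prev and prev[0] != mem_id:
        store.execute("UPDATE memories SET superseded_by = ? WHERE id = ?", (mem_id, prev[0]))

write_keyed("m1", "user.package_manager", "prefers npm")
write_keyed("m2", "user.package_manager", "prefers pnpm")  # m1 now points forward to m2
```

The invariant is that exactly one row per topic key has a NULL forward pointer (the live head); history stays readable by walking the chain.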

"The first step is deterministic ID generation. Each message gets a content-addressed ID. If the same conversation is ingested twice, every message resolves to the same ID, making re-ingestion idempotent."

"Next, the extractor runs two passes in parallel."

"The next step is to verify each extracted memory against the source transcript. The verifier runs eight checks…"

"The pipeline then classifies each verified memory into one of four types."

"Finally, everything is written to storage using INSERT OR IGNORE so that content-addressed duplicates are silently skipped. After returning a response to the harness, background vectorization runs asynchronously."

— (Cloudflare, 2026-04-17)
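Step 7's write-time query synthesis amounts to a small text transform before embedding; the anticipated questions come from an LLM call during classification, stubbed here as a plain list:

```python
def embedding_text(content: str, anticipated_queries: list[str]) -> str:
    """Prepend anticipated questions so interrogative retrieval queries land
    near the declarative memory text in embedding space."""
    return "\n".join(anticipated_queries + [content])

text = embedding_text(
    "User prefers pnpm over npm.",
    [
        "What package manager does the user prefer?",
        "Does the user use npm or pnpm?",
    ],
)
```

The stored content is unchanged; only the text fed to the embedding model carries the synthesized queries, which is why superseded vectors must be deleted separately when a memory is replaced.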

Why parallel passes + verifier-after-extraction

  • Full + detail passes address opposite failure modes. A full pass over long chunks generalises concrete values away ("the user made a choice" instead of "the user chose pnpm"); a narrow detail pass with overlapping windows catches names / prices / versions the broad pass skips. Running both in parallel and merging costs ~2× LLM token spend on extraction but approximately doubles the useful-fact count.
  • Verification after extraction, not merged into it. A combined "extract and verify" prompt tends to verify-as-you-extract, missing the catching-up-on-the-transcript work the dedicated verifier does. Running verification as a separate stage makes each check explicit and auditable.
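The full pass's chunking (pack messages up to a size budget, then restart a couple of messages back) can be sketched generically. The budgets match the canonical instance, but this greedy packer is an assumption about how the boundaries are chosen:

```python
def chunk_messages(messages: list[str], max_chars: int = 10_000, overlap: int = 2) -> list[list[str]]:
    """Greedy packing: fill a chunk up to ~max_chars of message text, then start
    the next chunk `overlap` messages back, so a fact that straddles a chunk
    boundary is seen whole in at least one chunk."""
    chunks: list[list[str]] = []
    start = 0
    while start < len(messages):
        size, end = 0, start
        # `end == start` guarantees progress even if a single message exceeds the budget.
        while end < len(messages) and (size + len(messages[end]) <= max_chars or end == start):
            size += len(messages[end])
            end += 1
        chunks.append(messages[start:end])
        if end >= len(messages):
            break
        start = max(start + 1, end - overlap)  # max() guards against infinite loops
    return chunks
```

Each resulting chunk would then be formatted as a structured transcript (role labels, absolute dates, line indices) and the K chunks dispatched concurrently to the extractor.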

Trade-offs

Dimension          Impact
Latency            Multi-stage = multi-LLM-call; partially mitigated by per-stage parallelism + async vectorisation
Cost               Extraction doubled (two passes) + verifier + classifier LLM calls per memory batch
Quality            Significantly higher than a single-prompt summary; each stage is tunable
Iterability        Each stage can be measured + improved independently (canonical instance of patterns/agent-driven-benchmark-loop)
Failure isolation  A bad extraction stage doesn't poison storage; the verifier drops unsupported memories before they are written
