Multi-stage extraction pipeline¶
Problem¶
Converting a raw conversation transcript into durable, searchable memories is not a single LLM call. A naive "summarise this to a list of facts" prompt produces:
- Hallucinated facts not in the transcript.
- Generalisations that lose load-bearing specifics (names, prices, version numbers).
- No distinction between atomic facts, time-bounded events, procedural instructions, and ephemeral task state — all four need different lifecycles but collapse into one blob of bullets.
- Duplicate memories on re-ingest because there's no deduplication hook.
- A vector index polluted with short-lived items that shouldn't be in semantic retrieval.
A production memory service needs a pipeline of specialised stages, each answering one question.
The pattern¶
Sequence six synchronous stages, plus a seventh that runs in the background after the response is returned. Each stage has one job; the boundaries between stages are typed.
```
messages
   │
   ▼
(1) content-addressed ID
    stable-identity-fields → hash → [:128 bits]
    → idempotent re-ingest via INSERT OR IGNORE
   │
   ▼
(2) extractor (two parallel passes)
    ├── full pass: chunk at ~10K chars, N-message overlap,
    │   K chunks concurrent; structured transcript with role
    │   labels + relative→absolute dates + line indices
    └── detail pass (long conversations only): overlapping windows
        for names / prices / version numbers / entity attributes
    → merge the two result sets
   │
   ▼
(3) verifier (N checks per memory)
    entity identity / object identity / location / temporal /
    organisational / completeness / relational / supported-by-
    transcript → pass, correct, or drop
   │
   ▼
(4) classifier (one of N types)
    different types get different lifecycles:
    some keyed + vector-indexed (facts, instructions)
    some timestamped + vector-indexed (events)
    some ephemeral, FTS-only (tasks)
   │
   ▼
(5) storage write
    INSERT OR IGNORE by content-addressed ID
    supersession chain for keyed types (old → new forward pointer)
   │
   ▼
(6) return response to harness ──────────────────────┐
                                                     │ background,
(7) async vectorisation                              │ non-blocking
    embed(prepend(generated search queries) ⊕ content)
    upsert new vector; delete superseded-memory vector
    in parallel
```
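The full-pass chunking in stage 2 can be sketched as follows. This is a minimal illustration, not the production implementation: the parameter names and defaults (`max_chars`, `overlap_msgs`) are assumptions standing in for the "~10K chars, N-message overlap" described above.

```python
# Sketch of the stage-2 full-pass chunker: split a transcript into
# roughly max_chars-sized chunks, carrying the last few messages of each
# chunk into the next so a fact spanning a chunk boundary is seen whole
# by at least one extraction call.

def chunk_messages(messages, max_chars=10_000, overlap_msgs=2):
    """messages: list of dicts with 'role' and 'content' keys."""
    chunks, current, size = [], [], 0
    for msg in messages:
        msg_len = len(msg["content"])
        if current and size + msg_len > max_chars:
            chunks.append(current)
            # Seed the next chunk with the tail of this one (overlap).
            current = current[-overlap_msgs:]
            size = sum(len(m["content"]) for m in current)
        current.append(msg)
        size += msg_len
    if current:
        chunks.append(current)
    return chunks
```

Each chunk produced this way can then be dispatched to the extractor concurrently, since the overlap makes them independent units.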
The load-bearing properties:
- Stage isolation. Verifier's job is "is this fact in the transcript?" — not extraction, not classification. Classifier's job is "what type is this and what's the topic key?" — not verification. Each stage can be iterated on independently.
- Parallelism where safe. Two extraction passes run in parallel; multiple chunks within a pass run concurrently; vectorisation runs after the API has already returned to the harness.
- Dedup via content addressing, not search. Stage 1 makes re-ingest free; Stage 5 leans on it with INSERT OR IGNORE.
- Typed lifecycles. The classifier's output is not just a label — it drives different storage / indexing / retention behaviours downstream.
- Write-time query synthesis. Stage 7 bridges the declarative-vs-interrogative asymmetry by prepending anticipated questions to the embedding text.
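Write-time query synthesis amounts to a simple text transformation before embedding. A minimal sketch, assuming a newline separator (the actual layout used by any given system is an implementation detail):

```python
# Sketch of stage-7 query synthesis: the text that gets embedded is the
# anticipated search queries (generated during classification) prepended
# to the memory content, so future interrogative queries land near this
# declarative memory in vector space.

def build_embedding_text(content: str, search_queries: list[str]) -> str:
    query_block = "\n".join(search_queries)
    return f"{query_block}\n{content}"

text = build_embedding_text(
    "The user chose pnpm as the package manager.",
    ["Which package manager does the user prefer?",
     "What did the user pick for installing dependencies?"],
)
```

The resulting string, not the raw memory content alone, is what gets passed to the embedding model before the vector upsert.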
Canonical wiki instance: Cloudflare Agent Memory¶
Agent Memory's ingest pipeline realises exactly this shape:
- Deterministic ID generation — SHA-256(sessionId + role + content)[:128 bits].
- Extractor with two parallel passes — full pass (~10K-char chunks, 2-message overlap, 4 concurrent) + detail pass (≥9-message conversations, overlapping windows for concrete values).
- Verifier with 8 checks — entity identity, object identity, location context, temporal accuracy, organizational context, completeness, relational context, whether inferred facts are supported by the conversation.
- Classifier into 4 types — facts (keyed, atomic, stable), events (timestamped), instructions (keyed, procedural), tasks (ephemeral, FTS-only).
- Storage via INSERT OR IGNORE + supersession chains for facts and instructions (forward pointer old → new).
- Response returned to harness.
- Background vectorisation — embedding text prepends the 3-5 search queries generated during classification; superseded-memory vectors deleted in parallel with new upserts.
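Stages 1 and 5 compose into a very small amount of code. A sketch under the ID scheme described above (the table schema here is an illustrative assumption, not Agent Memory's actual schema):

```python
# Sketch of content-addressed IDs (stage 1) feeding INSERT OR IGNORE
# (stage 5): SHA-256 over the stable identity fields, truncated to
# 128 bits, makes a second ingest of the same conversation a no-op.
import hashlib
import sqlite3

def content_id(session_id: str, role: str, content: str) -> str:
    digest = hashlib.sha256(f"{session_id}{role}{content}".encode()).hexdigest()
    return digest[:32]  # 32 hex chars = 128 bits

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT)")

def ingest(session_id: str, role: str, content: str) -> None:
    mid = content_id(session_id, role, content)
    db.execute("INSERT OR IGNORE INTO memories VALUES (?, ?)", (mid, content))

ingest("s1", "user", "I use pnpm")
ingest("s1", "user", "I use pnpm")  # re-ingest: silently skipped
count = db.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
```

Because the ID is a pure function of the identity fields, deduplication needs no lookup, no search, and no coordination: the primary-key constraint does all the work.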
"The first step is deterministic ID generation. Each message gets a content-addressed ID. If the same conversation is ingested twice, every message resolves to the same ID, making re-ingestion idempotent."
"Next, the extractor runs two passes in parallel."
"The next step is to verify each extracted memory against the source transcript. The verifier runs eight checks…"
"The pipeline then classifies each verified memory into one of four types."
"Finally, everything is written to storage using INSERT OR IGNORE so that content-addressed duplicates are silently skipped. After returning a response to the harness, background vectorization runs asynchronously."
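The supersession chain for keyed types can be sketched as a forward pointer stored on the old row. Column and function names here are assumptions for illustration; the production schema is not published in this form:

```python
# Sketch of stage-5 supersession for keyed types (facts, instructions):
# writing a new memory under an existing topic key leaves the old row in
# place but stamps it with a forward pointer to its replacement, so the
# head of each key's chain is the row with no superseded_by value.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE memories (
    id TEXT PRIMARY KEY, key TEXT, content TEXT, superseded_by TEXT)""")

def write_keyed(mid: str, key: str, content: str) -> None:
    # Point the current head of this key's chain at the new memory.
    db.execute(
        "UPDATE memories SET superseded_by = ? "
        "WHERE key = ? AND superseded_by IS NULL", (mid, key))
    db.execute("INSERT INTO memories VALUES (?, ?, ?, NULL)",
               (mid, key, content))

write_keyed("a1", "pkg-manager", "user prefers npm")
write_keyed("b2", "pkg-manager", "user prefers pnpm")
head = db.execute(
    "SELECT content FROM memories "
    "WHERE key = 'pkg-manager' AND superseded_by IS NULL").fetchone()[0]
```

The superseded row survives for audit and history, but stage 7 deletes its vector, so only the chain head is reachable through semantic retrieval.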
Why parallel passes + verifier-after-extraction¶
- Full + detail passes address opposite failure modes. A full pass over long chunks generalises concrete values away ("the user made a choice" instead of "the user chose pnpm"); a narrow detail pass with overlapping windows catches names / prices / versions the broad pass skips. Running both in parallel and merging costs ~2× LLM token spend on extraction but approximately doubles the useful-fact count.
- Verification after extraction, not merged into it. A combined "extract and verify" prompt tends to verify-as-you-extract, skipping the deliberate re-reading of the transcript that a dedicated verifier performs. Running verification as a separate stage makes each check explicit and auditable.
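The verifier's control flow — run every check, carry corrections forward, drop on the first failure — can be sketched independently of what the checks actually do. In production each check is an LLM call against the transcript; here they are stubbed callables, and the verdict vocabulary (`pass` / `correct` / `drop`) mirrors the stage-3 outcomes in the diagram:

```python
# Sketch of the stage-3 verifier orchestration: each check is a function
# that can pass the memory unchanged, return a corrected version, or
# vote to drop it entirely. A dropped memory never reaches storage.
from typing import Callable, Optional

Check = Callable[[str, str], tuple[str, Optional[str]]]
# A check returns ("pass", None), ("correct", fixed_text), or ("drop", None).

def verify(memory: str, transcript: str, checks: list[Check]) -> Optional[str]:
    for check in checks:
        verdict, fixed = check(memory, transcript)
        if verdict == "drop":
            return None        # unsupported: filtered before storage
        if verdict == "correct":
            memory = fixed     # carry the correction into later checks
    return memory

# Stub supported-by-transcript check: drop a memory whose final term
# does not literally appear in the transcript (a real check is an LLM call).
def supported_by_transcript(memory: str, transcript: str):
    return ("pass", None) if memory.split()[-1] in transcript else ("drop", None)
```

Because each check is an isolated function, a new check can be added, measured, and tuned without touching extraction or classification — the stage-isolation property the pattern relies on.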
Trade-offs¶
| Dimension | Impact |
|---|---|
| Latency | Multi-stage = multi-LLM-call; partially mitigated by per-stage parallelism + async vectorisation |
| Cost | Extraction doubled (two passes) + verifier + classifier LLM calls per memory batch |
| Quality | Significantly higher than single-prompt summary; each stage is tunable |
| Iterability | Each stage can be measured + improved independently (canonical instance of patterns/agent-driven-benchmark-loop) |
| Failure isolation | A bad extraction stage doesn't poison storage — the verifier drops unsupported memories before they are written |
Seen in¶
- sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — canonical wiki instance; six-stage ingest pipeline with explicit parallel-passes + verifier + classifier + storage + async-vectorisation composition.
Related¶
- patterns/constrained-memory-api — the API this pipeline sits behind.
- patterns/parallel-retrieval-fusion — the read-side counterpart; symmetric multi-stage design on the retrieval pipeline.
- concepts/content-addressed-id — Stage 1's dedup primitive.
- concepts/memory-supersession — Stage 5's version-chain mechanism for keyed types.
- concepts/memory-compaction — the lifecycle moment the pipeline fires on.
- concepts/agent-memory — the storage substrate the pipeline writes into.
- systems/cloudflare-agent-memory — canonical realisation.