CONCEPT
Map-fold LLM pipeline¶
Definition¶
A map-fold LLM pipeline is a functional-composition pattern for processing a large document corpus through LLMs: a map phase applies a per-document LLM invocation to extract relevant information independently, then a fold (a.k.a. reduce) phase aggregates those per-document outputs into a higher-level synthesis — either via another LLM invocation or via a deterministic function (Source: sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis).
Zalando's framing, verbatim:
"A functional pattern 'map-fold' is a key building block for the pipeline. A large set of documents is independently processed using a language model to extract relevant information (the 'map' phase). These outputs are then aggregated either by another LLM invocation or a deterministic function into a higher-level summary (the 'reduce' or 'fold' phase). This modular design supports composable tasks like summarization, classification, or knowledge extraction."
Why it's a useful primitive¶
The pattern inverts the typical single-context-prompt shape for LLM-over-many-documents workloads:
- Map step is embarrassingly parallel. Each document is processed in isolation — no ordering dependencies, no shared context window.
- Fold step sees a compressed input. The fold aggregates extraction outputs (e.g. 3–5-sentence digests at Zalando), not raw documents. Even at thousands-of-documents scale, the aggregated digest corpus fits comfortably in a single context window.
- Each phase has a bounded, single-objective prompt. This avoids the lost-in-the-middle failure mode that comes from packing many documents into one prompt.
- The fold function is interchangeable. Another LLM invocation (for narrative synthesis) or a deterministic function (for enumerating, bucketing, joining) — the shape is the same.
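In code, the pattern is an ordinary parallel map followed by an ordinary fold. A minimal sketch, assuming a caller-supplied `extract` function standing in for the per-document LLM invocation (nothing here is Zalando's implementation):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def map_fold(
    docs: list[str],
    extract: Callable[[str], str],          # per-document LLM invocation (map)
    aggregate: Callable[[list[str]], str],  # LLM call or deterministic function (fold)
) -> str:
    # Map phase: embarrassingly parallel; each document is processed in
    # isolation, with no ordering dependencies or shared context window.
    with ThreadPoolExecutor(max_workers=8) as pool:
        digests = list(pool.map(extract, docs))
    # Fold phase: the aggregator sees only the compressed digests, never raw documents.
    return aggregate(digests)

# A deterministic fold (here, bucketing digests by their leading label) is a
# drop-in replacement for an LLM fold -- the shape is identical:
def count_by_label(digests: list[str]) -> str:
    return str(Counter(d.split(":", 1)[0] for d in digests))
```

Swapping `count_by_label` for a narrative-synthesis LLM call changes nothing about the pipeline's structure, which is the interchangeability point above.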
Where it sits in the LLM pipeline taxonomy¶
Map-fold is one of several canonical LLM-pipeline composition primitives on the wiki:
- Map-fold (this page) — extraction + aggregation over a large corpus. Canonical instance: Zalando's postmortem analysis pipeline.
- Pipeline / stage chain — sequential per-document transformations where each stage's output feeds the next. Zalando's Summarization → Classification → Analyzer sub-chain is this shape before the fold.
- Agent loop — model generates tool calls, observes results, re-generates. Zalando explicitly rejected this for postmortem analysis: "The initial concept of a no-code agentic solution was quickly deemed unfeasible."
- Planner-coder-verifier-router loop — the plan-refinement shape; see patterns/planner-coder-verifier-router-loop.
Map-fold's discriminator is corpus-scale extraction + pooled synthesis: it's the right shape when you want a model to reason about what all these documents collectively tell you, not what each document individually says.
Zalando's pipeline as canonical map-fold¶
| Phase | Stage(s) | Notes |
|---|---|---|
| Map | Summarization → Classification → Analyzer | Per-document; each stage is itself narrow single-objective. |
| Fold | Patterns (LLM fold) → Opportunity (human) | LLM folds all digests to one-pager; human folds pattern to ROI. |
The fold happens twice: once at the Patterns stage (an LLM consolidates thousands of digests into one recurring-pattern report), and again at the Opportunity stage (a human analyst converts the pattern report plus incident-database numerics into an investment proposal). The second fold is performed by a human rather than an LLM, filling the deterministic-function slot in the map-fold framing: "aggregated either by another LLM invocation or a deterministic function."
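The table's shape can be sketched as code. This is a hypothetical illustration only: the stage functions and the `llm` stub are placeholders, not Zalando's prompts or implementation (the stub just echoes the prompt's last line so the sketch runs):

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model call: echoes the prompt's last line.
    return prompt.splitlines()[-1]

def summarize(report: str) -> str:
    return llm(f"Summarize this incident in 3-5 sentences:\n{report}")

def classify(summary: str) -> str:
    return llm(f"Classify this incident:\n{summary}")

def analyze_incident(report: str) -> str:
    """Map phase: Summarization -> Classification -> Analyzer, per document."""
    summary = summarize(report)
    labels = classify(summary)
    return f"{labels}: {summary}"        # Analyzer output: a digest

def patterns_fold(digests: list[str]) -> str:
    """First fold (LLM): consolidate all digests into one pattern report.
    The second fold (Opportunity) is human judgment, not code."""
    return llm("Identify recurring failure patterns:\n" + "\n".join(digests))
```

Note that the map phase is itself a small stage chain, which is why the per-document sub-chain in the table reads as a pipeline before the fold.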
Why not just MapReduce¶
Map-fold is intentionally named after functional map/fold rather than Google's MapReduce because:
- No shuffle phase. MapReduce's distinguishing feature is key-based shuffle between map and reduce; the map-fold pattern as used for LLM pipelines doesn't shuffle — all per-document outputs feed one fold stage.
- Order may matter at fold. Some LLM fold prompts are sensitive to the order digests are listed in (recency bias, position effects). MapReduce assumes reduce is commutative / associative.
- Typical cardinality is far smaller than MapReduce's. Zalando's pipeline operates over thousands of documents, not billions of records.
The pattern is functional (compose higher-order operations over a sequence) more than distributed (process a sequence across many nodes).
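The shuffle distinction can be made concrete. An illustrative side-by-side sketch (toy data, not from the source): the map-fold side is a single fold over the full ordered digest list, while the MapReduce side groups by key before reducing:

```python
from collections import defaultdict
from functools import reduce

digests = ["digest A", "digest B", "digest C"]

# Map-fold: ONE fold stage consumes the full ordered digest list. No key-based
# shuffle, and order is preserved (which an order-sensitive LLM fold may rely on).
fold_input = reduce(lambda acc, d: acc + "\n" + d, digests)

# MapReduce, by contrast, shuffles map outputs by key before reducing, and
# assumes the reducer is commutative/associative, so reordering can't change results.
shuffled = defaultdict(list)
for key, value in [("infra", 1), ("deploy", 1), ("infra", 1)]:
    shuffled[key].append(value)
counts = {k: sum(v) for k, v in shuffled.items()}  # {'infra': 2, 'deploy': 1}
```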
Tradeoffs / gotchas¶
- Map-stage attribution loss. Once the map stage produces compressed outputs (5-field summaries, 3–5-sentence digests), the fold stage can't recover information dropped at map. Zalando's mitigation: human curation of digests before fold — "the pivotal role of digests allowed humans to observe all incidents as a whole and precisely validate and curate the reports produced by LLMs."
- Fold-stage surface-attribution risk. The LLM at fold can still commit surface attribution errors — asserting recurring patterns that aren't actually supported by the underlying digests, pattern-matched from frequently repeated keywords. Human proofreading of the fold output remains a required gate even when per-digest quality is high.
- Fold-stage context limit is the real ceiling. The fold LLM still has to hold all digests in its context. At ~5 sentences (roughly 100 tokens) per digest × thousands of digests, this can approach frontier-model context limits. Zalando doesn't disclose its fold-input size; the pipeline currently uses Claude Sonnet 4, whose ~200K-token context fits on the order of a couple thousand such digests before the fold input itself would need chunking.
- Hallucination compounds. If the map stage has a 10% error rate and the fold has a 10% error rate, the end-to-end error rate isn't 10%: assuming independent errors, it compounds to roughly 19% (1 − 0.9 × 0.9). Zalando's 100% → 10–20% human-curation schedule is set against this compounding: early curation focuses on map-stage quality, late curation on the fold output.
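Two of these ceilings reduce to back-of-envelope arithmetic. The per-stage error rates and per-digest token count below are assumptions for illustration, not figures from the source:

```python
# Error compounding (assumes independent per-stage errors; 10% rates are illustrative):
map_err, fold_err = 0.10, 0.10
end_to_end_err = 1 - (1 - map_err) * (1 - fold_err)  # ~0.19, not 0.10

# Fold-context budget (rough assumption: ~5 sentences ~= 100 tokens per digest):
tokens_per_digest = 100
context_budget = 200_000                              # 200K-token-class context window
max_digests = context_budget // tokens_per_digest     # 2000 digests before chunking
```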
Seen in¶
- sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis — canonical wiki instance. Zalando explicitly names map-fold as the "key building block" of their postmortem-analysis pipeline, defines the map + fold phases, notes the modular composability for summarization / classification / knowledge extraction, and explicitly contrasts the approach with single-context large-window prompting.
Related¶
- patterns/multi-stage-llm-pipeline-over-large-context — the architectural pattern this primitive composes into.
- concepts/lost-in-the-middle-effect — the failure mode that motivates the bounded per-stage inputs the map-fold shape provides.
- concepts/llm-hallucination — the cross-stage failure mode whose compounding the human-curation schedule addresses.
- concepts/surface-attribution-error — the fold-stage risk specifically.
- systems/zalando-postmortem-analysis-pipeline — canonical production instance.