CONCEPT Cited by 1 source
Prompt optimisation feedback loop¶
A prompt optimisation feedback loop is a production pattern in which LLM prompts are continuously edited based on real production outputs, with domain-specific examples promoted into the prompt's few-shot / multi-shot context as the system observes which shots actually correlate with high-accuracy outputs. Accuracy compounds over time as the prompt accumulates well-chosen examples; the loop is not model fine-tuning — it operates entirely at the prompt layer.
Canonicalised on the wiki via AArete's Doczy.ai disclosure (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws):
"Through few-shot and multi-shot prompting, the platform continuously edits the prompt on domain-specific examples and based on real outputs, creating a feedback loop that compounds accuracy improvements over time."
Structural pieces¶
- Per-class prompt template. The pipeline detects each document's file class first; a prompt template is selected per class (rather than one global prompt for all documents).
- Few-shot / multi-shot examples in the prompt. The prompt carries domain-specific examples — labelled input/output pairs that demonstrate the desired extraction behaviour for that file class.
- Production output observation. Real LLM outputs are scored (against gold labels, against downstream-system feedback, against human review on sampled cases, or against LLM-judge evaluation).
- Continuous prompt editing. Examples that correlate with high-accuracy outputs are promoted into the prompt; examples that correlate with low-accuracy outputs are removed or replaced; new edge cases observed in production are added as new shots.
- Compounding accuracy. When each prompt edit is gated by evaluation, an iteration should leave the prompt at least as good as before in expectation; any single iteration can still regress because the eval signal is noisy.
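The first two structural pieces can be illustrated with a minimal sketch. The class names (`Shot`, `ClassPrompt`), the `select_prompt` helper, and the rendered prompt layout are all hypothetical; the Doczy.ai disclosure does not describe its template format.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """One labelled input/output pair (hypothetical structure)."""
    input_text: str
    expected_output: str


@dataclass
class ClassPrompt:
    """Prompt template for a single file class (hypothetical structure)."""
    file_class: str
    instructions: str
    shots: list[Shot] = field(default_factory=list)

    def render(self, document_text: str) -> str:
        # Instructions first, then every shot, then the document to extract from.
        parts = [self.instructions]
        for i, shot in enumerate(self.shots, start=1):
            parts.append(
                f"Example {i}\nInput: {shot.input_text}\nOutput: {shot.expected_output}"
            )
        parts.append(f"Input: {document_text}\nOutput:")
        return "\n\n".join(parts)


def select_prompt(templates: dict[str, ClassPrompt], file_class: str) -> ClassPrompt:
    """Route to the per-class template instead of one global prompt."""
    return templates[file_class]
```

Editing the prompt between runs then amounts to changing the `shots` list of one `ClassPrompt`, leaving the other file classes untouched.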
Sibling concepts on the wiki¶
| Concept | Source | What's optimised | How |
|---|---|---|---|
| Prompt optimisation feedback loop | Doczy.ai (2026-06-02) | Per-class prompt templates | Continuous edit on production outputs |
| concepts/agent-self-correction-loop | Databricks Genie (2026-05-08) | Per-trajectory agent decisions | Self-correction within a single run |
| systems/gepa-prompt-optimizer | Databricks Genie (2026-05-08) | Prompt at training time | Genetic / evolutionary prompt search |
| concepts/few-shot-prompt-template | Multiple sources | Static few-shot prompt | Set once, not edited |
| patterns/llm-judge-as-inline-pipeline-stage | Databricks Unlocking Archives (2026-05-11) | Output quality control | LLM evaluates LLM at inference time |
This concept is distinct from agent self-correction — Doczy.ai's loop happens between extraction runs (the prompt observed by run N+1 differs from run N), whereas an agent self-correction loop happens within a single run (the agent revises its own answer mid-trajectory).
It's distinct from GEPA in that GEPA is a training-time prompt-search algorithm; Doczy.ai's loop runs continuously in production and edits prompts based on production observations, not on a held-out training set.
It's distinct from fine-tuning in that the model weights never change; only the prompt context does.
What "compounds over time" means¶
The disclosure does not spell out the mechanism. One reading under which the loop is monotonic in expectation is that each iteration is gated by an evaluation step:
- A new candidate prompt is only promoted into production if it scores at least as well as the current prompt on the most recent eval window.
- The prompt's example set is append-and-replace, not replace-only — well-performing examples are retained across iterations.
- Production traffic continually surfaces new edge cases that the existing example set does not cover; the loop captures them as new shots.
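The evaluation gate in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not Doczy.ai's mechanism: `accuracy`, `maybe_promote`, the exact-match scoring, and the `extract` callable (standing in for an LLM call) are all hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

EvalCase = tuple[str, str]             # (document text, gold output)
Extractor = Callable[[str, str], str]  # (prompt, document) -> model output


def accuracy(prompt: str, eval_window: Sequence[EvalCase], extract: Extractor) -> float:
    """Exact-match accuracy of one prompt over an eval window (assumed metric)."""
    if not eval_window:
        raise ValueError("eval window must not be empty")
    return mean(
        1.0 if extract(prompt, doc) == gold else 0.0 for doc, gold in eval_window
    )


def maybe_promote(
    current: str,
    candidate: str,
    eval_window: Sequence[EvalCase],
    extract: Extractor,
) -> tuple[str, float]:
    """Return the prompt to serve next, plus its score on the window.

    The candidate replaces the current prompt only if it scores at least
    as well; otherwise the current prompt stays in production.
    """
    current_score = accuracy(current, eval_window, extract)
    candidate_score = accuracy(candidate, eval_window, extract)
    if candidate_score >= current_score:
        return candidate, candidate_score
    return current, current_score
```

Because both prompts are scored on the same window, a candidate that happens to do well on unrepresentative recent traffic can still be promoted, which is why the gate alone does not rule out regressions.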
The verbatim disclosure ("compounds accuracy improvements over time") suggests a long time horizon — Doczy.ai's 22-month production envelope had time for many such iterations.
Required substrate¶
- File-class detection — without per-class routing, a single global prompt averages over all classes and the loop can't make per-class progress.
- Production output capture — the loop needs to see real outputs. In Doczy.ai's case the Snowflake structured-data sink and the dashboards built over it provide the observation surface.
- Evaluation signal — gold-label, downstream-feedback, human-review, or LLM-judge.
- Domain expertise — "AArete's team of experts will configure this solution" — the examples themselves come from domain-knowledgeable humans curating the prompt's example bank.
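To illustrate why per-class routing and output capture matter together, the sketch below aggregates scored outputs by file class. The record fields (`file_class`, `correct`) are hypothetical and do not reflect Doczy.ai's actual Snowflake schema.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def per_class_accuracy(records: Iterable[Mapping[str, object]]) -> dict[str, float]:
    """Group scored outputs by file class and return accuracy per class.

    Each record is assumed to carry a 'file_class' string and a boolean
    'correct' flag produced by whatever evaluation signal is available.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for record in records:
        file_class = str(record["file_class"])
        total[file_class] += 1
        if record["correct"]:
            correct[file_class] += 1
    return {file_class: correct[file_class] / total[file_class] for file_class in total}
```

A global accuracy figure can hide a weak class: two perfect `contract` outputs and two failed `claim` outputs average to 0.5 overall, while the per-class view shows which template needs new shots.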
When to apply¶
- Document-extraction pipelines where the document class space is finite and well-bounded (contracts, claims, regulatory filings).
- Production volume high enough that the loop gets the many iterations it needs to reach a high-accuracy regime.
- Domains with available evaluation signal (gold labels, feedback from downstream systems, expert review).
When not to apply¶
- Open-ended generation tasks where there's no objective accuracy signal.
- Pipelines without clear file classes (the loop has nothing to segment by).
- Cases where model fine-tuning is more cost-effective than prompt-layer iteration.
Risks¶
- Prompt drift. As the prompt accumulates examples it can over-fit to recent traffic; periodic regression-test discipline against historical eval sets is needed.
- Reward hacking. If the eval signal is biased (e.g. optimising for whatever the LLM judge prefers), the loop will converge to that bias.
- Example library size limits. Prompts have token-budget ceilings; the loop must select among examples, not just accumulate.
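The token-budget constraint in the last bullet can be sketched as a greedy selection. Everything here is an assumption for illustration: the `ScoredShot` structure, the per-shot `score`, and the whitespace-based token estimate, which a real system would replace with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredShot:
    text: str
    score: float  # hypothetical: how strongly the shot correlates with accurate outputs


def approx_tokens(text: str) -> int:
    """Crude token estimate by whitespace splitting (assumption, not a real tokenizer)."""
    return len(text.split())


def select_shots(candidates: list[ScoredShot], token_budget: int) -> list[ScoredShot]:
    """Greedily keep the highest-scoring shots that still fit in the budget."""
    chosen: list[ScoredShot] = []
    used = 0
    for shot in sorted(candidates, key=lambda s: s.score, reverse=True):
        cost = approx_tokens(shot.text)
        if used + cost <= token_budget:
            chosen.append(shot)
            used += cost
    return chosen
```

Greedy selection by score is simple but not optimal; it can skip a long high-value shot in favour of several short mediocre ones, so the selected set is worth re-checking against a historical eval set before promotion.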
Caveats¶
The Doczy.ai disclosure does not describe how the prompt-edit mechanism works in detail: whether curation is manual by experts, automated, or hybrid; how the system decides which shots to swap in or out; or what the eval signal is per file class. The wiki captures the disclosed shape of the pattern; the mechanism details are not public and are presumably AArete IP.
Seen in¶
- sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — canonical wiki disclosure of the term and concept; positioned as one of the three load-bearing algorithmic primitives behind Doczy.ai's 99% accuracy alongside concepts/smart-chunking and concepts/dual-clustering-document-intelligence.
Related¶
- concepts/file-class-routing — required substrate
- concepts/few-shot-prompt-template — static-prompt sibling
- concepts/agent-self-correction-loop — within-run sibling
- systems/gepa-prompt-optimizer — training-time sibling
- patterns/llm-judge-as-inline-pipeline-stage — eval-signal source
- systems/doczy-ai
- patterns/managed-ai-document-intelligence-pipeline-on-aws