
CONCEPT

Iterative prompt methodology

Definition

Iterative prompt methodology is the discipline of discovering a production prompt offline, through a sequence of discrete experiments: each experiment evaluates one prompt structure against a held-out sample and names the failure mode, and a human authors the next experiment to address that mode. The output is a frozen prompt shipped into production.

Distinct from concepts/iterative-prompt-refinement, where a judge LLM feeds failure signal into a generator LLM at inference time and every production call runs the loop.

The two timescales

|  | Iterative prompt methodology | Iterative prompt refinement |
|---|---|---|
| When it runs | Offline, during development | Online, per request |
| Author of the next iteration | Human | Judge LLM |
| Loop termination | Project delivery | Pass threshold or budget exhausted |
| Production artefact | Single frozen prompt | Closed-loop generator + judge |
| Example | Zalando UI migration (5 experiments) | Instacart PIXEL (runtime loop) |
| Cost implication | One-time dev cost | Per-request multiple |
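The contrast in timescales can be sketched as two loop shapes. This is an illustrative stub only: `call_llm`, `evaluate`, and `judge` are hypothetical stand-ins for a model call, a sample-set scorer, and a judge LLM, not any real API.

```python
# Illustrative stubs: call_llm, evaluate, and judge are hypothetical
# stand-ins, not a real API.

def call_llm(prompt: str, request: str) -> str:
    return f"output for {request!r} under prompt {prompt!r}"  # stub

# Offline methodology: the loop runs during development; production
# receives one frozen prompt and pays no per-request loop cost.
def develop_prompt(variants, samples, evaluate):
    for prompt in variants:                    # each variant human-authored
        if evaluate(prompt, samples) >= 0.95:  # "high", not perfect
            return prompt                      # frozen and shipped
    return variants[-1]

# Online refinement: the loop runs inside every production call and
# terminates on a pass verdict or when the iteration budget is exhausted.
def refine_at_runtime(request: str, judge, budget: int = 3) -> str:
    prompt = "base prompt"
    output = call_llm(prompt, request)
    for _ in range(budget):
        verdict = judge(output)
        if verdict == "pass":
            break
        prompt += f"\nFix: {verdict}"          # judge signal re-enters prompt
        output = call_llm(prompt, request)
    return output
```

The cost row in the table falls out directly: `develop_prompt` runs once, `refine_at_runtime` runs (up to `budget + 1` model calls) on every request.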

Zalando's canonical five-experiment arc

From sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries:

  1. Experiment 1 — source code only. Hand both source and target source code to the LLM, ask it to migrate. "This produced inconsistent results with numerous errors." Hypothesis: prompt needs the LLM to do too many intermediate steps in one pass.

  2. Experiment 2 — interface only. Pre-generate a typed interface for each component, hand the interface + file to the LLM. Still low accuracy. "Even though the interface was detailed, it lacked essential information present in the original source code that was necessary for complete component transformation." Hypothesis: interface isn't specific enough; need explicit mapping.

  3. Experiment 3 — interface + auto-mapping. Hand the interface plus an LLM-generated mapping (source attribute → target attribute). "The code was transformed with medium accuracy, but revealed flaws in the automated mapping instructions." Canonical failure: the size="medium" → size="medium" direct-name mapping when the visually-correct mapping is size="medium" → size="large". Hypothesis: mapping needs human verification.

  4. Experiment 4 — interface + manually-verified mapping. Pair programmers + designers verify every attribute mapping against rendered outputs. "This improved accuracy even further for transforming basic components, but for complex components requiring substantial code restructuring it still had issues." Hypothesis: abstract rules aren't concrete enough; need worked examples.

  5. Experiment 5 — interface + verified mapping + examples. Add worked input/output code samples with migration notes. "The code was transformed with a high degree of accuracy for all the components." Prompt structure frozen; productionised.
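The final prompt structure can be sketched as an assembly of the three layers the experiments converged on. All names, the template wording, and the mapping entries besides the `medium` → `large` case quoted above are illustrative, not Zalando's actual prompt.

```python
# Hypothetical sketch of the Experiment-5 prompt layers: typed interface,
# manually verified attribute mapping, worked examples. Template wording
# is illustrative, not Zalando's actual prompt.

# Manually verified mapping: note the non-obvious size correction from
# Experiment 3 (source "medium" visually matches target "large").
VERIFIED_MAPPING = {
    ("size", "medium"): ("size", "large"),
    ("size", "small"): ("size", "medium"),   # illustrative entry
}

def build_migration_prompt(interface: str,
                           examples: list[tuple[str, str]],
                           source_file: str) -> str:
    mapping_lines = "\n".join(
        f'{sa}="{sv}" -> {ta}="{tv}"'
        for (sa, sv), (ta, tv) in VERIFIED_MAPPING.items()
    )
    example_lines = "\n\n".join(
        f"Input:\n{src}\nOutput:\n{dst}" for src, dst in examples
    )
    return (
        "Migrate this component to the target library.\n"
        f"Target interface:\n{interface}\n"
        f"Verified attribute mapping:\n{mapping_lines}\n"
        f"Worked examples:\n{example_lines}\n"
        f"Source file:\n{source_file}\n"
    )
```

Because the structure is frozen, only the per-component inputs (interface, examples, source file) vary at production time; the loop that discovered the structure never runs again.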

"Through this series of iterative experiments, we were able to finalize our approach."

Heuristics for methodology iteration

  1. Stay small. Zalando's sample set was "a set of sample UI components of varying complexity from simple buttons and to more complex Select components" — tractable enough to evaluate by eye per round.
  2. Single-variable iteration where possible. Each of Zalando's five experiments changes exactly one prompt layer. Changing two at once makes it ambiguous which change moved accuracy.
  3. Name the failure mode per round. Zalando's retrospective includes "why it failed" for every experiment. A hypothesis-driven iteration discovers structural prompt requirements faster than blind tuning.
  4. Stop when accuracy is "high" on the sample set, not when it's perfect. Residual errors become the post-migration manual-review workload and the prompt-regression test fixtures; you don't need to squeeze them out in the methodology phase.
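The four heuristics together describe a small offline loop: one single-variable experiment per round, scored on a fixed sample set, with a human-named failure mode logged before the next variant is authored. A minimal sketch, where `run_prompt` and `grade` are stand-in stubs for the model call and the by-eye evaluation:

```python
# Stubs: run_prompt stands in for the LLM call, grade for the
# by-eye correctness check on a small sample set.

def run_prompt(prompt: str, component: str) -> str:
    return f"{component} migrated via {prompt}"  # stub LLM call

def grade(output: str) -> bool:
    return "verified" in output                  # stub eyeball check

def evaluate_round(prompt: str, samples: list[str]) -> float:
    """Accuracy of one single-variable experiment on the sample set."""
    hits = sum(grade(run_prompt(prompt, s)) for s in samples)
    return hits / len(samples)

log: list[dict] = []

def record_round(name: str, prompt: str, samples: list[str],
                 failure_mode: str) -> None:
    # Naming the failure mode per round (heuristic 3) is what makes the
    # next human-authored variant hypothesis-driven rather than blind.
    log.append({"experiment": name,
                "accuracy": evaluate_round(prompt, samples),
                "failure_mode": failure_mode})
```

Stopping is a threshold check on the latest `accuracy` entry (heuristic 4), not a search for 1.0.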

When methodology iteration beats runtime refinement

  • Migration-scale batch jobs with a small set of component shapes: offline iteration amortises over thousands of transformations without paying per-request judge-LLM cost.
  • Deterministic correctness criteria (does this compile? does this pass the test?): the runtime judge doesn't add signal beyond what the compile-error / test-failure provides.
  • Finite target domain (a specific pair of libraries, a specific transformation): the prompt can be tuned to the domain and shipped; no need for runtime adaptability.

When runtime refinement beats methodology iteration

  • Open-ended generation where each request has a different goal (image generation, translation of arbitrary text): no single frozen prompt can anticipate every input.
  • Quality judgement that requires a judge (visual appeal, cultural adaptation): compile-errors are insufficient; a VLM / LLM judge has to weigh in per request.
