Zalando Component Migration Toolkit¶
Definition¶
Zalando's Component Migration Toolkit is a Python-based internal tool that the Zalando Partner Tech department built in September 2024 to migrate 15 B2B applications from one in-house UI component library to another. It wraps the Datasette `llm` Python library's conversation API around GPT-4o with a structured system prompt that enforces the `<updatedContent>` output-fencing contract, temperature=0 for determinism, and a static-prefix-plus-dynamic-suffix prompt layout engineered for prompt-cache hits.
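The static-prefix / dynamic-suffix layout can be sketched as plain string assembly. This is a minimal illustration, not Zalando's actual code: the system-prompt wording, the `build_static_prefix` / `build_prompt` helpers, and the example component data are all assumptions.

```python
# Sketch of the static-prefix / dynamic-suffix prompt layout.
# Helper names, system-prompt wording, and component data are illustrative.

SYSTEM_PROMPT = (
    "You are a UI-component migration assistant. Return the transformed "
    "file inside <updatedContent>...</updatedContent> with no other text."
)

def build_static_prefix(group_components):
    """Cacheable prefix: identical for every file in one logical group."""
    parts = [SYSTEM_PROMPT]
    for c in group_components:
        parts.append(f"<interface>{c['interface']}</interface>")
        parts.append(f"<mapping>{c['mapping']}</mapping>")
        parts.append(f"<examples>{c['examples']}</examples>")
    return "\n".join(parts)

def build_prompt(static_prefix, file_content):
    """Only this per-file suffix changes between requests, so the long
    prefix can hit the provider's prompt cache."""
    return f"{static_prefix}\n<file>{file_content}</file>"

form_group = [{"interface": "ButtonProps", "mapping": "OldButton -> NewButton",
               "examples": "<example>...</example>"}]
prefix = build_static_prefix(form_group)
prompt = build_prompt(prefix, "export const Save = () => <OldButton/>;")
```

Because the prefix is byte-identical across all files in a group, every request after the first within a group can reuse the cached prefix tokens.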
Architecture (as disclosed)¶
- Input: a source directory path. The tool walks files under that root and, for each file, invokes the migration request.
- Static prefix (cacheable): the system prompt plus, for each component in the current logical group (form / core / …), the component interface, transformation mapping, and migration examples.
- Dynamic suffix: the `<file>{file_content}</file>` block specific to the file being transformed.
- Model: GPT-4o at temperature=0.
- Output contract: the LLM must return the transformed file inside `<updatedContent>…</updatedContent>` with no other text; a downstream parser strips the fence and writes the file.
- Truncation recovery: if the response hits the 4K output-token limit, the toolkit sends the literal prompt "continue" on the same conversation and concatenates the completions. "A simple 'continue' prompt proved more reliable than more complex prompts to continue the transformation" (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries).
- Context budget discipline: components are partitioned into logical groups ("form, core, etc.") sized so that one group's prompt stays at 40–50K context tokens, empirically the accuracy sweet spot. The tool is run once per group per file, not once per entire library.
- Prompt-regression tests: an LLM-generated example library (the same examples that ride in the prompt) is replayed through the pipeline in CI; any divergence from the golden output signals prompt drift.
- Authoring workflow: prompts were assembled via continue.dev — its IDE integration automates attaching source files to a prompt context, which "improved our workflow" over manual copy-paste.
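The output contract and the "continue" recovery loop described above can be sketched as follows. `ask_model` is a hypothetical stand-in for a conversation call (e.g. the `llm` library's `conversation.prompt(...)` followed by a truncation check); the function names and the truncation flag are assumptions, not the toolkit's real interface.

```python
import re

FENCE_RE = re.compile(r"<updatedContent>(.*?)</updatedContent>", re.DOTALL)

def extract_updated_content(raw):
    """Strip the <updatedContent> fence; fail loudly if the model broke
    the output contract."""
    m = FENCE_RE.search(raw)
    if m is None:
        raise ValueError("model response missing <updatedContent> fence")
    return m.group(1)

def migrate_file(ask_model, prompt, max_continues=5):
    """Accumulate a possibly-truncated response.

    ask_model(text) sends `text` on the same conversation and returns
    (completion, truncated) -- a stand-in for a conversation turn plus a
    check for an output-length cutoff.
    """
    completion, truncated = ask_model(prompt)
    parts = [completion]
    for _ in range(max_continues):
        if not truncated:
            break
        # Zalando found the literal prompt "continue" more reliable than
        # more complex prompts for resuming a cut-off transformation.
        completion, truncated = ask_model("continue")
        parts.append(completion)
    return extract_updated_content("".join(parts))
```

Concatenating raw completions before parsing the fence is what makes the recovery work: the closing `</updatedContent>` may only appear in the final continuation chunk.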
Operational envelope¶
- Processing time: 30–200 s per file.
- Cost: roughly < $40 per repository under GPT-4o pricing, estimated from an average of 45K prompt / 2K output tokens per file across ~10 component groups × ~3 components × ~30 files per group. Prompt caching reduces this further.
- Accuracy: ~90% overall across low/medium/high complexity; "even higher accuracy for components of low to medium complexity."
- Scope: 15 sophisticated B2B applications across the Partner Tech department.
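The < $40 figure is reproducible from the disclosed averages. The per-token prices below are GPT-4o's late-2024 list rates ($2.50 per 1M input tokens, $10 per 1M output tokens) and are an assumption, not stated in the source; the arithmetic also assumes the ~10 groups × ~30 files figures mean ~300 file invocations.

```python
# Back-of-envelope check of the <$40-per-repository estimate.
# Prices are assumed GPT-4o list rates (late 2024), not from the source.
INPUT_PRICE_PER_M = 2.50    # USD per 1M prompt tokens
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens

groups = 10                    # ~10 logical component groups
files_per_group = 30           # ~30 files per group
prompt_tokens_per_file = 45_000
output_tokens_per_file = 2_000

files = groups * files_per_group  # ~300 migration invocations
input_cost = files * prompt_tokens_per_file / 1e6 * INPUT_PRICE_PER_M
output_cost = files * output_tokens_per_file / 1e6 * OUTPUT_PRICE_PER_M
total = input_cost + output_cost
print(f"~${total:.2f} per repository before prompt-cache discounts")
```

Under these assumptions the total lands at about $39.75, consistent with the "< $40 per repository" estimate even before cache discounts on the static prefix.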
Why LLM-only, not AST+LLM¶
Zalando's iteration-1 experiment handed raw source code to the LLM and failed: it asked the model to perform "multiple complex intermediary steps" in a single pass. Rather than inserting an AST codemod layer (the Slack Enzyme→RTL AST+LLM hybrid route), Zalando pre-computed the interface and mapping offline through four more experiment rounds and froze them into the prompt. The human effort (mapping verification by pair programmers + designers) that would have gone into AST rule authoring went into visual-equivalence verification instead. See concepts/visual-equivalence-mapping for why: the LLM "couldn't visualize how components are rendered", and design-intent information isn't recoverable from source code alone.
The toolkit is therefore the canonical wiki instance of the LLM-only code-migration pipeline pattern — an alternative to the AST+LLM hybrid for organisations where the source-target delta is small enough to encode as text (interface + mapping + examples) but large enough that codemods are expensive to author per edge case.
Tradeoffs / limits¶
- No visual-equivalence verification. The toolkit migrates the code but cannot verify the rendered output. Grid-column ratio mismatches (12 vs 24) produce syntactically-correct, visually-broken pages; human review catches these.
- "Moody" residual variance. Temperature=0 gives reproducibility for a fixed input, but "LLM tools occasionally produced inconsistent outputs. These issues appeared without any clear reason, sometimes simply by rerunning the same prompt on the same file at a different time" — provider-side non-determinism that the toolkit can't control.
- Processing-time discourages rapid iteration. 30–200 s per file makes "conducting quick, small-scale experiments more challenging", which is why the LLM-generated example library is replayed in CI rather than re-run per prompt change.
- GPT-4o is the only disclosed backend (September 2024 timeframe). No multi-provider abstraction described.
- Test migration is partially outside scope. "Difficulties in migrating test suites due to inconsistent practices" was named as a project-specific challenge requiring manual work.
Seen in¶
- sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries — canonical (and only) wiki source. Migration executed from 2024-09 onward.
Related¶
- systems/gpt-4o — the LLM backend
- systems/continue-dev — prompt-authoring IDE integration
- systems/python-llm-library — Datasette's `llm` library (the OpenAI Chat Completions wrapper + conversation API)
- concepts/prompt-interface-mapping-examples-composition — the three-layer prompt structure
- concepts/logical-component-grouping-for-context-budget — the 40–50K-token-per-group discipline
- concepts/continue-prompt-for-truncated-output — 4K token-limit recovery primitive
- concepts/temperature-zero-for-deterministic-codegen — reproducibility lever
- concepts/static-dynamic-prompt-partitioning — prompt-cache-aware layout
- concepts/llm-generated-prompt-regression-test — CI discipline
- concepts/opaque-output-format-fencing — `<updatedContent>` extraction contract
- patterns/llm-only-code-migration-pipeline — the canonical pattern this toolkit instantiates
- patterns/system-prompt-role-plus-output-format-constraint
- patterns/prompt-cache-aware-static-dynamic-ordering
- patterns/grouped-component-batched-migration
- companies/zalando