
Grouped component batched migration

Pattern

For LLM-driven migrations across many source components, partition the components into small, semantically coherent groups and run the migration once per group per file. Each group's prompt (interface + mapping + examples for that group's components only) stays inside the model's empirically measured accuracy sweet spot. Files that use components from multiple groups are visited multiple times by the tool, once per relevant group.

Forces

  • Accuracy declines with prompt length. Beyond the model's sweet-spot band, Zalando observed that as "the input prompt size grew, the transformation accuracy declined" (sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries). See concepts/context-rot for the failure-mode taxonomy.
  • One big prompt with every component's mapping is expensive to cache and slow to prefill. A 200K-token prompt for 30 components imposes prefill cost on every call.
  • Semantic cohesion matters. Components likely to appear together in a file should be in the same group so one migration pass covers them.
  • Complete per-file coverage is necessary. If a file uses components from groups A and B, running only group A leaves B's components untransformed.
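The first two forces can be framed as a pre-flight check on each group's prompt prefix. A minimal sketch, assuming a crude whitespace token count (use the target model's real tokenizer in practice); the 50K upper bound is Zalando's measured figure for GPT-4o, not a universal constant:

```python
def token_count(text):
    # Crude stand-in for the model's tokenizer; whitespace-split only.
    return len(text.split())

def fits_sweet_spot(group_prefix, upper=50_000):
    # Above the band a group pays both accuracy decline and prefill cost;
    # below the band it is merely under-utilised, which is acceptable.
    return token_count(group_prefix) <= upper
```

A group that fails this check is a candidate for splitting along its semantic seams before any migration runs.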

Mechanism

  1. Author a grouping. Zalando: "we organized components into logical groups (like 'form', 'core', etc.)". Grouping is by semantic domain, not alphabetical order or size.
  2. Size groups to fit the context-budget sweet spot. For each group, the prompt prefix (interface + mapping + examples for all components in the group) must stay inside the accuracy band — 40–50K tokens at Zalando.
  3. For each file × each relevant group: run the LLM migration call. The toolkit visits the same file once per group whose components it contains. Most files touch 1–3 groups.
  4. Cache the prefix across calls in a group. Because every call within a group shares the same static prefix (see patterns/prompt-cache-aware-static-dynamic-ordering), batch all calls for one group in time to keep the cache warm.
  5. Parse the opaque fence, write the transformed file. Orthogonal component changes compose cleanly (different files, different call).
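The five steps above can be sketched as a driver loop. This is an illustrative reconstruction, not Zalando's toolkit: `build_group_prompt` and `parse_fenced_output` are simplified stand-ins, and the component-usage test is a naive substring match.

```python
from collections import defaultdict

def build_group_prompt(components):
    # Stand-in for the real static prefix: interface + mapping + examples
    # for this group's components only, sized to the sweet-spot band.
    return "MIGRATE: " + ", ".join(sorted(components))

def parse_fenced_output(reply):
    # Step 5: extract the transformed file from the opaque ``` fence.
    return reply.split("```")[1].strip("\n")

def migrate(files, groups, call_llm):
    """files: {path: source}; groups: {group_name: [component, ...]}."""
    prefixes = {g: build_group_prompt(c) for g, c in groups.items()}  # steps 1-2
    plan = defaultdict(list)  # group -> paths using any of its components
    for path, src in files.items():
        for g, comps in groups.items():
            if any(comp in src for comp in comps):
                plan[g].append(path)  # multi-group files are planned per group
    for g, paths in plan.items():  # step 4: batch per group, prefix stays warm
        for path in paths:
            reply = call_llm(prefixes[g] + "\n\nFILE:\n" + files[path])
            files[path] = parse_fenced_output(reply)  # step 5: write back
    return files
```

Note that the visit plan is computed up front from the original sources, so a file touched by groups A and B is migrated by A's pass and then again, on the already-transformed text, by B's pass.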

Heuristics

  1. Group by co-occurrence in files. Components that frequently appear together in the same file should be in the same group — otherwise you pay for multiple passes on every file that uses them.
  2. Size groups empirically. 40–50K tokens is Zalando's sweet spot for GPT-4o in 2024; measure your own.
  3. Keep the number of components per group small to preserve attention budget. Zalando's ~3 components per group on average is a useful starting point.
  4. Run groups contiguously. Cache lifetime is short; processing all of group A, then all of group B beats interleaving.
  5. Group order is not load-bearing (assuming orthogonal transformations). Pick an order that maximises cache reuse against the file corpus.
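Heuristic 1 can be made concrete by counting pairwise co-occurrence across the corpus before authoring groups. A sketch, assuming you have already extracted per-file component usage (the `component_usage` shape is illustrative):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(component_usage):
    """component_usage: {file_path: set of component names used there}."""
    pairs = Counter()
    for comps in component_usage.values():
        for a, b in combinations(sorted(comps), 2):
            pairs[(a, b)] += 1
    return pairs  # high counts -> candidates for the same group
```

Pairs with high counts are the ones that would otherwise force a second pass over every file they share, so they are the strongest candidates for co-grouping.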

When it doesn't apply

  • Truly entangled components (one can't be transformed without knowing the other's transformation) must go in the same group, even if it pushes above the sweet spot.
  • Very small libraries where the full-library prompt fits comfortably inside the sweet spot — no need to group.
  • Transformations that require a file-global view (cross-component refactoring) don't compose across group visits; the pattern is file-local.

Consequences

Positive:

  • Accuracy preserved. Each call stays inside the model's sweet spot.
  • Cache-friendly. All calls in a group share a prefix; cache hits amortise prefill.
  • Incrementally extensible. A new component joins its group's prompt; existing groups are unaffected.
  • Debuggable per-group. When a migration quality regression shows up, the blame surface is one group's prompt, not the whole library's.

Negative:

  • Multi-group files cost multiple calls. A file using components from three groups pays three migration calls. In practice this is bounded (most files use 1–2 groups).
  • Grouping authorship is manual. The semantic partitioning is a human judgement call; there is no automatic algorithm for it.
  • Cross-group interactions aren't modelled. If transforming a <Form> component requires knowing how its <Input> children are being transformed, the pattern breaks down — the entangled components need to share a group.

Canonical instance

Zalando's Component Migration Toolkit: ~10 groups, ~3 components per group, 40–50K tokens per group prompt, ~30 files per group. Applied across 15 B2B applications.
