
AST + LLM hybrid conversion

Pattern

Compose a deterministic AST codemod and a large language model into a single code-conversion pipeline, where:

  1. The AST pass resolves every case it can handle with rule-based transformations.
  2. For cases it can't fully resolve, the AST pass writes in-code annotation comments into the partially-converted source — pointers at the call site, suggested replacements, and links to relevant docs.
  3. The LLM receives the original file, the partially-converted file with AST-authored annotations, any runtime context relevant to the target framework (e.g. rendered DOM, sample data, recorded API traces), and a structured prompt.
  4. The LLM finishes the conversion — reading both the original intent and the AST's hints, and rendering the final file in a well-defined output format (typically wrapped in delimiters like <code></code>).
  5. Downstream, deterministic validators (does it parse? do tests pass? does the test count match the original?) bucket the LLM output by pass-rate for human triage.
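Steps 4 and 5 can be sketched in a few lines. This is a minimal illustration, not code from the source repo: `extractCode` pulls the delimited file out of the LLM response, and `bucketByPassRate` implements the triage buckets named in step 5 (function names and thresholds follow the pass-rate buckets described later in this page).

```typescript
// Extract the converted file from an LLM response wrapped in <code></code>.
function extractCode(llmOutput: string): string | null {
  const m = llmOutput.match(/<code>([\s\S]*?)<\/code>/);
  return m ? m[1].trim() : null;
}

// Bucket a converted test file by its pass rate for human triage.
type Bucket = "pass" | "50-99%" | "20-49%" | "<20%";

function bucketByPassRate(passed: number, total: number): Bucket {
  const rate = total === 0 ? 0 : passed / total;
  if (rate === 1) return "pass";
  if (rate >= 0.5) return "50-99%";
  if (rate >= 0.2) return "20-49%";
  return "<20%";
}
```

Keeping extraction and bucketing deterministic means the only non-deterministic stage in the pipeline is the LLM call itself.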

Forces

  • An LLM alone on a deterministic code-transformation task hallucinates — a 40-60% success rate at Slack's Enzyme→RTL scale with Anthropic Claude 2.1, with wide variance by task complexity (Source: sources/2024-06-19-slack-ai-powered-conversion-from-enzyme-to-react-testing-library).
  • AST alone has a ceiling wherever the correct transformation depends on runtime context the AST cannot see (e.g. rendered DOM for RTL queries, resolved schema for SQL migrations). Slack's AST-only pass topped out at ~45%.
  • Prompt engineering alone hits diminishing returns fast: "our attempts to refine prompts had limited success… possibly perplexing the AI model rather than aiding it." Structural scaffolding outperforms prompt micro-optimisation.
  • At migration scale (15,000+ tests / 10,000+ engineering hours for Slack), manual conversion is infeasible. Automation has to ship something better than 45%, ideally close to developer-quality.

Mechanism

The pattern composes three independently-valid primitives:

  1. AST pre-pass as both conversion layer and hallucination-control layer. Every case it resolves is one less case the LLM can get wrong; every annotation it writes is a structural constraint on the LLM's decoding path.
  2. Runtime-artifact context injection — capture whatever runtime information the target framework depends on (DOM, schema, recorded trace) and inject it into the prompt. This eliminates an entire class of hallucination (guessing about runtime state).
  3. Structured prompt template — three-part (context / tasks / self-evaluate) with explicit delimiters that make the output machine-extractable.
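The third primitive can be made concrete with a small prompt builder. This is an illustrative sketch — `buildPrompt` and its input shape are not from the source repo — but the delimiters (`<code>`, `<codemod>`, `<component>`, `<test_case_title>`, `<dom_tree>`) mirror the tags described in the Slack instantiation below:

```typescript
interface PromptInput {
  original: string;                              // original test file
  astPartial: string;                            // AST-partial with annotations
  domByTest: { title: string; dom: string }[];   // captured DOM per test case
}

// Assemble the three-part (context / tasks / self-evaluate) prompt
// with explicit delimiters so the output is machine-extractable.
function buildPrompt(p: PromptInput): string {
  const doms = p.domByTest
    .map(d => `<component><test_case_title>${d.title}</test_case_title>` +
              `<dom_tree>${d.dom}</dom_tree></component>`)
    .join("\n");
  return [
    "Context:",
    `<code>${p.original}</code>`,
    `<codemod>${p.astPartial}</codemod>`,
    doms,
    "Tasks: finish converting the Enzyme tests to React Testing Library,",
    "resolving every annotation comment left by the codemod.",
    "Self-evaluate: confirm the file parses and the test count matches the original.",
    "Return the final file wrapped in <code></code>.",
  ].join("\n");
}
```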

The result — per Slack — is a 20-30% quality lift over pure-LLM prompting: from 40-60% baseline to ~80% on evaluation files.

Production instantiation: Slack Enzyme-to-RTL

From systems/enzyme-to-rtl-codemod:

  • AST pass handles top-10 Enzyme methods (find, prop, simulate, text, update, instance, props, hostNodes, exists, first), custom Jest matchers, query-selector rewrites. For the remaining 55 methods and context-dependent cases, it writes in-code annotation comments with suggestions and doc links.
  • DOM collection instruments Enzyme's mount and shallow methods to capture per-test-case wrapper.html() keyed by expect.getState().currentTestName. Output is appended to a file consumed by the LLM prompt.
  • LLM request wraps the original test file in <code></code>, the AST-partial in <codemod></codemod>, each captured DOM in <component><test_case_title>...</test_case_title> and <dom_tree>...</dom_tree></component>, plus a three-part structured prompt.
  • LLM is Anthropic Claude 2.1 (2024 era). Output wrapped in <code></code> tags.
  • Downstream validator runs the converted tests, buckets output by pass-rate (fully / 50-99% / 20-49% / <20%) for triage.
  • Operational envelope: 2-5 min per file on-demand; CI-nightly over hundreds of files; ~64% adoption across Slack's RTL migration.
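The DOM-collection idea reduces to wrapping the framework's mount function so each call records the rendered HTML keyed by the running test's name. A minimal sketch, not the actual instrumentation from enzyme-to-rtl-codemod: `instrumentMount` and `getCurrentTestName` are illustrative, with the latter standing in for Jest's `expect.getState().currentTestName`.

```typescript
// Any mount-like function that returns a wrapper exposing html().
type Mount = (el: unknown) => { html(): string };

// Captured DOM per test case; in the real system this is appended
// to a file that the LLM prompt later consumes.
const domByTest = new Map<string, string>();

// Wrap mount so every render is recorded under the current test name.
function instrumentMount(mount: Mount, getCurrentTestName: () => string): Mount {
  return (el) => {
    const wrapper = mount(el);
    domByTest.set(getCurrentTestName(), wrapper.html());
    return wrapper;
  };
}
```

The same wrapper pattern applies to `shallow`, or to any other runtime artifact a different migration needs (resolved schema, recorded trace).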

Consequences

Positive:

  • Hybrid dominates either alone: AST-only ~45%, LLM-only 40-60%, hybrid ~80% on Slack's evaluation set.
  • Debuggable: when the output is wrong, you can tell which stage failed — AST annotations missing / wrong, DOM capture missing, or LLM hallucinated. Each layer is individually auditable.
  • Improvable incrementally: new AST rules reduce the surface the LLM touches, improving quality monotonically.
  • Generalises: test migration is one instantiation. Same pattern applies to API migrations, framework upgrades, language ports, SQL dialect translation, config migration.

Negative:

  • Three systems to build and maintain — AST codemod, DOM collector (or equivalent runtime instrumentor), LLM pipeline + prompt.
  • LLM still hallucinates at the residual ~20%. Human verification remains mandatory; Slack: "the generated code was manually verified by humans before merging into our main repository".
  • Quality ceiling is model-dependent — Slack's 80% was Claude-2.1-era; modern frontier models likely hit the ceiling higher, but the 80% number isn't portable as a universal benchmark.