PATTERN Cited by 1 source

AST codegen for boilerplate shim¶

Intent¶

When building a shim layer over a large library with many classes / structs / enums / constants, use abstract-syntax-tree parsing of the library's headers to auto-generate the boilerplate adapter, converter, and unit-test scaffolding, leaving engineers to hand-tune only the complex cases (factory patterns, static methods, raw pointer semantics, ownership transfers).

Motivation¶

Hand-writing adapters for a library the size of libwebrtc — dozens of APIs, "a large number of objects to shim", each requiring:

- An abstract API definition
- Directional adapters (internal → external and vice versa)
- Converters for structs / enums
- Unit tests

…is a multi-engineer-month task that becomes the bottleneck of any shim-based migration. Most of the work is syntactic transformation of the library's header declarations — exactly what AST-based code generation does best.

Structure¶

1. Parse the library's headers with an AST parser (e.g. libclang or a similar C++ AST library). Extract class declarations, struct layouts, enum values, constant definitions, method signatures, and parameter types.
2. Emit baseline shim code for each extracted entity:
   - Abstract API definitions in the shim namespace.
   - Directional adapter class templates that forward each method to the underlying object in its flavor-specific namespace.
   - Converter functions for each struct / enum pair.
   - Unit test scaffolding exercising each adapter and converter.
3. Hand-finish the complex cases. The generator handles symmetric APIs with close to zero manual intervention. Asymmetric APIs (factories that return opaque handles, static methods that require a class-level flavor dispatch, raw pointers with non-obvious ownership semantics, move-only types) get engineer attention on top of the generated baseline.
4. Re-run the generator on each upstream release. The generator is a long-lived asset; new API surface in a new upstream version gets shimmed automatically, keeping the per-release cost of upgrades low.
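The emit step can be sketched for the simplest entity, an enum converter. This is a minimal illustration, not Meta's generator: `EnumDecl` is a hypothetical record standing in for whatever the AST pass extracts, the namespaces `flavor_a` and `shim` are made up, and the fallback `return` is a placeholder policy. It assumes the enum has at least one enumerator.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnumDecl:
    """Hypothetical record standing in for what an AST pass would extract."""
    name: str
    values: Tuple[str, ...]


def emit_enum_converter(decl: EnumDecl, src_ns: str, dst_ns: str) -> str:
    """Return C++ source text converting src_ns::<enum> to dst_ns::<enum>."""
    src = f"{src_ns}::{decl.name}"
    dst = f"{dst_ns}::{decl.name}"
    lines = [f"inline {dst} Convert({src} v) {{", "  switch (v) {"]
    for value in decl.values:
        lines.append(f"    case {src}::{value}:")
        lines.append(f"      return {dst}::{value};")
    lines.append("  }")
    # Placeholder fallback for out-of-range input; a real template would
    # choose its own policy (assert, log, default value).
    lines.append(f"  return {dst}::{decl.values[0]};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def emit_both_directions(decl: EnumDecl, internal_ns: str, external_ns: str) -> str:
    """Emit the internal -> external converter and its mirror image."""
    return (
        emit_enum_converter(decl, internal_ns, external_ns)
        + "\n"
        + emit_enum_converter(decl, external_ns, internal_ns)
    )


if __name__ == "__main__":
    # Hypothetical enum; not taken from any real library header.
    priority = EnumDecl(name="Priority", values=("kLow", "kHigh"))
    print(emit_both_directions(priority, "flavor_a", "shim"))
```

Because the enumerators match one-to-one, this is the symmetric case the generator can handle unaided. An enum whose values differ between versions would need a hand-written mapping on top of this baseline.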

Canonical instance: Meta × WebRTC shim (2026)¶

"The shim layer itself required adapters and converters. With a large number of objects to shim across dozens of APIs — each requiring an abstract API definition, adapter and converters implementations, and unit tests — the estimated manual effort was huge! We turned to automation. Using abstract syntax tree (AST) parsing, we built a code generation system that produces baseline shim code for classes, structs, enums, and constants. The generated code is fully unit-tested and easy to extend. This increased our velocity from one shim per day to three or four per day while reducing the risk of human error. For simple shims where the API is identical across versions, the generated code required close to zero manual intervention. For more complex cases — API discrepancies between versions, factory patterns, static methods, raw pointer semantics, and object ownership transfers — engineers refined the generated baseline." (sources/2026-04-09-meta-escaping-the-fork-webrtc-modernization)

Velocity: one shim per day when hand-written, rising to three or four per day after codegen adoption. That is a 3–4× multiplier on the bottleneck task of the whole WebRTC migration.

Consequences¶

- Shim codegen becomes a long-lived asset. Every new upstream release runs through the generator; the investment amortizes across upgrades.
- Template quality bounds output quality. When the generator's template for, say, "method returning unique_ptr" is wrong, every generated shim that uses it inherits the bug. Template authoring deserves careful review.
- Asymmetric-API work doesn't go away; it becomes a smaller, focused part of shim construction instead of being mixed in with syntactic boilerplate.
- Generated code must be reviewable. Meta's generator produces unit-tested code; checking that the generated output compiles, passes its tests, and is readable enough to review is what keeps it safe to ship.
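The template-quality point can be made concrete with a small sketch. The templates below are hypothetical, as are the `impl_` member and the `<Name>Adapter` naming convention; nothing here is taken from Meta's generator. The idea is only that one template is shared by every method of a given kind, so pinning its exact expansion in a golden check catches a template bug once rather than once per shim.

```python
# Hypothetical templates, keyed by the kind of return type. Each one is
# shared by every generated method of that kind.
METHOD_TEMPLATES = {
    "value": "{ret} {name}() override {{ return impl_->{name}(); }}",
    "unique_ptr": (
        "std::unique_ptr<{ret}> {name}() override {{\n"
        "  return std::make_unique<{ret}Adapter>(impl_->{name}());\n"
        "}}"
    ),
}


def emit_method(name: str, kind: str, ret: str) -> str:
    """Expand the template for one method; `kind` would come from AST type info."""
    return METHOD_TEMPLATES[kind].format(ret=ret, name=name)


def check_unique_ptr_template() -> None:
    """Golden check: pin the exact expansion of the shared template."""
    expected = (
        "std::unique_ptr<Encoder> CreateEncoder() override {\n"
        "  return std::make_unique<EncoderAdapter>(impl_->CreateEncoder());\n"
        "}"
    )
    assert emit_method("CreateEncoder", "unique_ptr", "Encoder") == expected


if __name__ == "__main__":
    check_unique_ptr_template()
    print(emit_method("width", "value", "int"))
```

A string comparison like this only guards the template text. Compiling the generated output and running its unit tests remains the stronger check.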

Relationship to AI-assisted code generation¶

This pattern predates and is distinct from LLM-based code generation. The generator is deterministic: given the same input headers it produces the same output. That determinism is what lets it be part of a production build. LLMs may eventually replace or augment the template engine (especially for the complex cases the current templates don't handle), but the architectural separation (codegen for the boilerplate, engineers for the asymmetric cases) should hold either way.
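A minimal sketch of how that determinism might be checked, assuming a toy generator rather than the real one: declarations are emitted in sorted name order, so the output does not depend on the order in which the parser happened to discover them, and two runs can be compared by hash. The enum names and the `generate` / `is_reproducible` helpers are hypothetical.

```python
import hashlib
from typing import Dict, List


def generate(enums: Dict[str, List[str]]) -> str:
    """Toy generator: one C++ enum per entry, emitted in sorted name order."""
    out = []
    for name in sorted(enums):         # stable regardless of discovery order
        body = ", ".join(enums[name])  # enumerator order is preserved, not sorted
        out.append(f"enum class {name} {{ {body} }};")
    return "\n".join(out) + "\n"


def digest(text: str) -> str:
    """SHA-256 of the generated text, suitable for comparing two runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_reproducible(first: Dict[str, List[str]], second: Dict[str, List[str]]) -> bool:
    """True when both inputs generate byte-identical output."""
    return digest(generate(first)) == digest(generate(second))


if __name__ == "__main__":
    # Same declarations, discovered in a different order.
    run_a = {"Priority": ["kLow", "kHigh"], "Codec": ["kVp8", "kAv1"]}
    run_b = {"Codec": ["kVp8", "kAv1"], "Priority": ["kLow", "kHigh"]}
    print(is_reproducible(run_a, run_b))
```

Enumerator order is deliberately left untouched, since it can carry meaning in the original header; only the order of top-level declarations is normalized.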

Seen in¶

sources/2026-04-09-meta-escaping-the-fork-webrtc-modernization — canonical wiki instance. The AST codegen is load-bearing in Meta's WebRTC shim: the velocity lift from one shim per day to three or four is cited as the reason the migration across 50+ use cases was tractable.

concepts/abstract-syntax-tree — the underlying IR.
concepts/shim-layer — the artefact being generated.
concepts/symbol-renamespacing — sibling automation on the same codebase.
patterns/shim-for-dual-stack-ab-testing — the broader migration pattern this enables.