
PATTERN Cited by 1 source

AST codegen for boilerplate shim

Intent

When building a shim layer over a large library with many classes / structs / enums / constants, use abstract-syntax-tree parsing of the library's headers to auto-generate the boilerplate adapter, converter, and unit-test scaffolding, leaving engineers to hand-tune only the complex cases (factory patterns, static methods, raw pointer semantics, ownership transfers).

Motivation

Hand-writing adapters for a library the size of libwebrtc — dozens of APIs, "a large number of objects to shim", each requiring:

  • An abstract API definition
  • Directional adapters (internal → external and vice versa)
  • Converters for structs / enums
  • Unit tests

…is a multi-engineer-month task that can become the bottleneck of a shim-based migration. Most of the work is a syntactic transformation of the library's header declarations, which is exactly what AST-based code generation does best.

Structure

  1. Parse the library's headers with an AST parser (e.g. libclang or a similar C++ AST library). Extract class declarations, struct layouts, enum values, constant definitions, method signatures, and parameter types.
  2. Emit baseline shim code for each extracted entity:
    • Abstract API definitions in the shim namespace.
    • Directional adapter classes that forward each method call to the underlying object in its flavor-specific namespace.
    • Converter functions for each struct / enum pair.
    • Unit test scaffolding exercising each adapter + converter.
  3. Hand-finish the complex cases. For symmetric APIs (identical across versions), the generated code needs close to zero manual intervention. Asymmetric APIs (factories that return opaque handles, static methods that require a class-level flavor dispatch, raw pointers with non-obvious ownership semantics, move-only types) get engineer attention on top of the generated baseline.
  4. Re-run the generator on each upstream release. The script is a long-lived asset; new API surface in a new upstream version gets shimmed automatically, keeping the per-release cost of upgrades low.
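
The emit step above can be sketched in a few lines. This is a minimal illustration, not Meta's generator: `EnumDecl`, `emit_enum_converter`, `emit_both_directions`, and the namespaces `flavor_a` / `flavor_b` are hypothetical names, and the `EnumDecl` record stands in for whatever an AST parser would extract from the real headers. It assumes the enum has the same name and enumerators in both flavors (the symmetric case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumDecl:
    """Hypothetical stand-in for an enum extracted from a header by an AST parser."""
    name: str
    enumerators: tuple[str, ...]


def emit_enum_converter(decl: EnumDecl, src_ns: str, dst_ns: str) -> str:
    """Return C++ source for a converter from src_ns::<enum> to dst_ns::<enum>."""
    src = f"{src_ns}::{decl.name}"
    dst = f"{dst_ns}::{decl.name}"
    lines = [
        f"inline {dst} Convert(const {src}& value) {{",
        "  switch (value) {",
    ]
    # One case per enumerator; assumes identical enumerator names on both sides.
    for enumerator in decl.enumerators:
        lines.append(f"    case {src}::{enumerator}: return {dst}::{enumerator};")
    lines += [
        "  }",
        "  // Fallback for out-of-range values; hand-tune if the versions diverge.",
        f"  return static_cast<{dst}>(value);",
        "}",
    ]
    return "\n".join(lines) + "\n"


def emit_both_directions(decl: EnumDecl, ns_a: str, ns_b: str) -> str:
    """Emit the directional pair: ns_a -> ns_b and ns_b -> ns_a."""
    return emit_enum_converter(decl, ns_a, ns_b) + "\n" + emit_enum_converter(decl, ns_b, ns_a)


if __name__ == "__main__":
    media_kind = EnumDecl("MediaKind", ("kAudio", "kVideo"))
    print(emit_both_directions(media_kind, "flavor_a", "flavor_b"))
```

Re-running such a script against a new upstream release regenerates the converters for any added enumerators, which is where the low per-release cost in step 4 comes from.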

Canonical instance: Meta × WebRTC shim (2026)

"The shim layer itself required adapters and converters. With a large number of objects to shim across dozens of APIs — each requiring an abstract API definition, adapter and converters implementations, and unit tests — the estimated manual effort was huge! We turned to automation. Using abstract syntax tree (AST) parsing, we built a code generation system that produces baseline shim code for classes, structs, enums, and constants. The generated code is fully unit-tested and easy to extend. This increased our velocity from one shim per day to three or four per day while reducing the risk of human error. For simple shims where the API is identical across versions, the generated code required close to zero manual intervention. For more complex cases — API discrepancies between versions, factory patterns, static methods, raw pointer semantics, and object ownership transfers — engineers refined the generated baseline." (sources/2026-04-09-meta-escaping-the-fork-webrtc-modernization)

Velocity: one hand-written shim per day before codegen, three or four shims per day after adoption. That is a 3–4× speedup on shim authoring, the bottleneck task of the WebRTC migration.

Consequences

  • Shim codegen becomes a long-lived asset. Every new upstream release runs through the generator; the investment amortizes across upgrades.
  • Template quality bounds output quality. When the generator's template for, e.g., "method returning unique_ptr" is wrong, every shim generated from that template inherits the bug. Template authoring deserves careful review.
  • Asymmetric-API work doesn't go away — it becomes a smaller, focused part of the shim-construction work instead of being mixed with syntactic boilerplate.
  • Generated code must be reviewable. Meta describes its generated code as "fully unit-tested and easy to extend"; requiring that generated output compiles, passes its tests, and stays readable in review is what keeps it safe to ship.
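
One way to keep the asymmetric-API work small and focused is to have the generator flag the declarations that need an engineer. The sketch below is an assumption about how that triage could look, not a description of Meta's tooling: `MethodDecl`, `hand_finish_reasons`, `triage`, and the string-matching heuristics are all hypothetical, and a real implementation would inspect AST type nodes rather than type spellings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodDecl:
    """Hypothetical record for a method signature extracted from a header."""
    name: str
    return_type: str
    param_types: tuple[str, ...] = ()
    is_static: bool = False


def hand_finish_reasons(method: MethodDecl) -> list[str]:
    """Return the reasons (if any) this method should get engineer attention."""
    reasons = []
    types = (method.return_type, *method.param_types)
    if method.is_static:
        reasons.append("static method")
    if any("*" in t for t in types):
        reasons.append("raw pointer")
    if any("unique_ptr" in t for t in types):
        reasons.append("ownership transfer")
    # Naming heuristic only; assumes factories are named Create...
    if method.name.startswith("Create"):
        reasons.append("factory pattern")
    return reasons


def triage(methods):
    """Split methods into fully generated names and names flagged with reasons."""
    generated, flagged = [], {}
    for method in methods:
        reasons = hand_finish_reasons(method)
        if reasons:
            flagged[method.name] = reasons
        else:
            generated.append(method.name)
    return generated, flagged
```

Emitting the flagged list alongside the generated code gives reviewers a checklist of exactly the shims where the baseline is not expected to be sufficient.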

Relationship to AI-assisted code generation

This pattern predates and is distinct from LLM-based code generation. The generator is deterministic: given the same input headers it produces the same output. That determinism is what lets it be part of a production build. LLMs may eventually replace or augment the template engine (especially for the complex cases the current templates don't handle), but the architectural separation (codegen for the boilerplate, engineers for the asymmetric cases) holds either way.
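
The determinism property can be enforced mechanically. The following is a minimal sketch under stated assumptions: `generate` is a toy stand-in for a real generator, and `fingerprint` and `is_up_to_date` are hypothetical helpers. It regenerates the output and compares it against the checked-in copy, so a build step can fail if the two drift apart or if someone hand-edits a generated file.

```python
import hashlib


def generate(decls: dict[str, list[str]]) -> str:
    """Toy generator: one line per enum, emitted in sorted order for stable output."""
    lines = []
    for name in sorted(decls):
        lines.append(f"// converter for {name}: {', '.join(decls[name])}")
    return "\n".join(lines) + "\n"


def fingerprint(text: str) -> str:
    """Content hash of generated source."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_up_to_date(decls: dict[str, list[str]], checked_in: str) -> bool:
    """True when regenerating from decls reproduces the checked-in text exactly."""
    return fingerprint(generate(decls)) == fingerprint(checked_in)
```

Sorting the declarations before emitting is the design choice that matters here: it removes dependence on parse or dictionary order, which is a common source of spurious diffs in generated code.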

Seen in
