CONCEPT Cited by 1 source

Bytecode vs AST static analysis

Definition

A structural choice in JVM static-analysis tool design: whether to operate on the Abstract Syntax Tree (AST) of source code (via a parser) or on the compiled bytecode (via a bytecode framework like ASM). The choice has cascading consequences for language coverage, sugar sensitivity, generated-code visibility, and what kinds of rules are expressible.

The wiki's canonical framing comes from Netflix's ArchUnit post, which names three differences that drive the choice.

The three differences

1. Language coverage

"Rules that need to support multiple JVM languages, such as Kotlin or Scala, often need to be rewritten for each language. … ArchUnit uses ASM to analyze actual compiled bytecode, which means it doesn't matter how that code was produced." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules

Approach	Coverage
Java AST	Java only
Kotlin AST	Kotlin only
Bytecode	Java + Kotlin + Scala + Groovy + Clojure + JRuby + everything else that compiles to JVM bytecode

For polyglot organizations, this is an N×M problem (N rules × M languages) on AST tools and an N×1 problem on bytecode tools.

2. Syntactic-sugar immunity

Source-level constructs that desugar to bytecode:

Java: lambdas → invokedynamic plus a synthetic method, records → constructor + accessors + equals/hashCode/toString, var → the inferred type, switch-expressions → tableswitch / lookupswitch jump tables.
Kotlin: extension functions → static methods, data classes → constructor + equals/hashCode/toString, inline functions → inlined-bytecode at call site, internal → mangled-name public, default arguments → synthetic methods with $default suffix.
Scala: implicits → explicit arguments and method calls, case classes → similar to Kotlin data classes, traits → interfaces (with helper classes on older Scala versions).
Lombok: @Data / @Builder → generated bytecode for accessors / builder.
Annotation processors / KSP / kapt: source-level annotations → generated classes / methods.

"It also allows code which should be found to be hidden under syntactic sugar not anticipated by the rule author." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules

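Bytecode-based analysis sees the desugared form, so a rule written against it applies however the source was spelled. AST-based analysis sees the source form and may miss violations introduced by sugar the rule author did not anticipate.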
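A minimal sketch of what "desugared form" means in practice, using only JDK reflection as a stand-in for a real bytecode reader such as ASM. The class and method names (`DesugarPeek`, `Point`, `greeting`) are hypothetical, and the synthetic method naming depends on the compiler (javac is assumed here).

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class DesugarPeek {
    // One line of source; the compiler emits accessors, equals, hashCode and toString.
    record Point(int x, int y) {}

    // The lambda body is compiled into a synthetic method on DesugarPeek.
    static Supplier<String> greeting() {
        return () -> "hello";
    }

    /** Names of compiler-generated (synthetic) methods declared on a class. */
    static List<String> syntheticMethods(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(Method::isSynthetic)
                .map(Method::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Distinct names of all methods declared on a class, written by hand or not. */
    static List<String> declaredMethods(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .map(Method::getName)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With javac this is expected to list a name starting with "lambda$".
        System.out.println(syntheticMethods(DesugarPeek.class));
        // Expected to include x, y, equals, hashCode and toString.
        System.out.println(declaredMethods(Point.class));
    }
}
```

None of the listed methods appears in the source text, which is why a source-level rule has to anticipate each form of sugar while a class-file-level rule does not.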

3. Class-graph retention

"Because ArchUnit processes the entire classpath with ASM, it retains a graph of the class data, allowing rules to easily traverse class relationships and call sites. This allows rules to have much more context about the code it is evaluating." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules

Bytecode tools typically build a graph of the whole classpath at analysis time, so every class, method, field, and reference is in memory and queryable. AST tools typically operate per file: the AST is one file's syntax tree, and cross-file references need a separate resolution step.

Cross-class rules require the class graph:

"No class in package A may call methods on classes in package B." → Need to walk call edges from A to B.
"All implementations of interface I must reside in package P." → Need to walk inheritance edges from I to all implementers.
"No @Deprecated method may be called from outside its declaring package." → Need call sites for the method.

These are awkward at best on per-file ASTs.
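Once the graph is in memory, the first two rules above reduce to simple edge queries. The sketch below is not ArchUnit's API; it uses a hypothetical hand-built edge list (`Call`, `Implements`, and the `com.example` names are all invented) to illustrate the shape of such a query.

```java
import java.util.List;

public class ClassGraphRules {
    // Hypothetical, hand-built edges; a real tool derives these from class files.
    record Call(String caller, String owner, String method) {}
    record Implements(String type, String iface) {}

    /** Package part of a fully qualified class name ("" for the default package). */
    static String pkg(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    /** Rule: no class in fromPkg may call methods on classes in toPkg (exact package match only). */
    static List<Call> forbiddenCalls(List<Call> calls, String fromPkg, String toPkg) {
        return calls.stream()
                .filter(c -> pkg(c.caller()).equals(fromPkg) && pkg(c.owner()).equals(toPkg))
                .toList();
    }

    /** Rule: every implementation of iface must reside in requiredPkg. */
    static List<String> misplacedImplementers(List<Implements> edges, String iface, String requiredPkg) {
        return edges.stream()
                .filter(e -> e.iface().equals(iface) && !pkg(e.type()).equals(requiredPkg))
                .map(Implements::type)
                .toList();
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
                new Call("com.example.a.OrderService", "com.example.b.Ledger", "post"),
                new Call("com.example.c.Report", "com.example.b.Ledger", "read"));
        List<Implements> impls = List.of(
                new Implements("com.example.p.FastCodec", "com.example.api.Codec"),
                new Implements("com.example.a.RogueCodec", "com.example.api.Codec"));

        // Reports the OrderService -> Ledger edge only.
        System.out.println(forbiddenCalls(calls, "com.example.a", "com.example.b"));
        // Reports com.example.a.RogueCodec.
        System.out.println(misplacedImplementers(impls, "com.example.api.Codec", "com.example.p"));
    }
}
```

The rules themselves are a few lines each; the hard part is producing complete edge lists, which is what a whole-classpath bytecode scan provides and a per-file AST does not.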

When AST analysis still wins

Scenario	Why AST wins
Source formatting rules	Bytecode loses indentation, comments, blank lines
Comment-based rules (e.g. "every public method must have a Javadoc")	Comments are erased at compile time
Local-only style rules	Class graph is overhead if you only need per-line / per-statement checks
Pre-compile-time analysis (IDE inspection on uncompiled file)	No bytecode exists yet
Cross-language source-level migration	Need source-level transformation, not bytecode-level analysis
Migration tooling (OpenRewrite)	Need to emit source, not bytecode

The Netflix post acknowledges "PMD has a Java rule API" in addition to XPath, so the dichotomy isn't absolute — PMD does have a typed-rule path. But the structural difference between bytecode analysis with class-graph retention and per-file AST analysis is a real architectural choice that determines which kinds of rules a tool can naturally express.

Concrete example: detecting all callers of a deprecated method

Goal: find every place LegacyThing.oldMethod() is called.

Approach	How
Bytecode (ArchUnit)	Walk class graph; find invoke instructions (INVOKEVIRTUAL, INVOKESTATIC, INVOKEINTERFACE, INVOKESPECIAL) targeting `LegacyThing.oldMethod`, plus method handles behind invokedynamic for method references.
AST (PMD)	Per-file, walk method-call expressions; check name + receiver type. Easy-to-miss cases: import aliases (Kotlin), static and wildcard imports, method references, lambdas, indirect calls.

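The bytecode approach finds every statically linked call site in the compiled output, because each one carries a fully resolved owner, name, and descriptor. The AST approach has to reconstruct what the compiler does (import resolution, overload resolution, type inference) to determine whether a particular method-call expression resolves to the deprecated method. Reflective calls remain invisible to both.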
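To illustrate why resolved references make this easy, here is a rough sketch that reads only the constant pool of a class file with the JDK's `DataInputStream` and lists every method reference it contains. It is not how ArchUnit or ASM work internally: real tools walk the method bodies to locate each call site, whereas this only shows which methods a class refers to. `ConstantPoolScan`, `LegacyThing`, and `Caller` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolScan {
    static class LegacyThing { @Deprecated static void oldMethod() {} }

    static class Caller {
        void run() {
            Runnable r = LegacyThing::oldMethod; // method reference
            r.run();
            LegacyThing.oldMethod();             // direct call
        }
    }

    /** Returns "owner.name:descriptor" for each Methodref / InterfaceMethodref constant. */
    static List<String> methodRefs(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();             // minor_version
        in.readUnsignedShort();             // major_version
        int count = in.readUnsignedShort(); // valid indices are 1..count-1
        int[] tag = new int[count];
        int[] a = new int[count];           // first u2 operand of an entry
        int[] b = new int[count];           // second u2 operand of an entry
        String[] utf8 = new String[count];
        for (int i = 1; i < count; i++) {
            tag[i] = in.readUnsignedByte();
            switch (tag[i]) {
                case 1: utf8[i] = in.readUTF(); break;            // Utf8
                case 3: case 4: in.readInt(); break;              // Integer, Float
                case 5: case 6: in.readLong(); i++; break;        // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20:        // Class, String, MethodType, Module, Package
                    a[i] = in.readUnsignedShort(); break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    a[i] = in.readUnsignedShort();
                    b[i] = in.readUnsignedShort(); break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag[i]);
            }
        }
        List<String> refs = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            if (tag[i] == 10 || tag[i] == 11) {
                String owner = utf8[a[a[i]]]; // Methodref -> Class -> Utf8 (internal name)
                int nameAndType = b[i];
                refs.add(owner + "." + utf8[a[nameAndType]] + ":" + utf8[b[nameAndType]]);
            }
        }
        return refs;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the compiled class is reachable as a classpath resource.
        try (InputStream in = ConstantPoolScan.class.getResourceAsStream("ConstantPoolScan$Caller.class")) {
            if (in == null) { System.out.println("class file not available as a resource"); return; }
            for (String ref : methodRefs(in)) {
                if (ref.contains("LegacyThing.oldMethod")) System.out.println(ref);
            }
        }
    }
}
```

Both the direct call and the method reference in `Caller` resolve to the same constant-pool entry, so no import or overload reasoning is needed to match them against the deprecated method.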

Adjacent concepts

concepts/architectural-fitness-function — fitness functions benefit from class-graph retention, so are typically built on bytecode analyzers.
concepts/abstract-syntax-tree — the source-level representation AST tools operate on.

Seen in

sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules — Netflix names the bytecode-vs-AST tradeoff explicitly as the structural reason for choosing ArchUnit over PMD for fleet-wide rule enforcement.