CONCEPT Cited by 1 source
Bytecode vs AST static analysis¶
Definition¶
A structural choice in JVM static-analysis tool design: whether to operate on the Abstract Syntax Tree (AST) of source code (via a parser) or on the compiled bytecode (via a bytecode framework like ASM). The choice has cascading consequences for language coverage, sugar sensitivity, generated-code visibility, and what kinds of rules are expressible.
The wiki's first canonical framing is via Netflix's ArchUnit post, which articulates the three load-bearing differences.
The three differences¶
1. Language coverage¶
"Rules that need to support multiple JVM languages, such as Kotlin or Scala, often need to be rewritten for each language. … ArchUnit uses ASM to analyze actual compiled bytecode, which means it doesn't matter how that code was produced." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
| Approach | Coverage |
|---|---|
| Java AST | Java only |
| Kotlin AST | Kotlin only |
| Bytecode | Java + Kotlin + Scala + Groovy + Clojure + JRuby + everything else that compiles to JVM bytecode |
For polyglot organizations, this is an N×M problem (N rules × M languages) on AST tools and an N×1 problem on bytecode tools.
2. Syntactic-sugar immunity¶
Source-level constructs that desugar to bytecode:
- Java: lambdas → `invokedynamic`, records → constructor + accessors, `var` → explicit type, switch expressions → `tableswitch` / `lookupswitch` instructions.
- Kotlin: extension functions → static methods, data classes → constructor + `equals`/`hashCode`/`toString`, `inline` functions → inlined bytecode at the call site, `internal` → mangled-name `public`, default arguments → synthetic methods with a `$default` suffix.
- Scala: implicits → method calls, case classes → generated members similar to Kotlin data classes, traits → mixin classes.
- Lombok: `@Data`/`@Builder` → generated bytecode for accessors / builder.
- Annotation processors / KSP / kapt: source-level annotations → generated classes / methods.
"It also allows code which should be found to be hidden under syntactic sugar not anticipated by the rule author." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
Bytecode-based analysis sees the desugared form. AST-based analysis sees the source form and may miss sugar-introduced violations.
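As a rough illustration of the desugared form, the sketch below uses plain JDK reflection as a lightweight stand-in for a bytecode reader. The class names `SugarProbe` and `Sample` are hypothetical and not taken from the Netflix post. `Sample` declares a single field and no methods in source, yet the compiled class carries a compiler-generated method holding the lambda body.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Lists compiler-generated (synthetic) methods that never appear in the source text. */
class SugarProbe {

    /** Hypothetical sample: the source declares one field and no methods. */
    static class Sample {
        static final Function<String, Integer> LENGTH = s -> s.length();
    }

    /** Returns the names of methods the compiler flagged as synthetic. */
    static List<String> syntheticMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The generated name is a compiler implementation detail,
        // so this sketch does not rely on any particular spelling.
        System.out.println(syntheticMethods(Sample.class));
    }
}
```

A source-level rule that enumerates declared methods would see nothing in `Sample`, while a rule over the compiled class sees the generated method. Real analyzers read the class file directly rather than going through reflection.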
3. Class-graph retention¶
"Because ArchUnit processes the entire classpath with ASM, it retains a graph of the class data, allowing rules to easily traverse class relationships and call sites. This allows rules to have much more context about the code it is evaluating." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
Bytecode tools typically build a classpath graph at analysis time — every class, method, field, and reference is in memory and queryable. AST tools typically operate per-file — the AST is the file's syntax tree; cross-file references are follow-up resolution problems.
Cross-class rules require the class graph:
- "No class in package A may call methods on classes in package B." → Need to walk call edges from A to B.
- "All implementations of interface I must reside in package P." → Need to walk inheritance edges from I to all implementers.
- "No `@Deprecated` method may be called from outside its declaring package." → Need call sites for the method.
These are awkward at best on per-file ASTs.
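To make the graph-walking idea concrete, here is a toy in-memory model of the first two rules above. It is a sketch only: `ClassGraph` and its methods are hypothetical names, the edges are added by hand rather than imported from bytecode, and this is not ArchUnit's API. Package matching is a simple prefix check, so subpackages count as part of the package.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy class graph; every class and package name used with it is hypothetical. */
class ClassGraph {
    private final Map<String, Set<String>> calls = new LinkedHashMap<>();
    private final Map<String, Set<String>> implementers = new LinkedHashMap<>();

    /** Records that some method in class `from` invokes a method on class `to`. */
    void addCall(String from, String to) {
        calls.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /** Records that class `impl` implements interface `iface`. */
    void addImplementation(String iface, String impl) {
        implementers.computeIfAbsent(iface, k -> new LinkedHashSet<>()).add(impl);
    }

    /** Rule: no class in `fromPackage` may call classes in `toPackage`. */
    List<String> forbiddenCalls(String fromPackage, String toPackage) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Set<String>> edge : calls.entrySet()) {
            if (!inPackage(edge.getKey(), fromPackage)) {
                continue;
            }
            for (String target : edge.getValue()) {
                if (inPackage(target, toPackage)) {
                    violations.add(edge.getKey() + " -> " + target);
                }
            }
        }
        return violations;
    }

    /** Rule: every implementation of `iface` must reside in `requiredPackage`. */
    List<String> misplacedImplementers(String iface, String requiredPackage) {
        List<String> violations = new ArrayList<>();
        for (String impl : implementers.getOrDefault(iface, Collections.<String>emptySet())) {
            if (!inPackage(impl, requiredPackage)) {
                violations.add(impl);
            }
        }
        return violations;
    }

    /** Prefix match, so subpackages are treated as part of the package. */
    private static boolean inPackage(String className, String pkg) {
        return className.startsWith(pkg + ".");
    }

    public static void main(String[] args) {
        ClassGraph graph = new ClassGraph();
        graph.addCall("com.example.a.Service", "com.example.b.Repo");
        graph.addImplementation("com.example.api.Handler", "com.example.q.StrayHandler");
        System.out.println(graph.forbiddenCalls("com.example.a", "com.example.b"));
        System.out.println(graph.misplacedImplementers("com.example.api.Handler", "com.example.p"));
    }
}
```

Once the edges exist, each rule is a short loop over them. The expensive part, which this sketch skips, is populating the graph for the whole classpath.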
When AST analysis still wins¶
| Scenario | Why AST wins |
|---|---|
| Source formatting rules | Bytecode loses indentation, comments, blank lines |
| Comment-based rules (e.g. "every public method must have a Javadoc") | Comments are erased at compile time |
| Local-only style rules | Class graph is overhead if you only need per-line / per-statement |
| Pre-compile-time analysis (IDE inspection on uncompiled file) | No bytecode exists yet |
| Cross-language source-level migration | Need source-level transformation, not bytecode-level analysis |
| Migration tooling (OpenRewrite) | Need to emit source, not bytecode |
The Netflix post acknowledges "PMD has a Java rule API" in addition to XPath, so the dichotomy isn't absolute — PMD does have a typed-rule path. But the structural difference between bytecode analysis with class-graph retention and per-file AST analysis is a real architectural choice that determines which kinds of rules a tool can naturally express.
Concrete example: detecting all callers of a deprecated method¶
Goal: find every place LegacyThing.oldMethod() is called.
| Approach | How |
|---|---|
| Bytecode (ArchUnit) | Walk class graph; find invoke instructions (INVOKEVIRTUAL, INVOKESTATIC, INVOKEINTERFACE, INVOKESPECIAL) targeting LegacyThing.oldMethod. |
| AST (PMD) | Per file, walk method-call expressions and check name + receiver type. Cases that are easy to miss without full type resolution: aliased or static imports, wildcard imports, method references, calls inside lambdas, indirect calls. |
The bytecode approach naturally finds every actual call site in the compiled output. The AST approach has to reconstruct what the compiler does to determine if a particular method-call expression resolves to the deprecated method.
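A minimal JDK-only sketch of the bytecode side follows. It assumes the compiled class files are readable as resources, and it parses only the class-file constant pool (layout per the JVM specification), which is far coarser than what ASM or ArchUnit do. The classes `MethodRefScanner`, `LegacyThing`, `DirectCaller`, `MethodRefCaller`, and `Bystander` are hypothetical stand-ins. The scan reports that a class refers to a method, not which instruction does so.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Lists the methods a compiled class refers to, as "owner.name", from its constant pool. */
class MethodRefScanner {

    // Hypothetical classes standing in for the example in the text.
    static class LegacyThing { static void oldMethod() { } }
    static class DirectCaller { static void run() { LegacyThing.oldMethod(); } }
    static class MethodRefCaller { static Runnable task() { return LegacyThing::oldMethod; } }
    static class Bystander { static int answer() { return 42; } }

    static Set<String> referencedMethods(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            return scan(new DataInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Set<String> scan(DataInputStream in) throws IOException {
        in.readInt();                        // magic number 0xCAFEBABE
        in.readUnsignedShort();              // minor version
        in.readUnsignedShort();              // major version
        int count = in.readUnsignedShort();  // entries are numbered 1..count-1
        int[] tag = new int[count];
        int[] first = new int[count];        // first index operand of an entry
        int[] second = new int[count];       // second index operand of an entry
        String[] utf8 = new String[count];
        for (int i = 1; i < count; i++) {
            tag[i] = in.readUnsignedByte();
            switch (tag[i]) {
                case 1: utf8[i] = in.readUTF(); break;          // Utf8
                case 3: case 4: in.readInt(); break;            // Integer, Float
                case 5: case 6: in.readLong(); i++; break;      // Long, Double take two slots
                case 7: case 8: case 16: case 19: case 20:      // Class, String, MethodType, Module, Package
                    first[i] = in.readUnsignedShort();
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    first[i] = in.readUnsignedShort();
                    second[i] = in.readUnsignedShort();
                    break;
                case 15:                                        // MethodHandle: kind + reference index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag[i]);
            }
        }
        Set<String> refs = new LinkedHashSet<>();
        for (int i = 1; i < count; i++) {
            if (tag[i] == 10 || tag[i] == 11) {                 // Methodref, InterfaceMethodref
                String owner = utf8[first[first[i]]];           // ref -> Class -> name
                String name = utf8[first[second[i]]];           // ref -> NameAndType -> name
                refs.add(owner.replace('/', '.') + "." + name);
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(referencedMethods(DirectCaller.class));
        System.out.println(referencedMethods(MethodRefCaller.class));
    }
}
```

Both the direct call and the method reference surface as the same constant-pool entry, with no import or receiver-type resolution needed. Reflective invocations would still be invisible to a scan like this.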
Adjacent concepts¶
- concepts/architectural-fitness-function — fitness functions benefit from class-graph retention, so are typically built on bytecode analyzers.
- concepts/abstract-syntax-tree — the source-level representation AST tools operate on.
Seen in¶
- sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules — Netflix names the bytecode-vs-AST tradeoff explicitly as the structural reason for choosing ArchUnit over PMD for fleet-wide rule enforcement.