CONCEPT Cited by 1 source
Diff noise filtering¶
Diff noise filtering is the preprocessing step that strips non-reviewable or auto-generated content from a code diff before any AI reviewer sees it. Two consequences: token cost goes down (the model isn't paying to read files nobody reviews), and signal goes up (the model isn't inventing findings over auto-generated content that would be ignored anyway).
The noise categories¶
Cloudflare's AI Code Review names three:
Lock files — dependency-manifest outputs that no human reviews line-by-line:
bun.lock, package-lock.json, yarn.lock, pnpm-lock.yaml,
Cargo.lock, go.sum, poetry.lock, Pipfile.lock, flake.lock
Minified / bundled assets — not human-readable, regenerated from source.
Generated code — detected by scanning the first few lines for markers such as @generated.
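As a minimal sketch of the lock-file category only, the check can be a basename lookup against the list above. The function names `is_lock_file` and `drop_lock_files` are hypothetical and are not taken from Cloudflare's implementation.

```python
from pathlib import PurePosixPath

# Lock-file names copied from the list above.
LOCK_FILES = frozenset({
    "bun.lock", "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "Cargo.lock", "go.sum", "poetry.lock", "Pipfile.lock", "flake.lock",
})


def is_lock_file(path: str) -> bool:
    """Return True when the file name (ignoring directories) is a known lock file."""
    return PurePosixPath(path).name in LOCK_FILES


def drop_lock_files(paths):
    """Keep only the paths that are not lock files, preserving order."""
    return [p for p in paths if not is_lock_file(p)]
```

Matching on the basename rather than the full path means a lock file nested inside a monorepo package (for example `web/package-lock.json`) is caught as well.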
The database-migration exception¶
An explicit caveat: database migration files often carry @generated markers even though they contain schema changes that absolutely need review. The filter carves out migrations so they survive the generated-file pass.
Cloudflare's framing: "We explicitly exempt database migrations from this rule, since migration tools often stamp files as generated even though they contain schema changes that absolutely need to be reviewed."
This is the canonical example of a noise-filtering rule that needs a codified exception: without the migration carve-out, the filter would silently drop one of the highest-risk change classes from review.
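The carve-out can be sketched as a check that runs before the marker scan. Everything specific here is an assumption made for illustration: the single `@generated` marker, the five-line header window, and recognising migrations by a `migrations` directory in the path. The source does not say how Cloudflare identifies a migration file.

```python
from pathlib import PurePosixPath

# Assumptions for illustration only; the source does not specify these values.
GENERATED_MARKERS = ("@generated",)
HEADER_LINES = 5
MIGRATION_DIRS = frozenset({"migrations"})


def is_migration(path: str) -> bool:
    """Hypothetical rule: any path component named 'migrations' marks a migration."""
    return any(part in MIGRATION_DIRS for part in PurePosixPath(path).parts)


def has_generated_marker(content: str) -> bool:
    """Scan only the first few lines of the file for a generated-code marker."""
    head = content.splitlines()[:HEADER_LINES]
    return any(marker in line for line in head for marker in GENERATED_MARKERS)


def should_drop_as_generated(path: str, content: str) -> bool:
    """Drop generated files, except migrations, which always stay reviewable."""
    if is_migration(path):
        return False  # the exemption is checked first, so a marker cannot hide a migration
    return has_generated_marker(content)
```

Checking the exemption before the marker scan keeps the rule easy to audit: a migration can never be dropped, whatever its header says.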
Why filtering beats "tell the model to ignore X"¶
Parallel to concepts/prompt-boundary-sanitization: filtering runs before the model sees the input, so nothing in the prompt can talk the model out of respecting it.
Contrast with "do not flag changes in lock files" as a prompt:
- A prompt-level rule may be bypassed by a well-crafted injection.
- Token cost still scales with the filtered-out content — you're still paying to have the model read the 10,000-line lock-file diff it's supposed to ignore.
- Cache-key explosion — every run has a different lock-file diff that will poison the prompt cache.
Filter-at-preprocessing wins on cost, reliability, and cache-hit rate.
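The cost and cache points can be illustrated with a small sketch: when noise is removed before the prompt is assembled, two runs that differ only in their lock-file diff produce byte-identical prompts. The prompt layout and the names `is_noise` and `build_review_prompt` are hypothetical, and the lock-file set is a subset kept short for brevity.

```python
# Subset of the lock-file list, for brevity.
LOCK_FILES = frozenset({"yarn.lock", "package-lock.json", "Cargo.lock"})


def is_noise(path: str) -> bool:
    """Treat a path as noise when its file name is a known lock file."""
    return path.rsplit("/", 1)[-1] in LOCK_FILES


def build_review_prompt(patches: dict) -> str:
    """Assemble the prompt from filtered patches only, in a stable path order."""
    kept = sorted((p, d) for p, d in patches.items() if not is_noise(p))
    return "\n\n".join(f"### {path}\n{diff}" for path, diff in kept)


# Two runs with the same source change but different lock-file churn.
run_a = {"src/app.py": "+print('hi')", "yarn.lock": "+left-pad@1.0.0"}
run_b = {"src/app.py": "+print('hi')", "yarn.lock": "+left-pad@1.3.0"}
```

Because the lock-file diff never enters the prompt, the model spends no tokens on it, has nothing to be instructed to ignore, and the two runs share the same prompt prefix.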
The composite picture¶
Diff noise filtering is the first of three preprocessing stages Cloudflare's AI Code Review applies:
- Noise filtering (this concept) — drop lock files, minified assets, generated code (minus migrations).
- Risk-tier assessment — classify the filtered diff by size and path sensitivity into trivial / lite / full.
- Shared-context fan-out — write per-file patch files to a diff_directory and a shared MR-context file so sub-reviewers fetch only what they need.
Together these three collapse a raw MR into a cacheable, appropriately-tiered, low-duplication input.
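A rough sketch of how the three stages might chain together follows. The tier thresholds, the sensitive-path prefix, the patch file naming, and the `mr_context.json` file name are all hypothetical; the source names the stages and the tiers but not these details.

```python
import json
import tempfile
from pathlib import Path

LOCK_FILES = {"package-lock.json", "yarn.lock", "Cargo.lock"}  # subset, for brevity


def filter_noise(patches):
    """Stage 1: drop lock files (other noise categories omitted in this sketch)."""
    return {p: d for p, d in patches.items() if Path(p).name not in LOCK_FILES}


def assess_tier(patches, sensitive_prefixes=("auth/",)):
    """Stage 2: classify by size and path sensitivity. Thresholds are made up."""
    changed = sum(len(d.splitlines()) for d in patches.values())
    if any(p.startswith(sensitive_prefixes) for p in patches):
        return "full"
    if changed <= 10:
        return "trivial"
    if changed <= 200:
        return "lite"
    return "full"


def fan_out(patches, mr_context, diff_directory):
    """Stage 3: write one patch file per changed file plus a shared context file."""
    diff_directory.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (path, diff) in enumerate(sorted(patches.items())):
        target = diff_directory / f"{i:04d}.patch"
        target.write_text(f"# {path}\n{diff}", encoding="utf-8")
        written.append(target)
    context_file = diff_directory / "mr_context.json"
    context_file.write_text(json.dumps(mr_context), encoding="utf-8")
    return written
```

Running the tier assessment on the already-filtered diff means a large lock-file change cannot push a small source change into a more expensive review tier.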
Seen in¶
- sources/2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale — canonical filter list + the migration exception.
Related¶
- systems/cloudflare-ai-code-review — the production consumer.
- concepts/risk-tier-assessment — the next preprocessing stage that consumes the filtered diff.
- concepts/what-not-to-flag-prompt — the complementary prompt-level discipline (belt-and-braces).
- patterns/specialized-reviewer-agents — benefits from lower token cost + higher signal per agent.