
---
title: '"What NOT to flag" prompt'
type: concept
created: 2026-04-22
updated: 2026-04-22
tags: [prompt-engineering, llm, ai-code-review, boundary-setting, signal-to-noise, agents]
sources: [2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale]
related: [systems/cloudflare-ai-code-review, concepts/context-engineering, concepts/llm-as-judge, patterns/specialized-reviewer-agents]
---


# "What NOT to flag" prompt

"What NOT to flag" is the prompt-engineering discipline of devoting more explicit instruction to what an LLM should skip than to what it should do. The underlying observation: the default failure mode of an unconstrained reviewer-style LLM is a flood of low-signal suggestions, not a shortage of critical findings.

## Why it's load-bearing

Cloudflare's framing in the 2026-04-20 AI Code Review post is blunt:

> "It turns out that telling an LLM what not to do is where the actual prompt engineering value resides. Without these boundaries, you get a firehose of speculative theoretical warnings that developers will immediately learn to ignore."

The implicit model: an LLM given a broad instruction ("find security bugs") will enumerate every possible class of issue at every possible severity, including theoretical, defensive, and stylistic ones, because in its training signal the cost of omitting a real issue is higher than the cost of adding a speculative one.

Result: developers learn to ignore the entire stream. The signal goes to zero not because the model can't find real bugs, but because the real ones are drowned.

## The security-reviewer example

Cloudflare's security reviewer carries explicit positive and negative lists:

```markdown
## What to Flag
- Injection vulnerabilities (SQL, XSS, command, path traversal)
- Authentication/authorisation bypasses in changed code
- Hardcoded secrets, credentials, or API keys
- Insecure cryptographic usage
- Missing input validation on untrusted data at trust boundaries

## What NOT to Flag
- Theoretical risks that require unlikely preconditions
- Defense-in-depth suggestions when primary defenses are adequate
- Issues in unchanged code that this MR doesn't affect
- "Consider using library X" style suggestions
```

The negative list is what changes the output distribution. Each bullet is a category the model would otherwise emit findings in; each one is explicitly retired.
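A minimal sketch of assembling such a prompt programmatically. The helper and its signature are illustrative, not Cloudflare's tooling; only the section names mirror the excerpt above:

```python
# Hypothetical helper: compose a reviewer system prompt whose negative
# list is as explicit as its positive one.

def build_reviewer_prompt(role: str, flag: list[str], do_not_flag: list[str]) -> str:
    """Return a system prompt with explicit "What to Flag" and "What NOT to Flag" sections."""
    lines = [f"You are a {role}. Review only the changed lines of the diff.", ""]
    lines.append("## What to Flag")
    lines += [f"- {item}" for item in flag]
    lines.append("")
    lines.append("## What NOT to Flag")
    lines += [f"- {item}" for item in do_not_flag]
    return "\n".join(lines)

prompt = build_reviewer_prompt(
    role="security reviewer",
    flag=["Hardcoded secrets, credentials, or API keys"],
    do_not_flag=["Theoretical risks that require unlikely preconditions"],
)
print(prompt)
```

The point of making the negative list a first-class parameter is that it cannot be quietly omitted: every reviewer definition is forced to say what it will not flag.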

## The signal metric

Across Cloudflare's first 30 days, reviews averaged 1.2 findings each. That number is deliberately low:

> "The code quality reviewer is the most prolific, producing nearly half of all findings. [...] That is about 1.2 findings per review on average, which is deliberately low. We biased hard for signal over noise, and the 'What NOT to Flag' prompt sections are a big part of why the numbers look like this rather than 10+ findings per review of dubious quality."

A review that surfaces 1.2 issues on average gets read; one that surfaces 15 items of which only 2 matter teaches the user to stop reading within a week.
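The metric itself is just an average over per-review finding counts; a back-of-envelope sketch (the counts below are made up to illustrate the arithmetic, not Cloudflare's data):

```python
# Findings-per-review: the headline signal metric for a reviewer stream.

def findings_per_review(counts: list[int]) -> float:
    """Average number of findings emitted per review."""
    return sum(counts) / len(counts)

counts = [0, 1, 2, 1, 3, 0, 1, 2, 1, 1]  # hypothetical counts for 10 reviews
print(round(findings_per_review(counts), 1))  # → 1.2
```

Tracking this per specialised reviewer (rather than only in aggregate) shows which prompt's negative list needs tightening when the stream gets noisy.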

## Generalisation beyond code review

The pattern applies anywhere an LLM emits findings, suggestions, or critiques to a downstream human:

- Linting / static analysis annotation systems.
- Auto-generated alerts / monitoring narrations.
- Agent-generated design-review comments.
- AI pair-programmer inline suggestions.
- LLM-as-judge scoring rubrics ("mark correct if X; do not mark incorrect for Y").

Rule: if the downstream consumer is a human scanning for signal, "what to ignore" is prompt real estate, not filler.
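In the LLM-as-judge case, the same discipline shows up as an explicit "do not penalise" section in the rubric. A hedged sketch; the rubric wording below is illustrative, not from the source:

```python
# An LLM-as-judge rubric with an explicit negative section: the clauses
# telling the judge what NOT to penalise carry as much weight as the
# scoring criteria themselves.

JUDGE_RUBRIC = """\
Score the answer as CORRECT or INCORRECT.

Mark CORRECT if:
- The final numeric answer matches the reference.

Do NOT mark INCORRECT for:
- Formatting differences (units written out vs. abbreviated)
- Extra explanation beyond the reference answer
"""

def has_negative_section(rubric: str) -> bool:
    """Cheap lint check: does this rubric say what NOT to penalise?"""
    return "Do NOT mark INCORRECT" in rubric

print(has_negative_section(JUDGE_RUBRIC))  # → True
```

A check like `has_negative_section` can be run over a fleet of rubrics as a guardrail: any judge prompt without a negative section is flagged before it ships.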

## The failure modes of omitting it

- Firehose output: every possible issue at every severity, so users stop reading.
- False positives on correct code: "Consider adding error handling" on a function that already has it.
- Theoretical-risk warnings: "A race could occur if X and Y and Z simultaneously" when those preconditions don't hold in the codebase.
- Unchanged-code flagging: the model finds issues in neighbouring code the MR didn't touch.
- Style-over-substance suggestions: "Consider using library X" framed as a substantive review comment.
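A complementary output-side guard, sketched here as a belt-and-braces measure rather than anything the Cloudflare post describes: post-filter findings whose category matches a retired list, so anything that slips past the prompt boundaries is still dropped. Category names and the finding schema are assumptions:

```python
# Hypothetical post-filter: drop findings in categories the prompt has
# already retired, in case the model emits them anyway.

RETIRED_CATEGORIES = {"theoretical-risk", "unchanged-code", "library-suggestion"}

def filter_findings(findings: list[dict]) -> list[dict]:
    """Keep only findings whose category is not on the retired list."""
    return [f for f in findings if f.get("category") not in RETIRED_CATEGORIES]

findings = [
    {"category": "hardcoded-secret", "message": "API key committed in config"},
    {"category": "library-suggestion", "message": "Consider using library X"},
]
print(filter_findings(findings))  # only the hardcoded-secret finding survives
```

The prompt-side negative list remains the primary mechanism; a filter like this only catches the residue, and its dropped-finding counts are themselves a useful signal that a prompt's boundaries need restating.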

## Seen in
