PATTERN Cited by 2 sources
Adversarial review sub-agent¶
Intent¶
Spin up an independent sub-agent with an adversarial reviewer prompt to critique a PR before any human looks at it. The sub-agent has no context from the main (building) agent's conversation, so its review is not biased by the justifications the main agent has accumulated for its choices.
Canonical articulation — Atlassian Fireworks, 2026-04-24:
"For review, have an adversarial persona subagent that spins up and reviews what the main agent has written. I have this one tied to a
!review-prprompt shortcut that spins it up as an independent subagent." (Source: sources/2026-04-24-atlassian-rovo-dev-driven-development)
Shape¶
Main agent writes PR
│
│ (diff ready)
▼
!review-pr ──► [Independent sub-agent]
│ adversarial prompt
│ no prior context
▼
[Review comments]
│
▼
Main agent addresses comments
│
▼
CI pipeline (lint / vet / tests / Helm)
│
▼
Human review (architecture-level)
Prompt shape¶
The reviewer sub-agent's prompt must invert LLM default bias toward agreement:
- "Find what is wrong with this PR; red-team it."
- "Assume the code has subtle bugs and look for them."
- "Challenge the design decisions; don't accept the default justifications."
Contrast with the default "please review this PR" prompt, which biases toward validation — the LLM tends to confirm what is there rather than look for flaws.
Why independence matters¶
The main agent has a long conversation history of justifications for each design decision ("I chose X because Y, so I did Z"). If the reviewer shares that history, it has already been persuaded by those justifications — it will validate rather than challenge. An independent sub-agent, evaluating the diff on its own merits without the persuasion history, is harder to co-opt.
See patterns/context-segregated-sub-agents for the general case of this pattern.
Where it fits in the review stack¶
Three-tier review, bottom to top:
| Tier | Reviewer | Focus | Latency |
|---|---|---|---|
| 1 | Adversarial sub-agent | Bugs / design flaws the main agent missed | Seconds |
| 2 | CI quality gate | Lint / vet / tests / Helm validation | Minutes |
| 3 | Human | Architecture / design intent / risk | Minutes–hours |
The adversarial sub-agent is the pre-human correctness tier — it catches what the main agent missed so the human's time is spent on the architectural axis, not on finding bugs. See patterns/pre-human-agent-review.
For bigger / scarier PRs¶
"For bigger, scarier PRs: spin up an independent agent to review before a human even looks at it." — the pattern scales down as well as up. For trivial PRs, the sub-agent adds friction; for risky PRs, it is the designed-in safety net.
Trigger ergonomics¶
The Fireworks team's implementation: a !review-pr prompt
shortcut. The developer types two words; the sub-agent fires;
review comments come back. Low-friction invocation matters — if
the pattern requires 10 clicks to invoke, it gets skipped on the
PRs where it matters most (the ones the developer is in a hurry
to merge).
Guardrails¶
- Reviewer output is a first-class input. Don't spin up the sub-agent and ignore its output; the ritual-compliance failure mode defeats the pattern.
- Reviewer prompt is versioned. Treat the adversarial prompt as production code — it evolves, it should be in source control, reviewable, and testable.
- Reviewer finds are not the final oracle. Sometimes the reviewer is wrong. The main agent (or the human) should be able to argue back and dismiss a spurious find — but with justification recorded.
Failure modes¶
- Review theatre. Sub-agent fires, findings are always dismissed. Mitigation: require review findings to be addressed or explicitly triaged before CI passes.
- Adversarial echo chamber. Main + reviewer are the same model; both share the same blind spots. Mitigation: use a different model family for the reviewer if cost permits.
- Over-reporting. Adversarial prompt is too aggressive; reviewer finds flaws everywhere, signal-to-noise collapses. Mitigation: calibrate the prompt; treat the reviewer itself as a component whose prompt needs tuning.
Seen in¶
- sources/2026-04-24-atlassian-rovo-dev-driven-development —
canonical instance;
!review-prprompt shortcut spins up an adversarial sub-agent for every PR in the Fireworks codebase before human review. - sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us — vulnerability-research instance. The Validate stage of Cloudflare's vulnerability discovery harness places an independent adversarial agent between Hunt-stage findings and the queue. Cloudflare's verbatim formulation tightens the pattern with a third independence axis that the Atlassian formulation doesn't name: "Adding a second agent between the initial finding and the queue — one with a different prompt, a different model, and no ability to generate its own findings — catches a lot of the noise the first agent would miss if it just checked its own work. It turns out that putting two agents in deliberate disagreement is way more effective than just telling one agent to be careful." The no-emit constraint is load-bearing — without it, the validator inflates queue size instead of refining it. Adds a vuln-research-specific cost-control rationale on top of the Atlassian pre-human-review framing.
Related¶
- concepts/adversarial-review-persona
- concepts/agentic-development-loop
- concepts/signal-to-noise-in-ai-vulnerability-triage
- patterns/pre-human-agent-review
- patterns/specialized-reviewer-agents
- patterns/specialized-agent-decomposition
- patterns/context-segregated-sub-agents
- patterns/multi-stage-vulnerability-discovery-harness
- patterns/split-bug-and-reachability-questions
- systems/rovo-dev
- systems/cloudflare-vulnerability-discovery-harness