
PATTERN

Adversarial review sub-agent

Intent

Spin up an independent sub-agent with an adversarial reviewer prompt to critique a PR before any human looks at it. The sub-agent has no context from the main (building) agent's conversation, so its review is not biased by the justifications the main agent has accumulated for its choices.

Canonical articulation — Atlassian Fireworks, 2026-04-24:

"For review, have an adversarial persona subagent that spins up and reviews what the main agent has written. I have this one tied to a !review-pr prompt shortcut that spins it up as an independent subagent." (Source: sources/2026-04-24-atlassian-rovo-dev-driven-development)

Shape

    Main agent writes PR
              │ (diff ready)
       !review-pr ──► [Independent sub-agent]
                          │ adversarial prompt
                          │ no prior context
                    [Review comments]
       Main agent addresses comments
           CI pipeline (lint / vet / tests / Helm)
                 Human review (architecture-level)

Prompt shape

The reviewer sub-agent's prompt must invert the LLM's default bias toward agreement:

  • "Find what is wrong with this PR; red-team it."
  • "Assume the code has subtle bugs and look for them."
  • "Challenge the design decisions; don't accept the default justifications."

Contrast with the default "please review this PR" prompt, which biases toward validation — the LLM tends to confirm what is there rather than look for flaws.
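As a sketch, such a prompt and the minimal context it is sent with might look like the following. The exact wording and the `build_review_request` helper are illustrative assumptions, not the Fireworks team's implementation; only the three directives above come from the source.

```python
# Hypothetical adversarial reviewer prompt. The wording is an assumption;
# the three bulleted directives are taken from the pattern description.
ADVERSARIAL_REVIEW_PROMPT = """\
You are an adversarial code reviewer with no stake in this PR.
- Find what is wrong with this PR; red-team it.
- Assume the code has subtle bugs and look for them.
- Challenge the design decisions; do not accept the default justifications.
Report concrete findings only; do not praise the code.
"""

def build_review_request(diff: str) -> list[dict]:
    """Build a chat payload containing ONLY the adversarial prompt and
    the diff, with no history from the main agent's conversation."""
    return [
        {"role": "system", "content": ADVERSARIAL_REVIEW_PROMPT},
        {"role": "user", "content": f"Review this diff:\n{diff}"},
    ]
```

The key property is what the payload omits: there is no slot for the main agent's transcript, so validation bias cannot leak in through shared context.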

Why independence matters

The main agent has a long conversation history of justifications for each design decision ("I chose X because Y, so I did Z"). If the reviewer shares that history, it has already been persuaded by those justifications — it will validate rather than challenge. An independent sub-agent, evaluating the diff on its own merits without the persuasion history, is harder to co-opt.

See patterns/context-segregated-sub-agents for the general case of this pattern.
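The isolation can be made explicit in code. A minimal sketch, assuming a hypothetical `spawn_independent_reviewer` helper (all names are illustrative):

```python
def spawn_independent_reviewer(main_transcript: list[str], diff: str) -> dict:
    """Build the reviewer's context from the diff ALONE.

    main_transcript is accepted only to make the contract visible: it is
    deliberately never copied into the reviewer's context, so the reviewer
    is not pre-persuaded by the main agent's accumulated justifications."""
    del main_transcript  # sharing it would bias the review toward validation
    return {
        "system": "Adversarial reviewer: critique this diff on its own merits.",
        "input": diff,
    }
```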

Where it fits in the review stack

Three-tier review, bottom to top:

    Tier  Reviewer               Focus                                      Latency
    1     Adversarial sub-agent  Bugs / design flaws the main agent missed  Seconds
    2     CI quality gate        Lint / vet / tests / Helm validation       Minutes
    3     Human                  Architecture / design intent / risk        Minutes–hours

The adversarial sub-agent is the pre-human correctness tier — it catches what the main agent missed so the human's time is spent on the architectural axis, not on finding bugs. See patterns/pre-human-agent-review.
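The tier ordering can be sketched as a pipeline in which each tier gates the next, cheapest first (function names here are hypothetical, not from the source):

```python
def review_stack(diff, adversarial_review, ci_gate, human_review):
    """Run the three tiers bottom-up: fast and cheap first, human last.
    Each tier only sees work that survived the tier below it."""
    findings = adversarial_review(diff)          # Tier 1: seconds
    if findings:
        return ("fix-first", findings)           # main agent addresses these
    if not ci_gate(diff):                        # Tier 2: minutes
        return ("ci-failed", [])
    return ("await-human", human_review(diff))   # Tier 3: minutes to hours
```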

For bigger / scarier PRs

"For bigger, scarier PRs: spin up an independent agent to review before a human even looks at it." — the pattern scales down as well as up. For trivial PRs, the sub-agent adds friction; for risky PRs, it is the designed-in safety net.

Trigger ergonomics

The Fireworks team's implementation: a !review-pr prompt shortcut. The developer types two words; the sub-agent fires; review comments come back. Low-friction invocation matters — if the pattern requires 10 clicks to invoke, it gets skipped on the PRs where it matters most (the ones the developer is in a hurry to merge).
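One possible shape for such a low-friction trigger, assuming a simple prompt-shortcut registry (the dispatcher is a sketch; only the `!review-pr` name comes from the source):

```python
from typing import Callable

# Hypothetical registry mapping prompt shortcuts to handlers.
SHORTCUTS: dict[str, Callable[[str], str]] = {}

def shortcut(name: str):
    """Decorator that registers a handler under a prompt shortcut."""
    def register(fn):
        SHORTCUTS[name] = fn
        return fn
    return register

@shortcut("!review-pr")
def review_pr(diff: str) -> str:
    # In a real implementation this would spawn the independent sub-agent.
    return f"spawned adversarial sub-agent for {len(diff)} chars of diff"

def dispatch(user_input: str, diff: str):
    """Fire the matching shortcut, or return None for ordinary input."""
    handler = SHORTCUTS.get(user_input.strip())
    return handler(diff) if handler else None
```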

Guardrails

  1. Reviewer output is a first-class input. Don't spin up the sub-agent and ignore its output; that ritual-compliance failure mode defeats the pattern.
  2. Reviewer prompt is versioned. Treat the adversarial prompt as production code — it evolves, so it should be in source control, reviewable, and testable.
  3. Reviewer findings are not the final oracle. Sometimes the reviewer is wrong. The main agent (or the human) should be able to argue back and dismiss a spurious finding — but with the justification recorded.

Failure modes

  • Review theatre. Sub-agent fires, findings are always dismissed. Mitigation: require review findings to be addressed or explicitly triaged before CI passes.
  • Adversarial echo chamber. Main + reviewer are the same model; both share the same blind spots. Mitigation: use a different model family for the reviewer if cost permits.
  • Over-reporting. Adversarial prompt is too aggressive; reviewer finds flaws everywhere, signal-to-noise collapses. Mitigation: calibrate the prompt; treat the reviewer itself as a component whose prompt needs tuning.
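The review-theatre mitigation (and guardrails 1 and 3) can be enforced mechanically. A sketch of a CI-side triage gate, where the `Finding` schema is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    description: str
    status: str = "open"        # "open" | "addressed" | "dismissed"
    justification: str = ""     # required when status is "dismissed"

def triage_gate(findings: list[Finding]) -> bool:
    """CI-side check: every reviewer finding must be addressed, or
    explicitly dismissed with a recorded justification, before merge."""
    for f in findings:
        if f.status == "open":
            return False
        if f.status == "dismissed" and not f.justification:
            return False
    return True
```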

Seen in
