
Adversarial review persona

Definition

A sub-agent spun up with a prompt or persona that biases it toward critiquing, challenging, and finding flaws in the main agent's output before any human reviews the work. Because it runs independently of the main agent's context, it is not swayed by the justifications the main agent has accumulated for its choices.

Atlassian's Rovo Dev / Fireworks post is the canonical wiki articulation:

"For review, have an adversarial persona subagent that spins up and reviews what the main agent has written. I have this one tied to a !review-pr prompt shortcut that spins it up as an independent subagent." (Source: sources/2026-04-24-atlassian-rovo-dev-driven-development)

Why "adversarial" and not "helpful"

Default LLM training biases models toward agreement and assistance. When the main agent has generated a PR, a reviewer agent prompted merely as "please review this PR" tends to find things acceptable. The failure mode: the reviewer validates what the shipping agent already did, generating false confidence.

The adversarial persona fix: prompt the reviewer as "find what is wrong with this PR; red-team it; assume it has subtle bugs." This shifts the reviewer's default from agreement to challenge. It is similar in spirit to the "what NOT to flag" discipline in patterns/specialized-reviewer-agents, but inverted: it pushes the reviewer to look harder for problems rather than calibrating it against over-reporting.
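The stance shift can be sketched as a prompt-assembly step. A minimal illustration follows; the prompt wording and the `build_review_messages` helper are hypothetical, not taken from the Atlassian post:

```python
# Hypothetical sketch: the same diff reviewed under two stances.
# Prompt wording is illustrative, not from the source.

HELPFUL_REVIEWER = "Please review this PR and share any feedback."

ADVERSARIAL_REVIEWER = (
    "You are an adversarial code reviewer. Find what is wrong with this PR. "
    "Red-team it: assume it contains subtle bugs, missing edge cases, and "
    "unsafe defaults until you have evidence otherwise. Do not praise the "
    "code; report only concrete problems with file and line references."
)

def build_review_messages(diff: str, adversarial: bool = True) -> list[dict]:
    """Assemble the message list for a reviewer sub-agent.

    The reviewer sees only the diff, never the main agent's
    conversation history or its accumulated justifications.
    """
    system = ADVERSARIAL_REVIEWER if adversarial else HELPFUL_REVIEWER
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Review this diff:\n\n{diff}"},
    ]
```

The only difference between the two setups is the system prompt; the diff payload is identical, which is what makes the stance (not the input) the variable under test.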

Why a separate sub-agent

Two properties the separate sub-agent provides:

  1. Context independence. The main agent has a long conversation history of justifications for its design choices ("I chose X because Y, so I did Z"). A sub-agent with no access to that history will evaluate the diff on its own merits, not on the main agent's persuasion.
  2. Prompt isolation. The main agent has a builder prompt; the reviewer agent has a challenger prompt. These prompts would pull in opposite directions if mixed into one conversation; splitting them into separate sub-agents avoids the conflict.

See patterns/context-segregated-sub-agents for the general case of this pattern.
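Both properties reduce to one construction rule: the reviewer is built fresh and handed only the artifact. A toy sketch, assuming nothing beyond plain message lists (the `Agent` class and `spawn_reviewer` helper are hypothetical):

```python
# Hypothetical sketch of context segregation: the builder accumulates
# self-justifications in its history; the reviewer is constructed fresh
# and receives only the diff under review.

class Agent:
    def __init__(self, system_prompt: str):
        self.history = [{"role": "system", "content": system_prompt}]

    def note(self, text: str):
        """Append an assistant turn (e.g. a design justification)."""
        self.history.append({"role": "assistant", "content": text})

builder = Agent("You are a software builder agent.")
builder.note("I chose a global cache because profiling showed hot reads.")
builder.note("So I skipped locking; contention looked unlikely.")

def spawn_reviewer(diff: str) -> Agent:
    """Fresh sub-agent: none of the builder's justifications leak in."""
    reviewer = Agent("You are an adversarial reviewer. Find what is wrong.")
    reviewer.history.append({"role": "user", "content": diff})
    return reviewer

reviewer = spawn_reviewer("+ cache = {}  # global, unlocked")
```

Because `spawn_reviewer` never touches `builder.history`, the reviewer's two-message context cannot contain the builder's "I chose X because Y" persuasion.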

Relation to pre-human review

The adversarial persona is the pre-human correctness tier: by the time a human reviews the PR, the adversarial sub-agent has already fired and the main agent has already addressed obvious issues. Human review shifts to "architecture, design intent, risk" rather than nitpicks. See patterns/pre-human-agent-review for the pattern that composes this concept into a full workflow.

Relation to specialised reviewers

The Cloudflare-ecosystem pattern patterns/specialized-reviewer-agents splits review along domain axes (security, performance, docs, etc.), each with its own narrow prompt. The adversarial persona splits review along the stance axis (builder vs. challenger), with the challenger getting its own sub-agent regardless of domain.

These compose: a mature review stack can have domain-specialised adversarial sub-agents (an adversarial security reviewer, an adversarial performance reviewer), each prompted to challenge within its own domain.
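One hedged way to express that composition: generate a challenger persona per domain from a shared adversarial template. The domain list and wording below are illustrative assumptions, not from the source:

```python
# Hypothetical sketch: composing the stance axis (challenger) with the
# domain axis (security, performance, ...) to get domain-specialised
# adversarial reviewers.

DOMAINS = {
    "security": "injection, authz gaps, secrets in code, unsafe deserialisation",
    "performance": "N+1 queries, unbounded memory growth, blocking calls on hot paths",
}

def adversarial_persona(domain: str) -> str:
    """Build an adversarial system prompt scoped to one review domain."""
    focus = DOMAINS[domain]
    return (
        f"You are an adversarial {domain} reviewer. Assume this PR has "
        f"subtle {domain} flaws until proven otherwise. Hunt specifically "
        f"for: {focus}. Report only concrete findings; do not approve."
    )
```

Each generated prompt keeps the challenger stance constant while narrowing the search space, which is the compositional point of the paragraph above.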

Failure modes

  • Adversarial false positives. A reviewer prompted to find flaws will find flaws: some real, some overstretched. Calibration of the prompt matters; "adversarial" is not the same as "contrarian for its own sake." The reviewer's output still needs to be consumed with judgement.
  • Adversarial echo chamber. If the main agent and the reviewer agent run on the same underlying model, they share the same blind spots. A mixed-model review setup (a different model family for the reviewer) reduces this risk at the cost of added complexity and expense.
  • Review-theatre. Spinning up the sub-agent and ignoring its output — i.e., ritual compliance. Mitigated by making the adversarial sub-agent's findings a CI requirement, not a discretionary step.
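The review-theatre mitigation can be made mechanical: a CI step that consumes the adversarial sub-agent's findings and fails the pipeline while any remain unresolved. A minimal sketch, assuming findings arrive as a JSON list with hypothetical `title`/`resolved` fields:

```python
# Hypothetical sketch: turning adversarial findings into a CI gate.
# The JSON schema (title, resolved) is an assumption for illustration.

import json
import sys

def gate(findings_json: str) -> int:
    """Return a CI exit code: 0 only when every finding is resolved."""
    findings = json.loads(findings_json)
    unresolved = [f for f in findings if not f.get("resolved")]
    for f in unresolved:
        # Surface each blocker in the CI log so it cannot be silently skipped.
        print(f"BLOCKING: {f['title']}", file=sys.stderr)
    return 1 if unresolved else 0
```

Wiring `gate` into the pipeline converts "the sub-agent's output should be read" into "the merge is blocked until it is addressed", which is the difference between a control and a ritual.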
