CONCEPT Cited by 2 sources
Weakly-adversarial critic¶
Definition¶
A weakly-adversarial critic is an agent positioned to audit peer agents' work for hallucinations, analytical gaps, and interpretation variability. It is pointed at its peers rather than cooperating with them, but it is not fully adversarial: it shares the task's goal and is rewarded for improving the overall outcome, not for winning against its peers.
The stance is architecturally distinct from three adjacent roles:
| Stance | Who wins | Shared goal? |
|---|---|---|
| Cooperative helper | Everyone together | Yes — critic helps peer succeed |
| Weakly adversarial critic | The task, via the critic catching errors | Yes — critic and peer share outcome, but critic scores peer's work |
| Fully adversarial (red team) | The critic if it finds a flaw | No — critic may be rewarded for finding flaws regardless of task outcome |
The "weakly" qualifier matters: without it, the critic's incentive drifts toward finding-flaws-at-all-costs (paranoia, false-positive-heavy output); without the adversarial element, the critic slides into "looks good to me" collaborative approval. See concepts/self-approval-bias for the failure mode the stance is designed to prevent.
Named and canonicalised in Slack's Streamlining security investigations with agents post (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).
Slack's verbatim framing¶
"The weakly adversarial relationship between the Critic and the expert group helps to mitigate against hallucinations and variability in the interpretation of evidence."
(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
The Critic's job is to "assess and quantify the quality of findings made by domain experts using a rubric we've defined" and to "annotate the experts' findings with its own analysis and a credibility score for each finding." Slack calls the Critic a "meta-expert".
Why weakly and not fully adversarial¶
Too cooperative fails: self-approval bias¶
A single-model pipeline that generates and evaluates its own output hits concepts/self-approval-bias — the model is disproportionately likely to approve its own generations. Even with a separate cooperative-stance critic, if the critic is prompted to be "helpful" to the peer, it inherits a confirmation-oriented tone; hallucinations slip through.
Too adversarial fails: false-positive paranoia¶
A fully adversarial (red-team) critic is rewarded for finding problems. Applied to a production investigation stream, this produces:
- High false-positive rate — "might be X" findings escalated as if confirmed.
- Analysis paralysis — the investigation can't progress because the critic always finds more doubts.
- Reader fatigue — the Director (or a human) receiving the critic's output stops taking it seriously.
Weakly adversarial is the pragmatic middle¶
The critic shares the investigation's goal of "reach a defensible conclusion" (so it has skin in the outcome) but is oriented toward auditing the method, not the goal. It asks "is this finding well-supported by the evidence?", not "is the peer my ally?" or "can I find any reason to reject this?"
Canonical emergent behaviour¶
Slack discloses the canonical payoff in an edited worked example: an Expert reviewed a process-ancestry chain and "incorrectly assessed credential handling as secure". The Critic noticed a credential exposure in the ancestry that the Expert missed, flagged it with its own analysis, and the Director pivoted the investigation to focus on this issue.
Verbatim: "What is notable about this result is that the expert did not raise the credential exposure in its findings; the Critic noticed it as part of its meta-analysis of the expert's work." (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
This is the "Critic catches what Expert missed" payoff pattern — the operational evidence that the weakly-adversarial stance produces real value, not just ceremony.
Design mechanics¶
To implement a weakly-adversarial critic, three mechanisms tend to co-occur:
- Separate model invocation. The critic is a separate invocation, not the same model asked to self-evaluate. See patterns/one-model-invocation-per-task.
- Rubric-driven output. The critic scores against an explicit rubric (dimensions, scale, pass/fail criteria) rather than free-form "looks good". Rubric shape forces structured engagement.
- Credibility scores, not binary pass/fail. The critic emits a credibility score per finding (not a pass/fail verdict). This lets downstream consumers (the Director) reason probabilistically rather than binary-gating.
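The three mechanisms can be sketched together. This is a minimal illustration, not Slack's implementation: the rubric dimensions, names, and threshold bands below are assumptions (Slack's actual rubric is undisclosed), and the per-dimension scores stand in for the output of a separate critic-model invocation.

```python
from dataclasses import dataclass

# Hypothetical rubric dimensions; Slack's actual rubric is not disclosed.
RUBRIC = ("evidence_support", "reasoning_soundness", "scope_fit")

@dataclass
class Finding:
    expert: str
    claim: str

@dataclass
class Critique:
    finding: Finding
    scores: dict          # per-dimension rubric scores in [0, 1]
    credibility: float    # aggregate score, not a pass/fail verdict
    annotation: str       # critic's own analysis attached to the finding

def critique(finding: Finding, rubric_scores: dict, annotation: str) -> Critique:
    """Aggregate per-dimension rubric scores into one credibility score.

    In a real system, rubric_scores and annotation would come from a
    separate critic-model invocation, never from the expert model
    evaluating its own output (see patterns/one-model-invocation-per-task).
    """
    missing = [d for d in RUBRIC if d not in rubric_scores]
    if missing:
        raise ValueError(f"critic must score every rubric dimension: {missing}")
    credibility = sum(rubric_scores[d] for d in RUBRIC) / len(RUBRIC)
    return Critique(finding, rubric_scores, credibility, annotation)

def director_priority(c: Critique) -> str:
    """Downstream consumer reasons over the score instead of binary-gating."""
    if c.credibility >= 0.9:
        return "trustworthy"
    if c.credibility >= 0.5:
        return "plausible"
    return "speculative"
```

Note that `critique` never rejects a finding outright; it only annotates and scores, leaving the pivot/continue/conclude decision to the Director.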
Contrasts¶
- vs. LLM-as-judge (pass/fail) — concepts/llm-as-judge is often a binary pass/fail evaluator. A weakly-adversarial critic is a scorer + annotator, not a gate.
- vs. multi-round critic-fixer loops — patterns/multi-round-critic-quality-gate runs critics in rounds with fixer agents between them; the critic is still weakly-adversarial in stance but operates in a write-review-revise loop. The Slack Spear variant runs the critic once per round, and the Director decides what to do with the critique (pivot, continue, conclude) — orchestration logic is apex, not middle-tier.
- vs. drafter-evaluator refinement — patterns/drafter-evaluator-refinement-loop pairs a drafter with an evaluator and retries on failure. The Slack variant is higher-level: the evaluator (Critic) doesn't gate the output; it augments it with metadata that a third agent (Director) interprets.
- vs. specialised sub-reviewer agents — patterns/specialized-reviewer-agents decomposes the review surface by domain (security, perf, code quality); weakly-adversarial stance is orthogonal and applies within each reviewer.
Where the weakness matters¶
Pushing the critic toward fully adversarial (stronger prompts for skepticism, bias toward flagging) is a tuning knob. Production systems will land somewhere on the cooperative-to-adversarial spectrum; "weakly" is Slack's chosen point on that spectrum, not an inherent property. Teams calibrating their own critic need to check:
- False-positive rate — is the critic over-reporting non-issues?
- False-negative rate — is the critic approving findings it should have challenged?
- Timeliness — is critic latency acceptable for the loop's cadence?
- Tier compatibility — does the critic tier in the knowledge pyramid have enough capacity to catch the Experts' mistakes? Pushing the Critic to too-cheap a tier breaks the stance's premise.
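The first two checks can be estimated from a hand-labeled audit sample. A minimal sketch, assuming a convention (not from the Slack post) that the critic "challenges" a finding when its credibility score falls below a threshold:

```python
def critic_calibration(samples, flag_threshold=0.5):
    """Estimate false-positive / false-negative rates of a critic.

    samples: list of (credibility_score, truly_flawed) pairs, where
    truly_flawed comes from a human audit of the same findings.
    The threshold convention here is illustrative, not Slack's.
    """
    fp = sum(1 for s, flawed in samples if s < flag_threshold and not flawed)
    fn = sum(1 for s, flawed in samples if s >= flag_threshold and flawed)
    sound = sum(1 for _, flawed in samples if not flawed)
    flawed_n = len(samples) - sound
    return {
        # over-reporting: sound findings the critic challenged anyway
        "false_positive_rate": fp / sound if sound else 0.0,
        # under-reporting: flawed findings the critic let through
        "false_negative_rate": fn / flawed_n if flawed_n else 0.0,
    }
```

A paranoid (too-adversarial) tuning shows up as a rising false-positive rate; a too-cooperative tuning shows up as a rising false-negative rate, which is exactly the trade-off the "weakly" qualifier names.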
Caveats¶
- Stance is asserted, not measured. Slack's claim that the Critic/Expert dynamic is "weakly adversarial" is not validated with disclosed metrics (hallucination-rate reduction, false-positive rate, etc.).
- Rubric is opaque. The specific rubric Slack's Critic uses is not disclosed.
- Model-family choice matters. The post doesn't disclose whether the Expert and Critic run different model families; cross-family criticism tends to catch more (parallels patterns/multi-round-critic-quality-gate's caveat on critic model-family choice).
- Weakness is a dial, not a discrete state. The stance sits on a continuum; teams will tune toward one end or the other based on their false-positive/false-negative tolerance.
Empirical support (2026-04-13 update)¶
Slack's second post in the Spear series disclosed a 170,000 finding distribution across the Critic's 5-level credibility rubric (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
| Score band | Label | % |
|---|---|---|
| 0.9-1.0 | Trustworthy | 37.7% |
| 0.7-0.89 | Highly-plausible | 25.4% |
| 0.5-0.69 | Plausible | 11.1% |
| 0.3-0.49 | Speculative | 10.4% |
| 0.0-0.29 | Misguided | 15.4% |
Sub-plausibility rate: 25.8% (Speculative + Misguided combined). This is canonical empirical support for the weakly-adversarial stance's value: without the Critic, roughly one finding in four would reach the Director as equally authoritative as trustworthy findings. An over-cooperative critic would not produce a sub-plausibility fraction this large, which is evidence that the stance is adversarial enough in practice. See concepts/credibility-scoring-rubric for the full rubric and distribution discussion.
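The sub-plausibility arithmetic checks out directly. A sketch (the 170,000 total and the percentages are from the Slack post; the band labels match the table above):

```python
# Disclosed distribution over ~170,000 Critic-scored findings (percent).
distribution = {
    "Trustworthy":      37.7,   # 0.9-1.0
    "Highly-plausible": 25.4,   # 0.7-0.89
    "Plausible":        11.1,   # 0.5-0.69
    "Speculative":      10.4,   # 0.3-0.49
    "Misguided":        15.4,   # 0.0-0.29
}

# Sub-plausibility = everything below the 0.5 credibility line.
sub_plausibility = distribution["Speculative"] + distribution["Misguided"]

# Approximate count of findings the Critic kept from reaching the
# Director as authoritative, out of the disclosed 170,000 total.
findings_below = round(170_000 * sub_plausibility / 100)
```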
Seen in¶
- systems/slack-spear — canonical first wiki instance. Critic is a "meta-expert" positioned to audit four domain Experts; scores findings against a rubric; catches a credential-exposure finding the Expert missed. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents) The second post (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications) adds the 5-level rubric (Trustworthy / Highly-plausible / Plausible / Speculative / Misguided) plus a disclosed distribution over 170,000 findings (37.7% / 25.4% / 11.1% / 10.4% / 15.4%, sub-plausibility = 25.8%) and a four-tool introspection suite (get_tool_call, get_tool_result, get_toolset_info, list_toolsets) giving the adversarial stance teeth at the methodology audit.
Related¶
- patterns/director-expert-critic-investigation-loop
- patterns/drafter-evaluator-refinement-loop
- patterns/multi-round-critic-quality-gate
- patterns/specialized-reviewer-agents
- patterns/three-channel-context-architecture
- patterns/critic-tool-call-introspection-suite
- patterns/timeline-assembly-from-scored-findings
- concepts/llm-as-judge
- concepts/self-approval-bias
- concepts/llm-hallucination
- concepts/knowledge-pyramid-model-tiering
- concepts/credibility-scoring-rubric
- concepts/narrative-coherence-as-hallucination-filter