
CONCEPT Cited by 2 sources

Weakly-adversarial critic

Definition

A weakly-adversarial critic is an agent positioned to audit peer agents' work for hallucinations, analytical gaps, and interpretation variability — pointed at them, not cooperating with them — but not fully adversarial: it shares the task's goal and is rewarded for improving the overall outcome, not for winning against its peers.

The stance is architecturally distinct from three adjacent roles:

| Stance | Who wins | Shared goal? |
| --- | --- | --- |
| Cooperative helper | Everyone together | Yes — critic helps peer succeed |
| Weakly adversarial critic | The task, via the critic catching errors | Yes — critic and peer share outcome, but critic scores peer's work |
| Fully adversarial (red team) | The critic, if it finds a flaw | No — critic may be rewarded for finding flaws regardless of task outcome |

The "weakly" qualifier matters: without it, the critic's incentive drifts toward finding-flaws-at-all-costs (paranoia, false-positive-heavy output); without the adversarial element, the critic slides into "looks good to me" collaborative approval. See concepts/self-approval-bias for the failure mode the stance is designed to prevent.

Named and canonicalised in Slack's Streamlining security investigations with agents post (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).

Slack's verbatim framing

"The weakly adversarial relationship between the Critic and the expert group helps to mitigate against hallucinations and variability in the interpretation of evidence."

(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)

The Critic's job is to "assess and quantify the quality of findings made by domain experts using a rubric we've defined" and to "annotate the experts' findings with its own analysis and a credibility score for each finding." Slack calls the Critic a "meta-expert".
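Slack doesn't disclose the Critic's output schema, but the described behaviour — annotate each expert finding with its own analysis plus a credibility score — implies a shape roughly like this sketch. All type and field names here are assumptions, not Slack's:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # A domain expert's claim about the evidence (hypothetical schema).
    expert: str
    claim: str

@dataclass
class CriticAnnotation:
    # The Critic's meta-analysis attached to one expert finding.
    finding: Finding
    analysis: str       # the Critic's own reasoning about the claim
    credibility: float  # 0.0-1.0 score against the (undisclosed) rubric

# The Critic annotates rather than replaces the expert's finding,
# mirroring the worked example later in this page.
f = Finding(expert="process-ancestry", claim="credential handling is secure")
a = CriticAnnotation(finding=f,
                     analysis="ancestry shows a credential exposure the expert missed",
                     credibility=0.2)
```

The key design point the sketch captures: the expert's original finding survives intact, and the Critic's score and analysis ride alongside it as metadata.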

Why weakly and not fully adversarial

Too cooperative fails: self-approval bias

A single-model pipeline that generates and evaluates its own output hits concepts/self-approval-bias — the model is disproportionately likely to approve its own generations. Even with a separate cooperative-stance critic, if the critic is prompted to be "helpful" to the peer, it inherits a confirmation-oriented tone; hallucinations slip through.

Too adversarial fails: false-positive paranoia

A fully adversarial (red-team) critic is rewarded for finding problems. Applied to a production investigation stream, this produces:

  • High false-positive rate — "might be X" findings escalated as if confirmed.
  • Analysis paralysis — the investigation can't progress because the critic always finds more doubts.
  • Reader fatigue — the Director (or a human) receiving the critic's output stops taking it seriously.

Weakly adversarial is the pragmatic middle

The critic shares the investigation's goal of "reach a defensible conclusion" (so it has skin in the outcome) but is oriented toward auditing the method, not the goal. It asks "is this finding well-supported by the evidence?", not "is the peer my ally?" or "can I find any reason to reject this?"
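The stance described above can be made concrete as prompt text. The wording below is illustrative only — Slack's actual prompts are undisclosed:

```python
# Hypothetical system-prompt sketch encoding the weakly-adversarial stance:
# shared goal, audited method, graded output.
WEAKLY_ADVERSARIAL_PROMPT = (
    "You share the investigation's goal of reaching a defensible conclusion. "
    "Audit the expert's method, not the goal: for each finding, ask whether "
    "it is well-supported by the cited evidence. Do not approve findings to "
    "be agreeable, and do not manufacture doubts to appear thorough. "
    "Emit a credibility score per finding; do not issue a pass/fail verdict."
)
```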

Canonical emergent behaviour

Slack discloses the canonical payoff in an edited worked example: an Expert reviewed a process-ancestry chain and "incorrectly assessed credential handling as secure". The Critic noticed a credential exposure in the ancestry that the Expert missed, flagged it with its own analysis, and the Director pivoted the investigation to focus on this issue.

Verbatim: "What is notable about this result is that the expert did not raise the credential exposure in its findings; the Critic noticed it as part of its meta-analysis of the expert's work." (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)

This is the "Critic catches what Expert missed" payoff pattern — the operational evidence that the weakly-adversarial stance produces real value, not just ceremony.

Design mechanics

To implement a weakly-adversarial critic, three mechanisms tend to co-occur:

  1. Separate model invocation. The critic is a separate invocation, not the same model asked to self-evaluate. See patterns/one-model-invocation-per-task.
  2. Rubric-driven output. The critic scores against an explicit rubric (dimensions, scale, pass/fail criteria) rather than free-form "looks good". Rubric shape forces structured engagement.
  3. Credibility scores, not binary pass/fail. The critic emits a credibility score per finding (not a pass/fail verdict). This lets downstream consumers (the Director) reason probabilistically rather than binary-gating.
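The three mechanisms can be sketched in one small function. This is a minimal illustration, assuming a generic LLM client passed in as a callable; the rubric dimensions are invented for the example, since Slack's rubric is undisclosed:

```python
# Rubric-driven prompt (mechanism 2): illustrative dimensions, not Slack's.
RUBRIC = """Score the finding 0.0-1.0 overall, considering:
- evidence_support: is every claim tied to cited evidence?
- completeness: did the expert miss anything in the same evidence?
Return JSON: {"analysis": "...", "credibility": <float>}"""

def critique(finding: str, evidence: str, invoke_model) -> dict:
    # Mechanism 1: a separate invocation -- the critic never evaluates its
    # own generations, only the peer expert's finding plus the raw evidence.
    prompt = f"{RUBRIC}\n\nEvidence:\n{evidence}\n\nExpert finding:\n{finding}"
    # Mechanism 3: the returned credibility is a graded score, not a
    # pass/fail gate, so a downstream Director can reason probabilistically.
    return invoke_model(prompt)

# Stubbed model call for illustration; a real client would parse JSON
# out of the model's response here.
result = critique("credential handling is secure",
                  "process ancestry shows a token passed via argv",
                  lambda p: {"analysis": "argv exposure missed", "credibility": 0.2})
```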

Contrasts

  • vs. LLM-as-judge (pass/fail) — concepts/llm-as-judge is often a binary pass/fail evaluator. A weakly-adversarial critic is a scorer + annotator, not a gate.
  • vs. multi-round critic-fixer loops — patterns/multi-round-critic-quality-gate runs critics in rounds with fixer agents between them; the critic is still weakly-adversarial in stance but operates in a write-review-revise loop. The Slack Spear variant runs the critic once per round, and the Director decides what to do with the critique (pivot, continue, conclude) — orchestration logic is apex, not middle-tier.
  • vs. drafter-evaluator refinement — patterns/drafter-evaluator-refinement-loop pairs a drafter with an evaluator and retries on failure. The Slack variant is higher-level: the evaluator (Critic) doesn't gate the output; it augments it with metadata that a third agent (Director) interprets.
  • vs. specialised sub-reviewer agents — patterns/specialized-reviewer-agents decomposes the review surface by domain (security, perf, code quality); the weakly-adversarial stance is orthogonal and applies within each reviewer.

Where the weakness matters

Pushing the critic toward fully adversarial (stronger prompts for skepticism, a bias toward flagging) is a tuning knob. Production systems will land somewhere on the cooperative-to-adversarial spectrum; "weakly" is Slack's chosen point on that spectrum, not an inherent property of the pattern. Teams calibrating their own critic need to check:

  • False-positive rate — is the critic over-reporting non-issues?
  • False-negative rate — is the critic approving findings it should have challenged?
  • Timeliness — is critic latency acceptable for the loop's cadence?
  • Tier compatibility — does the critic tier in the knowledge pyramid have enough capacity to catch the Experts' mistakes? Pushing the Critic to too-cheap a tier breaks the stance's premise.
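The first two checks above require a hand-labelled audit sample: take a batch of critic verdicts, have a human mark which findings were genuinely flawed, and compute the two rates. A minimal sketch, with a hypothetical `calibration` helper:

```python
def calibration(sample):
    # sample: list of (critic_flagged, truly_flawed) pairs from a
    # hand-labelled audit of the critic's verdicts.
    flagged = [flawed for did_flag, flawed in sample if did_flag]
    approved = [flawed for did_flag, flawed in sample if not did_flag]
    return {
        # Share of flagged findings that were actually fine (over-reporting).
        "false_positive_rate": flagged.count(False) / len(flagged) if flagged else 0.0,
        # Share of approved findings that were actually flawed (missed challenges).
        "false_negative_rate": approved.count(True) / len(approved) if approved else 0.0,
    }

# Toy sample: one correct flag, one spurious flag, one correct
# approval, one missed flaw.
rates = calibration([(True, True), (True, False), (False, False), (False, True)])
```

A critic tuned too adversarial drives the first rate up; one tuned too cooperative drives the second up — the two rates locate the system on the spectrum.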

Caveats

  • Stance is asserted, not measured. Slack's claim that the Critic/Expert dynamic is "weakly adversarial" is not validated with disclosed metrics (hallucination-rate reduction, false-positive rate, etc.).
  • Rubric is opaque. The specific rubric Slack's Critic uses is not disclosed.
  • Model-family choice matters. The post doesn't disclose whether the Expert and Critic run different model families; cross-family criticism tends to catch more (parallels patterns/multi-round-critic-quality-gate's caveat on critic model-family choice).
  • Weakness is a dial, not a discrete state. The stance sits on a continuum; teams will tune toward one end or the other based on their false-positive/false-negative tolerance.

Empirical support (2026-04-13 update)

Slack's second post in the Spear series disclosed the distribution of 170,000 findings across the Critic's 5-level credibility rubric (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):

| Score band | Label | % |
| --- | --- | --- |
| 0.9-1.0 | Trustworthy | 37.7% |
| 0.7-0.89 | Highly-plausible | 25.4% |
| 0.5-0.69 | Plausible | 11.1% |
| 0.3-0.49 | Speculative | 10.4% |
| 0.0-0.29 | Misguided | 15.4% |
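The sub-plausibility figure quoted below falls out of the disclosed percentages directly. The band labels are Slack's; the dict layout is mine:

```python
# Disclosed distribution of 170,000 findings across the five credibility
# bands (percent of total).
distribution = {
    "Trustworthy": 37.7,
    "Highly-plausible": 25.4,
    "Plausible": 11.1,
    "Speculative": 10.4,
    "Misguided": 15.4,
}

# Findings scored below the Plausible band (< 0.5).
sub_plausible = distribution["Speculative"] + distribution["Misguided"]
print(f"sub-plausibility rate: {sub_plausible:.1f}%")  # prints 25.8%
```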

Sub-plausibility rate: 25.8% (Speculative + Misguided combined). This is canonical empirical support for the weakly-adversarial stance's value — without the Critic, roughly one finding in four would reach the Director looking as authoritative as the trustworthy findings. An over-cooperative critic would be unlikely to produce a sub-plausibility fraction this large, which suggests the stance is adversarial enough in practice. See concepts/credibility-scoring-rubric for the full rubric + distribution discussion.
