PATTERN Cited by 1 source
Multi-round critic quality gate¶
Intent¶
Gate AI-generated artifacts (docs, code, context files, release copy) behind multiple rounds of independent critic-agent review, with fixer agents applying corrections between rounds and scoring deltas measured round over round. This raises quality without human review of every artifact; invariants (zero hallucinations, 100% path validity) are enforced as hard gates at the final round.
Mechanism¶
Stage 1: draft¶
A writer agent (or writer ensemble) produces the initial artifact.
Stage 2: round-robin critic + fixer¶
Typically three rounds:
- Round 1 — N critic agents score the draft independently + flag weaknesses. Scores are aggregated. Fixer agents read all critique and apply corrections.
- Round 2 — different critic agents (or re-prompted ones) score the revision. Fixers apply another round of corrections.
- Round 3 — final critics, often focused on integration tests + invariant verification (e.g. "does every file path in this artifact exist?").
Meta's configuration (Source: sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines): 10+ critic passes across 3 rounds + 4 fixer agents.
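The round structure above can be sketched as a small orchestration loop. This is a minimal illustration, not Meta's implementation: `Critic` and `Fixer` are hypothetical callables standing in for agent invocations, and score aggregation is a plain mean.

```python
from statistics import mean
from typing import Callable

# Hypothetical agent interfaces: a critic returns (score, critique text);
# a fixer returns a revised artifact given all critiques from the round.
Critic = Callable[[str], tuple[float, str]]
Fixer = Callable[[str, list[str]], str]

def run_rounds(
    artifact: str,
    rounds: list[list[Critic]],  # one critic ensemble per round
    fixers: list[Fixer],
) -> tuple[str, list[float]]:
    """Run critic rounds with fixer passes between them.

    Returns the final artifact and the per-round mean critic score,
    so round-over-round deltas can be measured.
    """
    scores: list[float] = []
    for critics in rounds:
        # Independent critic passes: each scores and flags weaknesses.
        results = [critic(artifact) for critic in critics]
        scores.append(mean(score for score, _ in results))
        critiques = [note for _, note in results]
        # Fixers read all critique and apply corrections before the next round.
        for fixer in fixers:
            artifact = fixer(artifact, critiques)
    return artifact, scores
```

Keeping critics and fixers as separate callables mirrors the fixer-critic separation the pattern relies on: a fixer never sees the critic's internal reasoning, only its written critique.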
Stage 3: hard-invariant gate¶
Invariants checked before release:
- Zero hallucinated file paths (Meta's explicit invariant)
- 100% pass rate on a test-prompt corpus (Meta: 55+ prompts × 5 personas = 275+ test cases, 100% pass)
Artifacts that fail invariants re-enter the critic-fixer loop; they do not ship.
Measuring improvement¶
The pattern's value is demonstrable only by round-over-round score improvement:
| Round | Meta's critic score |
|---|---|
| Draft | baseline |
| Round 1 critic | 3.65 / 5.0 |
| Round 2 critic | (intermediate) |
| Round 3 critic | 4.20 / 5.0 |
A diminishing-returns curve is expected. If round 3 is not materially better than round 2, you may be past the point of useful criticism.
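The diminishing-returns observation suggests a simple stop rule on round-over-round deltas. A sketch, with an assumed (not source-given) threshold:

```python
def should_stop(scores: list[float], min_delta: float = 0.05) -> bool:
    """Stop adding critic rounds when the latest round's improvement
    over the previous round falls below min_delta (diminishing returns)."""
    if len(scores) < 2:
        return False  # need at least two rounds to measure a delta
    return scores[-1] - scores[-2] < min_delta
```

With Meta's reported endpoints (3.65 after round 1, 4.20 after round 3), the overall delta is 0.55; the rule would only trigger once a later round adds less than the threshold.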
Why it works¶
- Independent critics reduce correlated blind spots — one critic's misses are caught by the next. Parallels the rationale behind patterns/specialized-reviewer-agents for code review.
- Fixer-critic separation — fixers optimise for the critic's feedback without the critic's reasoning path baked in; avoids the "agent agrees with itself" failure mode of self-criticism.
- Rounds bound the work — finite budget prevents unbounded iteration on stubborn cases. Human review handles edges.
- Hard invariants as gates, not aspirations — zero-hallucinations is enforced mechanically, not trusted to critic judgement.
Tradeoffs¶
- Compute cost — each round is N critic calls + fixer calls. At Meta's scale (59 artifacts × 3 rounds × 10+ critics), this is substantial. Reserve for artifacts consumed many times after production (the payback requires amortisation).
- Critic model quality ceiling — if the critic is the same family as the writer, the same weaknesses persist. Meta does not disclose whether critic models differ from writers; in practice, cross-family criticism (different vendor / different fine-tune) catches more.
- Scoring calibration drift — rubrics must be stable; if later-round critics score on different dimensions, round-over-round deltas are not comparable.
- No substitute for domain review — Meta pairs automated critics with "three prompt testers validated 55+ queries across five personas" — human or human-proxy test-suite validation. Critics alone don't replace acceptance tests.
Distinct from runtime LLM-as-judge¶
concepts/llm-as-judge is typically applied at runtime on per-request output (e.g. chat responses, rewrites). Multi-round critic quality gate is applied before release on durable artifacts (context files, docs, migration guides, release notes).
| Axis | Runtime LLM-as-judge | Multi-round critic gate (this) |
|---|---|---|
| Timing | Per-request | Pre-release |
| Latency budget | Tight (user-facing) | Loose (batch) |
| Iteration depth | 1-2 rounds typical | 3+ rounds |
| Fixer present? | Rarely | Load-bearing |
| Scope | Individual output | Durable artifact |
Siblings and lineage¶
- patterns/drafter-evaluator-refinement-loop — the single-round analogue; this pattern is the multi-round generalisation.
- patterns/vlm-evaluator-quality-gate — the vision-LLM analogue on Instacart's Pixel.
- patterns/specialized-reviewer-agents — the code-review analogue at Cloudflare's AI Code Review.
- patterns/precomputed-agent-context-files — the parent pattern multi-round critic is a gate inside.
- concepts/ai-agent-guardrails — the family this pattern belongs to.
Seen in¶
- Meta AI Pre-Compute Engine (2026-04-06) — canonical wiki instance. 10+ critic passes × 3 rounds + 4 fixers. Quality score 3.65 → 4.20 / 5.0. Zero hallucinated file paths enforced as a hard gate. 55+ prompts × 5 personas validated at 100% pass by separate prompt-tester agents. (Source: sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines.)
Related¶
- concepts/llm-as-judge — the runtime sibling
- concepts/ai-agent-guardrails — the family
- patterns/drafter-evaluator-refinement-loop — single-round sibling
- patterns/vlm-evaluator-quality-gate — vision-LLM sibling
- patterns/specialized-reviewer-agents — code-review sibling
- patterns/precomputed-agent-context-files — parent pattern
- systems/meta-ai-precompute-engine — the canonical producer