PATTERN Cited by 2 sources
Director / Expert / Critic investigation loop¶
Intent¶
Run a long-running investigative agent task (security alert triage, root-cause analysis, deep research, incident review) as a three-persona agent team with a round-based loop:
- Director — planner + progressor. Forms questions, decides investigation phases, produces the final report.
- N Experts — domain specialists. Each owns a distinct toolset / data-source set; produces findings in response to the Director's question.
- Critic — meta-reviewer. Audits Expert findings against a rubric, credibility-scores them, condenses into a timeline, feeds the condensed view back to the Director.
The loop shape: Director asks → Experts answer → Critic reviews → Director receives condensed timeline → Director asks again.
Canonicalised by Slack's Security Engineering team as the core architecture of Slack Spear, their security-investigation agent service (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).
Loop diagram¶
        Director (plans + progresses)
            │                    ▲
   question │          condensed │
            │           timeline │
            ▼                    │
  ┌─────────────────┐            │
  │    Expert A     │─findings─┐ │
  │    Expert B     │─findings─┤ │
  │    Expert C     │─findings─┤ │
  │    Expert D     │─findings─┤ │
  └─────────────────┘          │ │
                               ▼ │
                     ┌─────────┐ │
  scored             │ Critic  │─┘
  findings           └─────────┘
  + timeline
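In application code, one round of this loop can be sketched as follows. This is a minimal illustration, not Slack's actual API; all class and method names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    expert: str
    claim: str

@dataclass
class ScoredFinding:
    finding: Finding
    credibility: float  # 0.0-1.0, assigned by the Critic

def run_round(director, experts, critic, timeline):
    """One Director -> Experts -> Critic -> Director hop."""
    question = director.ask(timeline)                 # Director plans the next question
    findings = [e.answer(question) for e in experts]  # each Expert uses its own toolset
    scored = critic.review(findings)                  # Critic credibility-scores findings
    return critic.condense(scored, timeline)          # only the condensed view flows up
```

The loop then repeats with the returned timeline as the Director's next input, until the Director decides to conclude.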
Why three personas?¶
Director vs Expert: separate planning from execution¶
Single-persona agents that try to simultaneously plan-the-investigation and execute-individual-queries conflate two different cognitive loads in one prompt. Tool selection errors correlate with tool-inventory growth; planning errors correlate with context-crowding. Separating them gives each persona a small prompt and a focused task.
See patterns/specialized-agent-decomposition for the general specialisation argument.
Critic: the hallucination check¶
Without a Critic, the Director consumes Expert findings directly — including hallucinated tool calls, mis-read data, and plausible-but-wrong inferences. The Critic's job is to catch these before they shape the Director's next question.
Slack's canonical emergent-behaviour disclosure: the Expert missed a credential exposure in a process-ancestry chain; the Critic flagged it; the Director pivoted the investigation. "What is notable about this result is that the expert did not raise the credential exposure in its findings; the Critic noticed it as part of its meta-analysis of the expert's work." (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
Why not just critic-fixer rounds?¶
Multi-round critic-fixer loops (as in patterns/multi-round-critic-quality-gate) have the Critic gate the Expert's output until it passes. The Director / Expert / Critic loop is higher-level: the Critic augments the Expert output (with credibility scores + analysis) rather than gating it, and the Director decides what to do with the augmented view — progress the investigation, pivot, or conclude. The third persona is the architectural reason this shape isn't just a drafter-evaluator retry loop.
Mechanism¶
1. Per-task model invocations (no mega-prompt)¶
Each persona's task is a separate model invocation with its own structured-output schema. Control flow lives in application code, not in prompt bullets. See patterns/one-model-invocation-per-task and concepts/prompt-is-not-control.
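A minimal sketch of what a per-task invocation looks like, assuming a generic `call_llm` client that supports structured output; the schema, prompt, and function names are illustrative, not disclosed by Slack.

```python
import json

DIRECTOR_SCHEMA = {
    "type": "object",
    "required": ["next_question", "phase"],
    "properties": {
        "next_question": {"type": "string"},
        "phase": {"enum": ["discovery", "trace", "conclude"]},
    },
}

def invoke_director(call_llm, condensed_timeline: str) -> dict:
    raw = call_llm(
        system="You are the Director of a security investigation.",
        user=condensed_timeline,
        response_schema=DIRECTOR_SCHEMA,  # persona-specific structured output
    )
    out = json.loads(raw)
    # Control flow lives here, in application code, not in prompt bullets:
    if out["phase"] not in ("discovery", "trace", "conclude"):
        raise ValueError(f"unexpected phase: {out['phase']}")
    return out
```

The Critic and each Expert get their own schema and their own invocation; no single mega-prompt carries all three roles.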
2. Director's journaling tool¶
Slack discloses that the Director "uses a journaling tool for planning and organizing the investigation as it progresses." The journal is a first-class artifact the Director reads + writes across rounds, carrying planning state without re-deriving it from scratch each round.
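A minimal sketch of the journal as a first-class read/write artifact; Slack does not disclose the tool's interface, so this shape is an assumption.

```python
class Journal:
    def __init__(self):
        self.entries: list[str] = []

    def write(self, round_no: int, note: str) -> None:
        # The Director appends planning notes as the investigation progresses.
        self.entries.append(f"[round {round_no}] {note}")

    def read(self) -> str:
        # Rendered into the Director's next prompt, so planning state
        # carries across rounds without being re-derived from scratch.
        return "\n".join(self.entries)
```

Because the journal persists outside any single invocation, the Director's prompt can stay small: it receives the journal plus the condensed timeline, not the full message history.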
3. Critic's condensation step¶
The Critic produces two outputs per round:
- Per-finding annotations with credibility scores against a defined rubric.
- A condensed investigation timeline assembled from the highest-credibility findings, merged with the running timeline.
Only the condensed timeline flows upward to the Director. This is the top half of the knowledge pyramid — high-tier cognition operating on pre-digested input, not raw data. See concepts/knowledge-pyramid-model-tiering.
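A hedged sketch of the Critic's two per-round outputs; the rubric, threshold, and finding shape are all illustrative assumptions, and a real rubric would be model-driven rather than a few lambdas.

```python
CREDIBILITY_THRESHOLD = 0.7  # illustrative cutoff, not a disclosed number

def score_finding(finding: dict, rubric: dict) -> float:
    """Output 1: a credibility score, here the mean of per-criterion checks."""
    checks = [check(finding) for check in rubric.values()]
    return sum(checks) / len(checks)

def condense(findings: list[dict], rubric: dict,
             running_timeline: list[dict]) -> list[dict]:
    """Output 2: the running timeline merged with high-credibility findings."""
    scored = [(f, score_finding(f, rubric)) for f in findings]
    kept = [f for f, s in scored if s >= CREDIBILITY_THRESHOLD]
    return sorted(running_timeline + kept, key=lambda f: f["ts"])
```

Only the return value of `condense` reaches the Director; the raw findings and their scores stay below.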
4. Phase-gated progression¶
The Director decides phase transitions explicitly (discovery → trace → conclude) via a meta-phase invocation. Phase affects which Expert(s) are queried + what model parameters are used. See patterns/phase-gated-investigation-progression.
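An illustrative sketch of the gating: the Director's chosen phase selects, in application code, which Experts run and which model parameters apply. The expert names follow the Slack Spear example; the per-phase assignments and temperatures are assumptions.

```python
from enum import Enum

class Phase(Enum):
    DISCOVERY = "discovery"
    TRACE = "trace"
    CONCLUDE = "conclude"

PHASE_CONFIG = {
    Phase.DISCOVERY: {"experts": ["access", "cloud", "code", "threat"],
                      "temperature": 0.7},   # broad fan-out
    Phase.TRACE:     {"experts": ["access", "code"],
                      "temperature": 0.3},   # narrow follow-up
    Phase.CONCLUDE:  {"experts": [],
                      "temperature": 0.0},   # Director writes the report
}

def select_experts(phase: Phase) -> list[str]:
    return PHASE_CONFIG[phase]["experts"]
```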
5. Model tiering (knowledge pyramid)¶
- Experts: cheap models (tool-call-heavy, token-intensive but cognitively shallow work).
- Critic: mid-tier models (reasoning-dense condensation).
- Director: top-tier models (strategic decisions on already-condensed input).
See concepts/knowledge-pyramid-model-tiering.
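The tiering reduces to configuration: a persona-to-model mapping that matches cost to cognitive load. The model names below are placeholders, not a disclosed Slack configuration.

```python
MODEL_TIERS = {
    "expert":   "cheap-model",     # tool-call-heavy, cognitively shallow
    "critic":   "mid-tier-model",  # reasoning-dense condensation
    "director": "top-tier-model",  # strategic decisions on condensed input
}

def model_for(persona: str) -> str:
    return MODEL_TIERS[persona]
```

Keeping the mapping in one place makes the tradeoff noted below explicit: a model upgrade to any one persona is a change to the tier balance, not a local tweak.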
When to reach for it¶
- Task has a natural plan/execute/review decomposition. Security investigations, incident root-cause, compliance audits, deep research agents all fit.
- Task requires cross-referencing multiple domains. The Expert layer is how you get "did this cross-reference between access logs and source code check out?" without cramming every data source into one prompt.
- Hallucinations are costly. The Critic's weakly-adversarial stance is the architectural defence. If your task has low hallucination tolerance (security, legal, medical), the Critic pays for itself.
- Supervision over collection is the goal. Slack's framing: "we're switching to supervising investigation teams, rather than doing the laborious work of gathering evidence." The Director/Critic structure produces an auditable narrative a human can supervise; bare tool-calling agents produce an event log a human has to reconstruct.
When not to reach for it¶
- Task is single-domain. If there's only one data source, only one toolset, the Expert layer collapses to a single agent and the specialisation payoff disappears.
- Task is short. A single-round Q+A doesn't need phase progression or a Critic; the orchestration overhead isn't earned.
- Tolerance for hallucination is high. A brainstorming assistant doesn't need a Critic; the stakes don't justify the extra model call.
- Tool inventory is small. If all tools fit comfortably in one prompt, specialisation is premature.
Composes with¶
- patterns/hub-worker-dashboard-agent-service — productises the loop as a service, with Worker running the loop, Hub storing events + providing metrics, Dashboard letting humans supervise live.
- patterns/phase-gated-investigation-progression — gates the loop's behaviour on explicit named phases.
- concepts/knowledge-pyramid-model-tiering — tiers the three personas' model cost against their task cognitive load.
- concepts/weakly-adversarial-critic — calibrates the Critic's stance between cooperative helper and red team.
- patterns/specialized-agent-decomposition — gives the peer-Expert-agent layer its theoretical justification.
Contrasts¶
- vs. patterns/coordinator-sub-reviewer-orchestration (Cloudflare AI code review) — closest architectural sibling. Cloudflare consolidates middle+apex into a single coordinator agent with a judge pass inside it; Slack separates them into Critic + Director. The Slack shape is higher-level (Director decides what to do with the Critic's output); the Cloudflare shape is simpler (coordinator is both judge and writer).
- vs. patterns/multi-round-critic-quality-gate (Meta tribal knowledge) — Meta's multi-round shape has writer-critic-fixer loops with rounds measured by scoring deltas. Slack's shape has rounds measured by investigation progression, not score delta. Meta's shape is artifact production; Slack's is investigation execution.
- vs. patterns/drafter-evaluator-refinement-loop (Lyft localization) — Lyft pairs drafter + evaluator in a single retry loop. Slack adds the Director-above-evaluator third layer that handles progress, not just retry.
- vs. patterns/planner-coder-verifier-router-loop — same three-layer shape at the code-task altitude. The Planner/Coder/Verifier mapping to Director/Expert/Critic is nearly direct, but Slack's loop operates on security data rather than code, and the Critic's scoring is credibility-weighted rather than pass/fail.
Tradeoffs¶
- Three model tiers to manage — tier changes require coordinated re-tuning across all three personas; individual model upgrades can shift the tier balance.
- Critic latency — the Critic runs between every Expert→Director hop, adding model-call latency on every round. Mitigated by running Critic on a mid-tier model.
- Rubric maintenance — the Critic's rubric becomes its own artifact to maintain, test, and calibrate.
- Director-Critic collusion risk — if Director and Critic run on the same model family, correlated blind spots persist. Cross-family is safer when feasible.
Seen in¶
- systems/slack-spear — canonical first wiki instance. 1 Director + 4 Experts (Access, Cloud, Code, Threat) + 1 Critic. Director plans + progresses phases + writes final report; Experts produce findings from their domain tool surfaces; Critic scores + condenses + detects Expert blind spots. Canonical worked example: Critic caught a credential exposure the Expert missed, Director pivoted the investigation. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents) Second post specifies the context plumbing underneath the loop — the three channels (Director's Journal + Critic's Review + Critic's Timeline) that carry all inter-invocation state in place of raw message history. Critic's role specified as two separate tasks (Review + Timeline), with the Review using a four-tool introspection suite and the Timeline producing a narrative-coherence-scored chronology. Disclosed operational number: 170,000 reviewed findings with 25.8% sub-plausibility rate. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)
Related¶
- systems/slack-spear
- concepts/knowledge-pyramid-model-tiering
- concepts/weakly-adversarial-critic
- concepts/investigation-phase-progression
- concepts/prompt-is-not-control
- concepts/structured-journaling-tool
- concepts/credibility-scoring-rubric
- concepts/narrative-coherence-as-hallucination-filter
- concepts/no-message-history-carry-forward
- concepts/online-context-summarisation
- concepts/gap-identification-top-n
- patterns/one-model-invocation-per-task
- patterns/phase-gated-investigation-progression
- patterns/hub-worker-dashboard-agent-service
- patterns/three-channel-context-architecture
- patterns/critic-tool-call-introspection-suite
- patterns/timeline-assembly-from-scored-findings
- patterns/specialized-agent-decomposition
- patterns/multi-round-critic-quality-gate
- patterns/drafter-evaluator-refinement-loop
- patterns/coordinator-sub-reviewer-orchestration
- patterns/planner-coder-verifier-router-loop