
PATTERN Cited by 2 sources

Director / Expert / Critic investigation loop

Intent

Run a long-running investigative agent task (security alert triage, root-cause analysis, deep research, incident review) as a three-persona agent team with a round-based loop:

  • Director — planner + progressor. Forms questions, decides investigation phases, produces the final report.
  • N Experts — domain specialists. Each owns a distinct toolset / data-source set; produces findings in response to the Director's question.
  • Critic — meta-reviewer. Audits Expert findings against a rubric, credibility-scores them, condenses into a timeline, feeds the condensed view back to the Director.

The loop shape: Director asks → Experts answer → Critic reviews → Director receives condensed timeline → Director asks again.

Canonicalised by Slack's Security Engineering team as the core architecture of Slack Spear, their security-investigation agent service (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).

Loop diagram

        ┌───────────────────────────────────────────────┐
        │                                               │
        │         Director  (plans + progresses)        │
        │              │                   ▲            │
        │              │ question          │ condensed  │
        │              │                   │ timeline   │
        │              ▼                   │            │
        │      ┌─────────────────┐         │            │
        │      │    Expert A     │─findings┤            │
        │      │    Expert B     │─findings┤            │
        │      │    Expert C     │─findings┤            │
        │      │    Expert D     │─findings┤            │
        │      └─────────────────┘         │            │
        │              │                   │            │
        │              ▼                   │            │
        │         ┌─────────┐ scored       │            │
        │         │ Critic  │─findings─────┘            │
        │         └─────────┘ + timeline                │
        │                                               │
        └───────────────────────────────────────────────┘

Why three personas?

Director vs Expert: separate planning from execution

Single-persona agents that try to plan the investigation and execute individual queries simultaneously conflate two different cognitive loads in one prompt. Tool-selection errors correlate with tool-inventory growth; planning errors correlate with context crowding. Separating the personas gives each a small prompt and a focused task.

See patterns/specialized-agent-decomposition for the general specialisation argument.

Critic: the hallucination check

Without a Critic, the Director consumes Expert findings directly — including hallucinated tool calls, mis-read data, and plausible-but-wrong inferences. The Critic's job is to catch these before they shape the Director's next question.

Slack's canonical emergent-behaviour disclosure: the Expert missed a credential exposure in a process-ancestry chain; the Critic flagged it; the Director pivoted the investigation. "What is notable about this result is that the expert did not raise the credential exposure in its findings; the Critic noticed it as part of its meta-analysis of the expert's work." (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)

Why not just critic-fixer rounds?

Multi-round critic-fixer loops (as in patterns/multi-round-critic-quality-gate) have the Critic gate the Expert's output until it passes. The Director / Expert / Critic loop is higher-level: the Critic augments the Expert output (with credibility scores + analysis) rather than gating it, and the Director decides what to do with the augmented view — progress the investigation, pivot, or conclude. The third persona is the architectural reason this shape isn't just a drafter-evaluator retry loop.

Mechanism

1. Per-task model invocations (no mega-prompt)

Each persona's task is a separate model invocation with its own structured-output schema. Control flow lives in application code, not in prompt bullets. See patterns/one-model-invocation-per-task and concepts/prompt-is-not-control.
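Slack does not publish implementation code; the control-flow idea can be sketched as follows, with hypothetical names (`run_round`, `Finding`, and the stubbed expert/critic callables are all illustrative, not Slack's API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Finding:
    expert: str
    text: str

@dataclass
class RoundResult:
    question: str
    findings: List[Finding]
    timeline: str

def run_round(question: str,
              experts: Dict[str, Callable[[str], str]],
              critic: Callable[[List[Finding]], str]) -> RoundResult:
    # In a real system each expert call and the critic call is one
    # model invocation with its own structured-output schema. The
    # sequencing lives here, in application code, not in prompt text.
    findings = [Finding(name, ask(question)) for name, ask in experts.items()]
    timeline = critic(findings)
    return RoundResult(question, findings, timeline)
```

The Director would call `run_round` once per round, feed the returned timeline into its next planning invocation, and decide whether to ask again, pivot, or conclude.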

2. Director's journaling tool

Slack discloses that the Director "uses a journaling tool for planning and organizing the investigation as it progresses." The journal is a first-class artifact the Director reads + writes across rounds, carrying planning state without re-deriving it from scratch each round.

3. Critic's condensation step

The Critic produces two outputs per round:

  • Per-finding annotations with credibility scores against a defined rubric.
  • A condensed investigation timeline assembled from the highest-credibility findings, merged with the running timeline.

Only the condensed timeline flows upward to the Director. This is the top half of the knowledge pyramid — high-tier cognition operating on pre-digested input, not raw data. See concepts/knowledge-pyramid-model-tiering.
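A sketch of the two-output shape, under stated assumptions (the rubric is abstracted into a `score` callable, and the credibility threshold is an invented parameter, not a disclosed Slack value):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Annotation:
    finding: str
    credibility: float  # rubric score in [0, 1]; rubric itself assumed
    note: str

def critic_round(findings: List[str],
                 score: Callable[[str], Tuple[float, str]],
                 running_timeline: List[str],
                 threshold: float = 0.7) -> Tuple[List[Annotation], List[str]]:
    """Two outputs per round: per-finding annotations, and a condensed
    timeline merged from the high-credibility findings. Only the
    timeline is passed upward to the Director."""
    annotations = [Annotation(f, *score(f)) for f in findings]
    keep = [a.finding for a in annotations if a.credibility >= threshold]
    return annotations, running_timeline + keep
```

The annotations stay at the Critic tier (for audit and for the Expert feedback path); the Director only ever sees the merged timeline.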

4. Phase-gated progression

The Director decides phase transitions explicitly (discovery → trace → conclude) via a meta-phase invocation. Phase affects which Expert(s) are queried + what model parameters are used. See patterns/phase-gated-investigation-progression.
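A sketch of the gate, assuming the disclosed three phases; the routing table contents (expert names, parameters) are illustrative placeholders, not Slack's configuration:

```python
from enum import Enum

class Phase(Enum):
    DISCOVERY = "discovery"
    TRACE = "trace"
    CONCLUDE = "conclude"

# Hypothetical per-phase routing: which Experts get queried and with
# what model parameters. Values here are invented for illustration.
PHASE_CONFIG = {
    Phase.DISCOVERY: {"experts": ["access", "cloud", "code", "threat"], "temperature": 0.7},
    Phase.TRACE:     {"experts": ["access", "code"],                    "temperature": 0.3},
    Phase.CONCLUDE:  {"experts": [],                                    "temperature": 0.0},
}

def next_phase(current: Phase, director_decision: str) -> Phase:
    """The Director's meta-phase invocation returns an explicit decision;
    application code applies it as a one-way gated transition."""
    order = [Phase.DISCOVERY, Phase.TRACE, Phase.CONCLUDE]
    if director_decision == "advance" and current != Phase.CONCLUDE:
        return order[order.index(current) + 1]
    return current
```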

5. Model tiering (knowledge pyramid)

  • Experts: cheap models (tool-call-heavy, token-intensive but cognitively shallow work).
  • Critic: mid-tier models (reasoning-dense condensation).
  • Director: top-tier models (strategic decisions on already-condensed input).

See concepts/knowledge-pyramid-model-tiering.
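The tiering reduces to a static persona-to-tier binding in application code. A minimal sketch (tier labels are generic; Slack does not disclose model names):

```python
# Each persona binds to a cost tier matched to its cognitive load.
MODEL_TIERS = {
    "expert":   {"tier": "cheap", "reason": "tool-call-heavy, cognitively shallow"},
    "critic":   {"tier": "mid",   "reason": "reasoning-dense condensation"},
    "director": {"tier": "top",   "reason": "strategy on pre-condensed input"},
}

def model_tier_for(persona: str) -> str:
    return MODEL_TIERS[persona]["tier"]
```

Because the binding is explicit, a model upgrade at one tier is a one-line change, but (per the tradeoffs below) it can shift the tier balance and force re-tuning of the other two personas.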

When to reach for it

  • Task has a natural plan/execute/review decomposition. Security investigations, incident root-cause, compliance audits, deep research agents all fit.
  • Task requires cross-referencing multiple domains. The Expert layer is how you get "did this cross-reference between access logs and source code check out?" without cramming every data source into one prompt.
  • Hallucinations are costly. The Critic's weakly-adversarial stance is the architectural defence. If your task has low hallucination tolerance (security, legal, medical), the Critic pays for itself.
  • Supervision over collection is the goal. Slack's framing: "we're switching to supervising investigation teams, rather than doing the laborious work of gathering evidence." The Director/Critic structure produces an auditable narrative a human can supervise; bare tool-calling agents produce an event log a human has to reconstruct.

When not to reach for it

  • Task is single-domain. If there's only one data source, only one toolset, the Expert layer collapses to a single agent and the specialisation payoff disappears.
  • Task is short. A single-round Q+A doesn't need phase progression or a Critic; the orchestration overhead isn't earned.
  • Tolerance for hallucination is high. A brainstorming assistant doesn't need a Critic; the stakes don't justify the extra model call.
  • Tool inventory is small. If all tools fit comfortably in one prompt, specialisation is premature.

Composes with

Contrasts

  • vs. patterns/coordinator-sub-reviewer-orchestration (Cloudflare AI code review) — closest architectural sibling. Cloudflare consolidates middle+apex into a single coordinator agent with a judge pass inside it; Slack separates them into Critic + Director. The Slack shape is higher-level (Director decides what to do with the Critic's output); the Cloudflare shape is simpler (coordinator is both judge and writer).
  • vs. patterns/multi-round-critic-quality-gate (Meta tribal knowledge) — Meta's multi-round shape has writer-critic-fixer loops with rounds measured by scoring deltas. Slack's shape has rounds measured by investigation progression, not score delta. Meta's shape is artifact production; Slack's is investigation execution.
  • vs. patterns/drafter-evaluator-refinement-loop (Lyft localization) — Lyft pairs drafter + evaluator in a single retry loop. Slack adds the Director-above-evaluator third layer that handles progress, not just retry.
  • vs. patterns/planner-coder-verifier-router-loop — same three-layer shape at the code-task altitude. The Planner/Coder/Verifier mapping to Director/Expert/Critic is nearly direct, but Slack's loop operates on security data rather than code, and the Critic's scoring is credibility-weighted rather than pass/fail.

Tradeoffs

  • Three model tiers to manage — tier changes require coordinated re-tuning across all three personas; individual model upgrades can shift the tier balance.
  • Critic latency — the Critic runs between every Expert→Director hop, adding model-call latency on every round. Mitigated by running Critic on a mid-tier model.
  • Rubric maintenance — the Critic's rubric becomes its own artifact to maintain, test, and calibrate.
  • Director-Critic collusion risk — if Director and Critic run on the same model family, correlated blind spots persist. Cross-family is safer when feasible.

Seen in

  • systems/slack-spear — canonical first wiki instance. 1 Director + 4 Experts (Access, Cloud, Code, Threat) + 1 Critic. Director plans + progresses phases + writes final report; Experts produce findings from their domain tool surfaces; Critic scores + condenses + detects Expert blind spots. Canonical worked example: Critic caught a credential exposure the Expert missed, Director pivoted the investigation. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents) Second post specifies the context plumbing underneath the loop — the three channels (Director's Journal + Critic's Review + Critic's Timeline) that carry all inter-invocation state in place of raw message history. Critic's role specified as two separate tasks (Review + Timeline), with the Review using a four-tool introspection suite and the Timeline producing a narrative-coherence-scored chronology. Disclosed operational number: 170,000 reviewed findings with 25.8% sub-plausibility rate. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)