
PATTERN Cited by 1 source

Planner / Coder / Verifier / Router loop

Intent

Build an agent that solves open-ended problems by planning, executing, judging, and refining a plan iteratively, with an explicit add-or-fix decision when the judge rejects. Four specialised LLM sub-agents plus a router, wired as a verification-gated inner loop.

Shape

   [Data File Analyzer]          ← pre-loop context extraction
   ┌───►  Planner                ← high-level plan
   │         │
   │         ▼
   │      Coder                  ← plan → executable code → run
   │         │
   │         ▼
   │      Verifier  ─── pass ──► return solution
   │         │
   │       fail
   │         ▼
   │      Router
   │       /   \
   │  "add"    "fix"
   │    │        │
   └────┴────────┘               ← revised plan; repeat
         (bounded by N rounds)

The five agents in DS-STAR's canonical instance:

  • Data File Analyzer — pre-loop; writes Python to summarise every file in the working directory. See concepts/data-file-analysis.
  • Planner — produces a step-sequenced plan.
  • Coder — turns plan into code, executes it, captures intermediate results.
  • Verifier: LLM judge scoring plan sufficiency against current results.
  • Router — on Verifier reject, chooses add new step vs fix existing step; emits the refined plan.
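The loop in the diagram can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not DS-STAR's code: each role is a hypothetical callable standing in for an LLM call, the names `Agents` and `solve` are illustrative, and treating "fix" as truncating the plan at the faulty step before asking the Planner for a replacement is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Plan = List[str]  # each entry is one natural-language plan step


@dataclass
class Agents:
    """Hypothetical role interfaces; in DS-STAR each role is an LLM call."""
    analyze_files: Callable[[], str]                    # Data File Analyzer (pre-loop)
    plan: Callable[[str, Plan], str]                    # Planner: propose the next step
    code_and_run: Callable[[str, Plan], str]            # Coder: implement and execute the plan
    verify: Callable[[str, Plan, str], bool]            # Verifier: is the plan sufficient?
    route: Callable[[str, Plan, str], Tuple[str, int]]  # Router: ("add", -1) or ("fix", step)


def solve(agents: Agents, max_rounds: int = 10) -> Tuple[str, int, bool]:
    """Run the verification-gated loop. Returns (result, rounds_used, approved)."""
    context = agents.analyze_files()
    plan: Plan = [agents.plan(context, [])]
    result = agents.code_and_run(context, plan)
    for round_no in range(1, max_rounds + 1):
        if agents.verify(context, plan, result):
            return result, round_no, True
        if round_no == max_rounds:
            break  # budget spent: no Router call on the final rejected round
        action, step = agents.route(context, plan, result)
        if action == "fix":
            plan = plan[:step]  # assumption: drop the faulty step and all later steps
        plan = plan + [agents.plan(context, plan)]
        result = agents.code_and_run(context, plan)
    return result, max_rounds, False  # exhausted: hand back the un-approved output
```

Returning `approved` alongside the result lets a caller distinguish a Verifier-approved answer from one returned only because the round budget ran out.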

Distinguishing primitive: the add-or-fix branch

The Router's add-vs-fix choice is what makes this pattern different from extend-only agent loops. DS-STAR's ablation Variant 2 removed the Router and forced extend-only refinement; both easy and hard task performance degraded, "demonstrat[ing] that it is more effective to correct mistakes in a plan than to keep adding potentially flawed steps" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).

Add-only loops accumulate mistakes; add-or-fix loops revise them.
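The difference between the two refinement styles shows up directly in how a plan list changes. In this sketch (hypothetical names `refine` and `extend_only`), a "fix" is assumed to discard the faulty step and every later step, since later steps may build on it; the source page does not spell out DS-STAR's exact truncation rule.

```python
from typing import List, Optional


def refine(plan: List[str], new_step: str, fix_index: Optional[int] = None) -> List[str]:
    """Apply a Router decision to a plan without mutating the input.

    fix_index is None -> "add": append new_step after the existing steps.
    fix_index is k    -> "fix": discard step k and everything after it
                         (assumed: later steps may depend on the faulty one),
                         then append new_step in its place.
    """
    if fix_index is None:
        return plan + [new_step]
    if not 0 <= fix_index < len(plan):
        raise IndexError(f"fix_index {fix_index} out of range for a {len(plan)}-step plan")
    return plan[:fix_index] + [new_step]


def extend_only(plan: List[str], new_step: str) -> List[str]:
    """Ablated Variant-2 behaviour: every refinement is an append."""
    return plan + [new_step]
```

For example, with plan `["load sales.csv", "join on customer_name", "sum revenue"]` and a Router fix at index 1, `refine` yields `["load sales.csv", "join on customer_id"]` and leaves the Planner to re-add the aggregation in a later round, whereas `extend_only` keeps the bad join and appends the corrected one after it.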

When to reach for it

  • Open-ended problems without ground-truth labels. Data-science, analytics, exploratory investigations — where there is no single correct answer and verification must operate on plan shape rather than output match.
  • Heterogeneous inputs that need reasoning across multiple file formats. The pattern's Data File Analyzer pre-stage is pitched against heterogeneous data formats that schema-inspection primitives can't handle uniformly.
  • Multi-step reasoning where inspecting intermediate results is diagnostic for the full plan's viability — i.e. wherever an expert human would work incrementally in a notebook, reviewing each step's output before writing the next.

When not to reach for it

  • Well-defined single-shot tasks. If the problem has ground truth and a one-shot plan reliably works, the Verifier/Router overhead is wasted.
  • Strict latency budgets. Each loop iteration costs Planner + Coder + Verifier inference, plus the Router on a reject. DS-STAR averages 5.6 rounds on hard DABStep tasks, which works out to roughly 17-22 LLM calls per resolution (3 to 4 calls per round).
  • Cost-sensitive batch jobs. Same reason — expensive per-task.
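The call-count estimate is simple arithmetic over rounds. A sketch, assuming each inner-loop role is exactly one LLM call and excluding the pre-loop Data File Analyzer: the 17-22 range brackets 3 calls per round (no Router) and 4 calls per round (Router every round), while a single run that invokes the Router on every round except the last costs 4R - 1 calls. The helper names are hypothetical.

```python
from typing import Tuple

CALLS_PER_ROUND = 3  # Planner + Coder + Verifier
ROUTER_CALLS = 1     # extra call on each rejected round that is followed by a revision


def call_bracket(avg_rounds: float) -> Tuple[float, float]:
    """Loose bracket: 3 calls/round (no Router) up to 4 calls/round (Router every round)."""
    return CALLS_PER_ROUND * avg_rounds, (CALLS_PER_ROUND + ROUTER_CALLS) * avg_rounds


def calls_for_run(rounds: int) -> int:
    """Count for one run, assuming the Router fires on every round except the last.

    Excludes the pre-loop Data File Analyzer.
    """
    return CALLS_PER_ROUND * rounds + ROUTER_CALLS * (rounds - 1)
```

At 5.6 average rounds the bracket is about 16.8 to 22.4 calls; at DS-STAR's 10-round cap, `calls_for_run(10)` gives 39 calls for a run that never gets approved.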

Key design decisions

  1. What does the Verifier judge on? In DS-STAR: plan sufficiency given the intermediate code execution results. Not a correctness check against an oracle — "determine if the current plan is adequate" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).
  2. What does the Router decide between? Minimum: add vs fix. A richer variant could also offer "remove step" or "restart plan"; DS-STAR sticks to add-or-fix.
  3. What is the loop budget? DS-STAR caps at 10 rounds; hard DABStep tasks average 5.6 rounds, easy tasks 3.0. See concepts/refinement-round-budget.
  4. What happens on budget exhaustion? DS-STAR returns the current (un-approved) plan's output as the final solution rather than erroring. Alternative designs could escalate or fail loudly.
  5. Pre-loop context extractor? DS-STAR pairs the inner loop with the Data File Analyzer as a pre-stage; the ablation shows it is load-bearing (45.2 % → 26.98 % on DABStep hard tasks without it).
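These decisions can be made explicit as configuration. A minimal sketch with hypothetical names (`LoopConfig`, `OnExhaustion`, `Outcome`, `finalize`): the defaults mirror DS-STAR's choices (10-round cap, add-or-fix Router, return the un-approved output on exhaustion), while the escalate and fail-loudly policies are the alternatives from decision 4, not DS-STAR behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OnExhaustion(Enum):
    RETURN_LAST = "return_last"  # DS-STAR: hand back the un-approved output
    ESCALATE = "escalate"        # hypothetical alternative: flag for human review
    FAIL_LOUD = "fail_loud"      # hypothetical alternative: raise


class BudgetExhausted(RuntimeError):
    """Raised when the loop runs out of rounds under FAIL_LOUD."""


@dataclass(frozen=True)
class LoopConfig:
    max_rounds: int = 10                              # DS-STAR's cap
    router_actions: Tuple[str, ...] = ("add", "fix")  # richer variants could add more
    on_exhaustion: OnExhaustion = OnExhaustion.RETURN_LAST

    def __post_init__(self) -> None:
        if self.max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")
        if not {"add", "fix"} <= set(self.router_actions):
            raise ValueError("the Router must support at least 'add' and 'fix'")


@dataclass
class Outcome:
    result: Optional[str]
    approved: bool
    needs_review: bool = False


def finalize(cfg: LoopConfig, last_result: str, approved: bool) -> Outcome:
    """Turn the loop's final state into a caller-facing outcome per cfg.on_exhaustion."""
    if approved or cfg.on_exhaustion is OnExhaustion.RETURN_LAST:
        return Outcome(result=last_result, approved=approved)
    if cfg.on_exhaustion is OnExhaustion.ESCALATE:
        return Outcome(result=last_result, approved=False, needs_review=True)
    raise BudgetExhausted(f"no approved plan within {cfg.max_rounds} rounds")
```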

Related patterns

  • patterns/specialized-agent-decomposition: parent pattern. Per-domain agents in a network (Storex, Dash, AWS Strands). This Planner/Coder/Verifier/Router loop is the verification-gated inner-loop instance, where decomposition is by role in the refinement loop rather than by subject-matter domain.
  • patterns/snapshot-replay-agent-evaluation — eval-harness variant of LLM-as-judge; this pattern is the production-loop variant.
  • patterns/judge-query-context-tooling — the Verifier could be extended with retrieval tools for domain-specific judgment context, as Dropbox Dash did for its relevance judge.

Ablation-validated claims (DS-STAR, 2025-11-06)

Component removed                          Impact on DABStep hard-task accuracy
Data File Analyzer (Variant 1)             45.2 % → 26.98 %
Router, forcing extend-only (Variant 2)    Degrades both easy and hard tasks (exact numbers are in the post's table image)

Both ablations isolate a structurally-load-bearing primitive of the pattern.
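The Data File Analyzer result can be restated as an absolute and a relative drop. This only reuses the reported 45.2 % and 26.98 % figures; the Router ablation has no numbers on this page, so it is not computed. `ablation_drop` is a hypothetical helper.

```python
from typing import Tuple


def ablation_drop(full: float, ablated: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = full - ablated
    return absolute, absolute / full


points, fraction = ablation_drop(45.2, 26.98)
print(f"{points:.2f} points, {fraction:.1%} relative")  # prints "18.22 points, 40.3% relative"
```

Removing the pre-stage therefore costs roughly two fifths of hard-task accuracy.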

Seen in
