Planner / Coder / Verifier / Router loop
Intent
Build an agent that solves open-ended problems by iteratively planning, executing, judging, and refining a plan, with an explicit add-or-fix decision when the judge rejects. The pattern uses four specialised LLM sub-agents plus a router, wired as a verification-gated inner loop.
Shape
         [Data File Analyzer]      ← pre-loop context extraction
                  │
                  ▼
    ┌───►      Planner             ← high-level plan
    │             │
    │             ▼
    │           Coder              ← plan → executable code → run
    │             │
    │             ▼
    │          Verifier ─── pass ──► return solution
    │             │
    │            fail
    │             ▼
    │           Router
    │           /    \
    │       "add"    "fix"
    │         │        │
    └─────────┴────────┘           ← revised plan; repeat
                                     (bounded by N rounds)
The five agents in DS-STAR's canonical instance:
- Data File Analyzer — pre-loop; writes Python to summarise every file in the working directory. See concepts/data-file-analysis.
- Planner — produces a step-sequenced plan.
- Coder — turns plan into code, executes it, captures intermediate results.
- Verifier — LLM judge scoring plan sufficiency against current results.
- Router — on Verifier reject, chooses add new step vs fix existing step; emits the refined plan.
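The control flow in the diagram and agent list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DS-STAR's implementation: `Agents`, `solve`, and every callable signature are hypothetical names introduced here, each callable stands in for an LLM-backed sub-agent, and the Router is assumed to return the revised plan directly (as the list above describes) rather than handing control back to the Planner.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Plan = List[str]


@dataclass
class Agents:
    """Hypothetical callables standing in for the LLM-backed sub-agents."""
    analyze: Callable[[], str]                 # Data File Analyzer -> context summary
    plan: Callable[[str, str], Plan]           # (task, context) -> initial plan
    code_and_run: Callable[[Plan, str], str]   # (plan, context) -> execution result
    verify: Callable[[str, Plan, str], bool]   # (task, plan, result) -> plan sufficient?
    route: Callable[[str, Plan, str], Plan]    # (task, plan, result) -> revised plan


def solve(task: str, agents: Agents, max_rounds: int = 10) -> Tuple[str, bool, int]:
    """Run the verification-gated loop; return (result, approved, rounds_used)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    context = agents.analyze()                 # pre-loop stage, runs once
    plan = agents.plan(task, context)
    result = ""
    for round_no in range(1, max_rounds + 1):
        result = agents.code_and_run(plan, context)
        if agents.verify(task, plan, result):
            return result, True, round_no      # Verifier pass: return solution
        if round_no < max_rounds:              # no point revising after the last run
            plan = agents.route(task, plan, result)
    return result, False, max_rounds           # budget exhausted, plan never approved


# Stub agents (assumptions for the demo): the Verifier accepts once the plan has 3 steps.
stub = Agents(
    analyze=lambda: "sales.csv: 3 columns",
    plan=lambda task, ctx: ["load sales.csv"],
    code_and_run=lambda plan, ctx: f"ran {len(plan)} step(s)",
    verify=lambda task, plan, result: len(plan) >= 3,
    route=lambda task, plan, result: plan + [f"step {len(plan) + 1}"],
)
print(solve("total revenue?", stub))  # ('ran 3 step(s)', True, 3)
```

Returning the `approved` flag alongside the result lets a caller tell a Verifier-approved answer from one returned only because the round budget ran out.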
Distinguishing primitive: the add-or-fix branch
The Router's add-vs-fix choice is what makes this pattern different from extend-only agent loops. DS-STAR's ablation Variant 2 removed the Router and forced extend-only refinement; both easy and hard task performance degraded, "demonstrat[ing] that it is more effective to correct mistakes in a plan than to keep adding potentially flawed steps" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).
Add-only loops accumulate mistakes; add-or-fix loops revise them.
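To make the contrast concrete, here is a small sketch of how a Router decision might be applied to a plan. The source does not specify how "fix" rewrites the plan, so replacing one step in place by index is an assumption made for illustration; `RouterDecision` and `apply_decision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RouterDecision:
    action: str                   # "add" or "fix"
    step: str                     # text of the new or corrected step
    index: Optional[int] = None   # which step to fix; only used by "fix"


def apply_decision(plan: List[str], decision: RouterDecision) -> List[str]:
    """Return a revised copy of the plan; the input list is left untouched."""
    if decision.action == "add":
        return plan + [decision.step]
    if decision.action == "fix":
        if decision.index is None or not 0 <= decision.index < len(plan):
            raise ValueError("fix needs a valid step index")
        revised = list(plan)
        revised[decision.index] = decision.step   # assumption: in-place replacement
        return revised
    raise ValueError(f"unknown action: {decision.action!r}")


plan = ["load orders.csv", "sum the price column", "report total"]

# "fix" corrects the flawed step where it sits in the plan.
fixed = apply_decision(plan, RouterDecision("fix", "sum price * quantity", index=1))
print(fixed)

# An extend-only loop can only append, so the flawed step 2 stays in the plan.
extended = apply_decision(plan, RouterDecision("add", "multiply total by quantity"))
print(extended)
```

In the extend-only case the flawed second step survives, and every later step has to work around it.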
When to reach for it
- Open-ended problems without ground-truth labels. Data-science, analytics, exploratory investigations — where there is no single correct answer and verification must operate on plan shape rather than output match.
- Heterogeneous inputs that need reasoning across multiple file formats. The pattern's Data File Analyzer pre-stage is pitched against heterogeneous data formats that schema-inspection primitives can't handle uniformly.
- Multi-step reasoning where inspecting intermediate results is diagnostic for the full plan's viability — i.e. wherever an expert human would work incrementally in a notebook, reviewing each step's output before writing the next.
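For the heterogeneous-inputs case, the kind of per-file context a pre-loop analysis stage produces can be illustrated with a fixed, hand-written summariser. This is only a stand-in: in DS-STAR the Data File Analyzer writes its own Python for the files it finds, whereas the `summarise_file` and `summarise_directory` helpers below are hypothetical and use only the standard library.

```python
import csv
import json
from pathlib import Path
from typing import Dict


def summarise_file(path: Path) -> str:
    """Return a one-line description of a file, dispatching on its extension."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.reader(fh))
        header = rows[0] if rows else []
        return f"csv, columns={header}, data_rows={max(len(rows) - 1, 0)}"
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):
            return f"json object, keys={sorted(data)}"
        if isinstance(data, list):
            return f"json array, length={len(data)}"
        return f"json {type(data).__name__}"
    # Fallback for any other format: treat as text and show the first line.
    text = path.read_text(encoding="utf-8", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return f"text, first_line={first_line[:80]!r}"


def summarise_directory(root: Path) -> Dict[str, str]:
    """Summarise every regular file directly inside root, keyed by file name."""
    return {p.name: summarise_file(p) for p in sorted(root.iterdir()) if p.is_file()}
```

The resulting mapping can be serialised into the context that the Planner, Coder, and Verifier all see, so none of them has to rediscover what is in the working directory.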
When not to reach for it
- Well-defined single-shot tasks. If the problem has ground truth and a one-shot plan reliably works, the Verifier/Router overhead is wasted.
- Strict latency budgets. Each loop iteration costs Planner + Coder + Verifier inference, plus Router inference on reject. At DS-STAR's average of 5.6 rounds on hard DABStep tasks, that is roughly 17-22 LLM calls per resolution.
- Cost-sensitive batch jobs. Same reason — expensive per-task.
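The 17-22 range in the latency bullet follows from simple arithmetic, reproduced below. The sketch assumes exactly one inference call per agent per round and leaves out the one-off Data File Analyzer stage; `calls_per_task` is a hypothetical helper, not something from the source.

```python
def calls_per_task(avg_rounds: float, router_every_round: bool) -> float:
    """Estimate LLM calls per task, assuming one call per agent per round."""
    per_round = 3                  # Planner + Coder + Verifier
    if router_every_round:
        per_round += 1             # Router, only invoked when the Verifier rejects
    return avg_rounds * per_round


low = calls_per_task(5.6, router_every_round=False)   # lower bound: Router never runs
high = calls_per_task(5.6, router_every_round=True)   # upper bound: Router runs every round
print(f"{low:.1f} to {high:.1f} calls")               # 16.8 to 22.4 calls
```

Swapping in the easy-task average of 3.0 rounds gives 9 to 12 calls under the same assumptions.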
Key design decisions
- What does the Verifier judge on? In DS-STAR: plan sufficiency given the intermediate code execution results. This is not a correctness check against an oracle; the Verifier's job is to "determine if the current plan is adequate" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).
- What does the Router decide between? Minimum: "add" vs "fix". A richer variant could also offer "remove step" or "restart plan"; DS-STAR sticks to add-or-fix.
- What is the loop budget? DS-STAR caps at 10 rounds; hard DABStep tasks average 5.6 rounds, easy tasks 3.0. See concepts/refinement-round-budget.
- What happens on budget exhaustion? DS-STAR returns the current (un-approved) plan's output as the final solution rather than erroring. Alternate designs could escalate or fail loudly.
- Pre-loop context extractor? DS-STAR pairs the inner loop with the Data File Analyzer as a pre-stage; the ablation shows it is load-bearing (hard-task accuracy on DABStep drops from 45.2 % to 26.98 % without it).
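The budget-exhaustion decision above can be made explicit as a policy rather than left implicit in the loop. The sketch below is an illustration only: `OnExhaustion`, `LoopOutcome`, `BudgetExhausted`, and `finalise` are hypothetical names, with `RETURN_LAST` mirroring the DS-STAR behaviour described above and `RAISE` standing in for the fail-loudly alternative.

```python
from dataclasses import dataclass
from enum import Enum


class OnExhaustion(Enum):
    RETURN_LAST = "return_last"   # return the un-approved output (DS-STAR's choice)
    RAISE = "raise"               # fail loudly so the caller must handle it


@dataclass(frozen=True)
class LoopOutcome:
    output: str        # output of the last executed plan
    approved: bool     # True only if the Verifier passed the plan
    rounds_used: int


class BudgetExhausted(RuntimeError):
    """Raised when the round budget ran out before the Verifier approved."""


def finalise(outcome: LoopOutcome, policy: OnExhaustion) -> str:
    """Turn a loop outcome into a final answer according to the chosen policy."""
    if outcome.approved or policy is OnExhaustion.RETURN_LAST:
        return outcome.output
    raise BudgetExhausted(
        f"plan not approved after {outcome.rounds_used} rounds"
    )


# Example: a run that used all 10 rounds without approval.
unapproved = LoopOutcome(output="revenue = 41", approved=False, rounds_used=10)
print(finalise(unapproved, OnExhaustion.RETURN_LAST))   # revenue = 41
```

Keeping `approved` on the outcome means that even under `RETURN_LAST` a downstream consumer can flag answers that never passed verification.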
Related patterns
- patterns/specialized-agent-decomposition — parent pattern. Per-domain agents in a network (Storex, Dash, AWS Strands). This Planner/Coder/Verifier/Router loop is the verification-gated inner-loop instance, where decomposition is by role in the refinement loop rather than by subject-matter domain.
- patterns/snapshot-replay-agent-evaluation — eval-harness variant of LLM-as-judge; this pattern is the production-loop variant.
- patterns/judge-query-context-tooling — the Verifier could be extended with retrieval tools for domain-specific judgment context, as Dropbox Dash did for its relevance judge.
Ablation-validated claims (DS-STAR, 2025-11-06)
| Component removed | Impact on DABStep accuracy |
|---|---|
| Data File Analyzer (Variant 1) | Hard tasks: 45.2 % → 26.98 % |
| Router, forcing extend-only (Variant 2) | Degrades both easy and hard tasks (exact numbers are in the post's table image) |
Each ablation isolates a structurally load-bearing primitive of the pattern.
Seen in
- sources/2025-11-06-google-ds-star-versatile-data-science-agent — canonical wiki instance. DS-STAR's 4-agent-plus-router loop reaches #1 on DABStep, sets state-of-the-art on KramaBench and DA-Code, and experimentally isolates the Data File Analyzer + Router add-or-fix decision as load-bearing.
Related
- systems/ds-star — the canonical system instance.
- concepts/iterative-plan-refinement — the loop-level concept this pattern implements.
- concepts/llm-as-judge — the Verifier's primitive.
- concepts/refinement-round-budget — the termination discipline.
- concepts/data-file-analysis — the load-bearing pre-loop primitive.
- patterns/specialized-agent-decomposition — parent pattern.