SYSTEM Cited by 1 source
AutoGen
AutoGen is Microsoft Research's open-source multi-agent conversation framework for LLMs (arXiv 2308.08155). On this wiki it appears as a named baseline for data-science-agent benchmarks (DABStep, KramaBench, DA-Code) against which Google's DS-STAR reports state-of-the-art improvements (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).
Why it appears here
The DS-STAR paper/blog positions AutoGen (alongside DA-Agent) as one of the "state-of-the-art methods" that DS-STAR substantially outperforms. On DABStep the reported gap is 41.0% (best prior baseline) → 45.2% (DS-STAR), a gain of 4.2 percentage points, with similarly shaped gaps on KramaBench and DA-Code. The DS-STAR post does not say whether AutoGen or DA-Agent held the "best prior" position on each benchmark.
Framing
AutoGen is relevant as a comparative multi-agent baseline: it demonstrates that the "multiple specialised LLM sub-agents" shape is not unique to DS-STAR. What DS-STAR adds over the AutoGen baseline is:
- Up-front Data File Analyzer step.
- Inner-loop LLM judge (Verifier) on plans.
- Explicit add-or-fix Router decision rather than extend-only plan growth.
- The full patterns/planner-coder-verifier-router-loop shape.
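
The control flow implied by the list above can be sketched roughly as follows. This is a minimal illustration, not DS-STAR's or AutoGen's actual code: every name here (`run_loop`, `Decision`, `analyze_files`, the `toy_*` stubs) is hypothetical, the LLM calls are replaced by deterministic stand-ins, and the assumption that a "fix" truncates the plan at the faulty step before re-planning is ours, since the source only says the Router makes an add-or-fix decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Decision:
    """Router verdict (hypothetical shape): extend the plan or repair it."""
    action: str          # "add" or "fix"
    step_index: int = 0  # only used for "fix": first step to discard


def analyze_files(files: Dict[str, str]) -> List[str]:
    """Up-front analyzer stand-in: one short summary per data file."""
    return [f"{name}: {len(text.splitlines())} lines" for name, text in files.items()]


def run_loop(
    summaries: List[str],
    planner: Callable[[List[str], List[str]], str],
    coder: Callable[[List[str]], str],
    verifier: Callable[[List[str], str], bool],
    router: Callable[[List[str], str], Decision],
    max_rounds: int = 5,
) -> Tuple[List[str], str, bool]:
    """Iterate plan -> code -> verify -> route until the verifier accepts."""
    plan = [planner(summaries, [])]
    result = ""
    for round_no in range(max_rounds):
        result = coder(plan)                 # "execute" the current plan
        if verifier(plan, result):           # judge says the plan suffices
            return plan, result, True
        if round_no == max_rounds - 1:       # budget spent: stop without revising
            break
        decision = router(plan, result)
        if decision.action == "fix":         # assumed semantics: drop the bad step
            plan = plan[: decision.step_index]   # and everything after it
        plan.append(planner(summaries, plan))    # then propose the next step
    return plan, result, False


# Deterministic stand-ins for the LLM-backed components.
proposals = iter(["load sales.csv", "sum wrong column", "sum amount column"])


def toy_planner(summaries: List[str], plan: List[str]) -> str:
    return next(proposals)


def toy_coder(plan: List[str]) -> str:
    return " -> ".join(plan)


def toy_verifier(plan: List[str], result: str) -> bool:
    return "sum amount column" in plan and "sum wrong column" not in plan


def toy_router(plan: List[str], result: str) -> Decision:
    if "sum wrong column" in plan:
        return Decision("fix", plan.index("sum wrong column"))
    return Decision("add")


summaries = analyze_files({"sales.csv": "id,amount\n1,10\n2,20\n"})
plan, result, ok = run_loop(summaries, toy_planner, toy_coder, toy_verifier, toy_router)
print(ok, plan)  # True ['load sales.csv', 'sum amount column']
```

In this toy run the Router first chooses "add", then chooses "fix" once the wrong step is in the plan, so the final plan is shorter than the number of planner calls. An extend-only loop would have kept the faulty step and could only append after it.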
Caveats
- Minimal-page stub. This wiki page is a comparison anchor only; AutoGen's own architecture, collaboration protocols, and production use are covered in the framework's own docs and paper, not in any source currently ingested on this wiki.
- No independent AutoGen-source ingestion yet. Benchmark numbers quoted here are from the DS-STAR post, not from an AutoGen-authored source.
Seen in
- sources/2025-11-06-google-ds-star-versatile-data-science-agent — named baseline comparator.
Related
- systems/ds-star — the agent that outperforms AutoGen on the three named benchmarks.
- systems/dabstep — benchmark surface.
- patterns/specialized-agent-decomposition — broader shared pattern family across multi-agent LLM frameworks.