SYSTEM Cited by 1 source

AutoGen

AutoGen is Microsoft Research's open-source multi-agent conversation framework for LLMs (arXiv 2308.08155). On this wiki it appears as a named baseline for data-science-agent benchmarks (DABStep, KramaBench, DA-Code) against which Google's DS-STAR reports state-of-the-art improvements (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).

Why it appears here

The DS-STAR paper/blog positions AutoGen (alongside DA-Agent) as one of the "state-of-the-art methods" that DS-STAR substantially outperforms. On DABStep the reported gap is 41.0 % (best prior baseline) → 45.2 % (DS-STAR), and DS-STAR is likewise reported ahead on KramaBench and DA-Code. The DS-STAR post does not say whether AutoGen or DA-Agent held the "best prior" position on each benchmark.
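As a quick check on the size of that DABStep gap, the two reported accuracies differ by 4.2 percentage points, which is roughly a 10 % relative improvement over the best prior baseline. The following minimal Python sketch shows the arithmetic. The helper name `accuracy_gap` is illustrative, and only the two DABStep figures come from the DS-STAR post. No numbers are assumed for KramaBench or DA-Code.

```python
def accuracy_gap(baseline_pct: float, new_pct: float) -> tuple:
    """Return (absolute gap in percentage points, relative improvement in %)."""
    absolute = new_pct - baseline_pct
    relative = absolute / baseline_pct * 100.0
    return absolute, relative


# DABStep accuracies as reported in the DS-STAR post.
abs_pp, rel_pct = accuracy_gap(41.0, 45.2)
print(f"absolute: {abs_pp:.1f} pp, relative: {rel_pct:.1f} %")
# → absolute: 4.2 pp, relative: 10.2 %
```

Reporting both figures avoids a common ambiguity: "4.2 % better" could be read either as percentage points or as a relative change.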

Framing

AutoGen is relevant as a comparative multi-agent baseline — it demonstrates that the "multiple specialised LLM sub-agents" shape is not unique to DS-STAR. What DS-STAR adds over the AutoGen baseline is:

Up-front Data File Analyzer step.
Inner-loop LLM judge (Verifier) on plans.
Explicit add-or-fix Router decision rather than extend-only plan growth.
The full patterns/planner-coder-verifier-router-loop shape.
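The four differences above compose into a single control loop. The sketch below is a hypothetical, library-free illustration of that shape. It is not code from DS-STAR or AutoGen. Every role (`analyze`, `plan_step`, `execute`, `verify`, `route`) is a plain callable standing in for an LLM-backed agent, and the file name `payments.csv` and the scripted steps are made up. It assumes that a Router "fix" decision means discarding the flagged step and everything after it before re-planning; the DS-STAR post may describe the exact semantics differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LoopState:
    file_summaries: Dict[str, str]  # output of the up-front analyzer step
    plan: List[str] = field(default_factory=list)


def run_loop(
    files: Dict[str, str],
    analyze: Callable[[str, str], str],
    plan_step: Callable[[LoopState], str],
    execute: Callable[[List[str]], str],
    verify: Callable[[LoopState, str], bool],
    route: Callable[[LoopState, str], Optional[int]],
    max_rounds: int = 10,
) -> Tuple[List[str], str]:
    # 1. Up-front analysis: summarise every data file before any planning.
    state = LoopState({name: analyze(name, body) for name, body in files.items()})
    state.plan.append(plan_step(state))
    result = execute(state.plan)
    for _ in range(max_rounds):
        # 2. Inner-loop judge: stop as soon as the plan is judged sufficient.
        if verify(state, result):
            break
        # 3. Router: None means "add a step"; an index means "that step is wrong",
        #    so drop it and everything after it before re-planning (assumed semantics).
        bad_index = route(state, result)
        if bad_index is not None:
            del state.plan[bad_index:]
        # 4. Planner proposes the next step; the executor re-runs the whole plan.
        state.plan.append(plan_step(state))
        result = execute(state.plan)
    # If max_rounds is exhausted, the last (unverified) plan is returned as-is.
    return state.plan, result


# Toy, deterministic stand-ins for the LLM-backed roles (all hypothetical).
scripted_steps = iter(["load payments", "bad: filter on wrong column",
                       "filter by merchant", "sum fees"])

plan, result = run_loop(
    files={"payments.csv": "id,fee\n1,0.5\n2,0.7"},
    analyze=lambda name, body: f"{name}: {len(body.splitlines())} lines",
    plan_step=lambda state: next(scripted_steps),
    execute=lambda steps: " -> ".join(steps),
    verify=lambda state, res: (
        len(state.plan) == 3 and not any(s.startswith("bad") for s in state.plan)
    ),
    route=lambda state, res: next(
        (i for i, s in enumerate(state.plan) if s.startswith("bad")), None
    ),
)
print(plan)  # ['load payments', 'filter by merchant', 'sum fees']
```

If `route` always returned `None`, this would collapse into extend-only plan growth. The faulty second step would then stay in the plan, and every later step would be built on top of it. The add-or-fix decision exists to avoid exactly that.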

Caveats

Minimal stub. This wiki page is a comparison anchor only; AutoGen's own architecture, collaboration protocols, and production use are covered in the framework's own docs and paper, not in any currently ingested source on this wiki.
No independent AutoGen-source ingestion yet. Benchmark numbers quoted here are from the DS-STAR post, not from an AutoGen-authored source.

Seen in

sources/2025-11-06-google-ds-star-versatile-data-science-agent — named baseline comparator.

Related

systems/ds-star — the agent that outperforms AutoGen on the three named benchmarks.
systems/dabstep — benchmark surface.
patterns/specialized-agent-decomposition — broader shared pattern family across multi-agent LLM frameworks.
