SYSTEM Cited by 1 source

AutoGen

AutoGen is Microsoft Research's open-source multi-agent conversation framework for LLMs (arXiv 2308.08155). On this wiki it appears as a named baseline for data-science-agent benchmarks (DABStep, KramaBench, DA-Code) against which Google's DS-STAR reports state-of-the-art improvements (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).

Why it appears here

The DS-STAR paper and blog post position AutoGen (alongside DA-Agent) as one of the "state-of-the-art methods" that DS-STAR substantially outperforms. The specific gap on DABStep is 41.0% (best prior baseline) → 45.2% (DS-STAR), with similarly shaped gaps on KramaBench and DA-Code. The DS-STAR post does not specify whether AutoGen or DA-Agent held the "best prior" position on each benchmark.
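To make the size of the DABStep gap concrete, the two figures quoted above can be expressed as an absolute gain (percentage points) and a relative gain. This is a minimal sketch: the helper name score_gap is hypothetical and not from the DS-STAR post; only the two scores, 41.0 and 45.2, come from the source.

```python
def score_gap(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; not part of any benchmark tooling.
    """
    absolute = new - baseline              # difference in percentage points
    relative = absolute / baseline * 100   # gain relative to the baseline score
    return round(absolute, 1), round(relative, 1)


# DABStep scores as quoted in the DS-STAR post:
# best prior baseline = 41.0, DS-STAR = 45.2
abs_pp, rel_pct = score_gap(41.0, 45.2)
print(f"+{abs_pp} points absolute, +{rel_pct}% relative")
# prints "+4.2 points absolute, +10.2% relative"
```

The post reports the absolute scores; the relative figure is derived here only for orientation and is not a number quoted by the source.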

Framing

AutoGen is relevant as a comparative multi-agent baseline — it demonstrates that the "multiple specialised LLM sub-agents" shape is not unique to DS-STAR. What DS-STAR adds over the AutoGen baseline is:

Caveats

  • Minimal-page stub. This wiki page is a comparison-anchor only; AutoGen's own architecture, collaboration protocols, and production use are in the framework's own docs and paper, not in any currently ingested source on this wiki.
  • No independent AutoGen-source ingestion yet. Benchmark numbers quoted here are from the DS-STAR post, not from an AutoGen-authored source.

Seen in
