
CONCEPT Cited by 1 source

Multi-LLM sub-agent routing

Multi-LLM sub-agent routing is an agent architecture in which different sub-agents inside a single agent system use different LLMs, each chosen for its specific sub-task (planning, search, code generation, judging) based on observed complementary capability profiles. The 2026-05-08 Databricks post on Genie presents this as a named architectural advance, alongside parallel thinking and specialised knowledge search.

Two properties make it possible: agent sub-tasks demand different capabilities that no single LLM optimises across, and the platform makes it cheap to swap models per sub-agent.

The architectural property

                 ┌──────────────────────────────────────────┐
                 │          User question / query           │
                 └──────────────────┬───────────────────────┘
                          Planning sub-agent
                          (LLM A — high-level reasoning)
                ┌───────────────────┼───────────────────┐
                ▼                   ▼                   ▼
       Search sub-agent     Code-gen sub-agent     Judge sub-agent
       (LLM B — fast        (LLM C — strong         (LLM D — high
        retrieval-tuned)     SQL synthesis)         precision eval)
                ▼                   ▼                   ▼
                └───────────────────┼───────────────────┘
                              Aggregator → Answer

Each box is a sub-agent. Each box independently picks the best LLM (commercial frontier, open-source, custom-trained) for its slice. The platform property "seamless to try out any of the frontier models" makes per-sub-agent assignment a tractable engineering decision rather than a research project.
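The per-sub-agent assignment in the diagram can be sketched as a routing table from role to model. This is a minimal illustration, not Genie's implementation: the model names (`llm-a` to `llm-d`), the `SubAgent` class, and the stub client are all hypothetical, and a real client would call an inference endpoint instead of echoing the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A model client is any callable that maps a prompt to a completion string.
ModelClient = Callable[[str], str]


def make_stub_client(model_name: str) -> ModelClient:
    """Return a fake client that tags its output with the model name (assumption:
    stands in for a real inference call)."""
    def call(prompt: str) -> str:
        return f"[{model_name}] {prompt}"
    return call


@dataclass
class SubAgent:
    role: str
    client: ModelClient

    def run(self, task: str) -> str:
        # Each sub-agent only ever talks to the model it was assigned.
        return self.client(f"{self.role}: {task}")


# Hypothetical routing table: one model per sub-agent role.
ROUTING: Dict[str, str] = {
    "planning": "llm-a",
    "search": "llm-b",
    "codegen": "llm-c",
    "judge": "llm-d",
}


def build_agents(routing: Dict[str, str]) -> Dict[str, SubAgent]:
    return {role: SubAgent(role, make_stub_client(model))
            for role, model in routing.items()}


def answer(question: str, agents: Dict[str, SubAgent]) -> Dict[str, str]:
    """Planner first, then fan out to the three downstream sub-agents, then aggregate."""
    plan = agents["planning"].run(question)
    results = {role: agents[role].run(plan) for role in ("search", "codegen", "judge")}
    results["plan"] = plan
    return results
```

Swapping the model behind one sub-agent is then a one-entry change, for example `build_agents({**ROUTING, "search": "llm-x"})`, which is the "cheap to swap" property the architecture depends on.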

Why no single LLM is best across all sub-tasks

The Databricks post observes: "different LLMs are good at complementary capabilities... different LLMs result in very different latency and cost characteristics." Concretely:

Sub-agent       | What it needs                                                   | Some LLMs excel at
----------------|-----------------------------------------------------------------|--------------------------------------------------
Planning        | Multi-step reasoning, decomposition, tool-call orchestration    | Frontier reasoning models (Opus, GPT, Gemini)
Search          | Fast pattern matching, schema understanding                     | Smaller / faster models with retrieval tuning
Code generation | Strong SQL synthesis, dialect awareness, schema grounding       | Code-specialised models or larger general models
Judging         | Calibrated quality assessment, accurate disagreement detection  | High-precision evaluator models

A single-LLM agent forces one model to do all four. It either pays the planner's reasoning cost on simple search calls, or accepts a fast, search-tuned model's weaker reasoning on the planning call.

Three-axis simultaneous improvement

The post claims Multi-LLM (combined with parallel thinking and specialised knowledge search) drives:

Axis     | Direction
---------|---------------------------------------------------
Accuracy | ↑ (32% → >90% vs leading coding agent baseline)
Cost     | ↓ (significantly reduced)
Latency  | ↓ (significantly reduced)

This is unusual. The typical assumption is that more sophisticated agent design trades cost / latency for accuracy (more model calls = more cost). Multi-LLM beats this by:

  • Using expensive frontier models only where they pay off (planning, judging) — not across the whole pipeline.
  • Using fast, cheap, narrowly-tuned models for the high-volume sub-tasks (search, simple retrieval).
  • Combining with GEPA prompt optimisation, which closes accuracy gaps left by smaller / cheaper models on their assigned sub-tasks.
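The cost argument in the bullets above can be made concrete with a small back-of-the-envelope model. Every number here is invented for illustration and does not come from the Databricks post: the call counts per query, the three price tiers, and the tier assignment are all assumptions. Costs are kept as integers (thousandths of a dollar) to avoid floating-point noise.

```python
from typing import Dict

# Hypothetical number of model calls each sub-agent makes for one user query.
CALLS_PER_QUERY: Dict[str, int] = {"planning": 1, "search": 8, "codegen": 2, "judge": 2}

# Hypothetical price per call, in thousandths of a dollar.
COST_PER_CALL: Dict[str, int] = {"frontier": 20, "mid": 8, "small": 2}


def query_cost(assignment: Dict[str, str]) -> int:
    """Total cost of one query given a role -> price-tier assignment."""
    return sum(CALLS_PER_QUERY[role] * COST_PER_CALL[tier]
               for role, tier in assignment.items())


# Single-LLM design: the frontier model handles every sub-task.
single_llm = {role: "frontier" for role in CALLS_PER_QUERY}

# Multi-LLM design: frontier only for planning and judging, cheaper tiers elsewhere.
multi_llm = {"planning": "frontier", "search": "small", "codegen": "mid", "judge": "frontier"}

single_cost = query_cost(single_llm)
multi_cost = query_cost(multi_llm)
```

Under these made-up figures the saving comes almost entirely from the high-volume search calls, which is why the second bullet targets them. Whether accuracy holds on the cheaper tiers is an empirical question that this sketch does not model.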

GEPA's role

The post explicitly references GEPA on table-search sub-agents: "the corresponding accuracy and cost can be further optimized using methods like GEPA." GEPA is a prompt-optimisation method that closes the gap between "this LLM is best at this sub-task" and "this LLM with the best prompt is best at this sub-task." The combination of (a) per-sub-agent model selection and (b) per-sub-agent prompt optimisation is the shape that delivers the simultaneous improvement on all three axes.

Shape                                             | Distinguishing property
--------------------------------------------------|------------------------------------------------------------------------------------------------
Multi-LLM sub-agent routing (this concept)        | Different LLMs for different sub-tasks within one agent system; per-sub-agent prompt optimisation
concepts/llm-cascade                              | Same task, escalation chain: try cheap model first, escalate to expensive only on failure
concepts/multi-llm-debate (if it exists in wiki)  | Multiple LLMs argue the same task; adversarial / consensus seeking
Mixture-of-experts (model-internal)               | Within a single model, different experts activate per token; not multi-model
concepts/objective-abstraction (model-serving)    | Routing layer abstraction that lets clients pick a model; not internal sub-agent decomposition

The distinguishing axis: Multi-LLM sub-agent routing is internal to one agent's design, across its sub-tasks; the others operate at different altitudes (escalation, debate, model-internal, client-facing).
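The difference from an LLM cascade is easiest to see in control flow. The sketch below is illustrative only: `cascade`, `route`, and the `accept` predicate are hypothetical names, and models are represented as plain callables.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[str], str]


def cascade(task: str, models: Sequence[Model], accept: Callable[[str], bool]) -> str:
    """LLM cascade: one task, models tried in order (cheapest first),
    escalating only while the answer is rejected."""
    if not models:
        raise ValueError("cascade needs at least one model")
    result = ""
    for model in models:
        result = model(task)
        if accept(result):
            break  # stop at the first acceptable answer
    return result  # if none was accepted, this is the last model's answer


def route(role: str, task: str, routing: Dict[str, Model]) -> str:
    """Sub-agent routing: the sub-task's role alone selects the model;
    there is no retry or escalation step."""
    return routing[role](task)
```

In a cascade the model choice depends on the quality of earlier answers to the same task. In sub-agent routing it is fixed at design time by the kind of sub-task, so the two can in principle be combined, for example by giving one role a cascade as its model.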

When this fits / doesn't

Fits:

  • Agent has clearly separable sub-tasks with different capability profiles.
  • Platform makes model swapping cheap (e.g., Databricks' AI Gateway, unified inference plane).
  • High-volume queries make the per-call cost optimisation worth the engineering investment.
  • Prompt-optimisation tooling available (GEPA or similar).

Doesn't fit:

  • Sub-tasks are too tightly coupled to separate cleanly.
  • No infrastructure for swapping models — every model swap is a multi-week deployment exercise.
  • Low-volume agent — the engineering cost of per-sub-agent tuning exceeds the cost saved.
  • Single LLM is dominantly best across all sub-tasks (rare in practice).
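The volume criterion in both lists reduces to a break-even estimate: per-sub-agent tuning pays off once the cumulative per-query saving exceeds the one-off engineering cost. A minimal sketch, with a hypothetical function name and inputs in whatever integer cost unit is convenient:

```python
def break_even_queries(engineering_cost: int, saving_per_query: int) -> int:
    """Smallest number of queries at which total savings cover the
    one-off engineering cost. Both arguments use the same integer unit."""
    if saving_per_query <= 0:
        raise ValueError("no per-query saving, so the investment never pays off")
    # Ceiling division without floats.
    return -(-engineering_cost // saving_per_query)
```

A low-volume agent that never reaches this query count falls on the "doesn't fit" side, regardless of how large the per-query saving looks.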

Seen in

  • sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie: canonical first wiki disclosure of Multi-LLM sub-agent routing as a named architectural advance. Genie uses different LLMs per sub-agent (planning / search / code-gen / judges); the platform makes this seamless across Opus / GPT / Gemini / OSS / custom models; combined with GEPA prompt optimisation, accuracy, cost, and latency improve simultaneously (Figure 1 end-state). Positioned as the architectural response to the property that no single LLM is best across all sub-tasks.