CONCEPT Cited by 1 source

Multi-LLM sub-agent routing¶

Multi-LLM sub-agent routing is an agent architecture in which different sub-agents inside a single agent system use different LLMs, each chosen for the specific sub-task — planning, search, code generation, judging — based on observed complementary capability profiles. The 2026-05-08 Databricks post on Genie coins this as a named architectural advance, alongside parallel thinking and specialised knowledge search.

Two properties make it possible: agent sub-tasks demand different capabilities that no single LLM optimises across, and the platform makes it cheap to swap models per sub-agent.

The architectural property¶

                 ┌──────────────────────────────────────────┐
                 │          User question / query           │
                 └──────────────────┬───────────────────────┘
                                    ▼
                          Planning sub-agent
                          (LLM A — high-level reasoning)
                                    │
                ┌───────────────────┼───────────────────┐
                ▼                   ▼                   ▼
       Search sub-agent     Code-gen sub-agent     Judge sub-agent
       (LLM B — fast        (LLM C — strong         (LLM D — high
        retrieval-tuned)     SQL synthesis)         precision eval)
                ▼                   ▼                   ▼
                └───────────────────┼───────────────────┘
                                    ▼
                              Aggregator → Answer

Each node between the user query and the aggregator is a sub-agent, and each is independently assigned the best LLM for its slice (commercial frontier, open-source, or custom-trained). The platform property "seamless to try out any of the frontier models" makes per-sub-agent assignment a tractable engineering decision rather than a research project.
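A minimal sketch of the per-sub-agent assignment shown in the diagram, assuming a hypothetical in-process registry of model callables. The names MODELS, ASSIGNMENT, call_subagent and swap_model, and the identifiers llm-a to llm-d, are illustrative and not taken from the Databricks post; the stub models only echo which model handled the prompt.

```python
from typing import Callable, Dict

# A model client is anything that maps a prompt to a completion.
ModelFn = Callable[[str], str]


def make_stub(name: str) -> ModelFn:
    """Stand-in for a real model client: tags the prompt with the model name."""
    return lambda prompt: f"[{name}] {prompt}"


# Hypothetical registry of available models (placeholders, not real endpoints).
MODELS: Dict[str, ModelFn] = {n: make_stub(n) for n in ("llm-a", "llm-b", "llm-c", "llm-d")}

# One model per sub-agent, mirroring the diagram above.
ASSIGNMENT: Dict[str, str] = {
    "planning": "llm-a",
    "search": "llm-b",
    "codegen": "llm-c",
    "judge": "llm-d",
}


def call_subagent(role: str, prompt: str, assignment: Dict[str, str] = ASSIGNMENT) -> str:
    """Send the prompt to whichever model is assigned to this sub-agent."""
    return MODELS[assignment[role]](prompt)


def swap_model(assignment: Dict[str, str], role: str, model_name: str) -> Dict[str, str]:
    """Return a new assignment with one sub-agent's model replaced."""
    if role not in assignment or model_name not in MODELS:
        raise KeyError(f"unknown role or model: {role!r}, {model_name!r}")
    return {**assignment, role: model_name}


if __name__ == "__main__":
    print(call_subagent("search", "find tables about orders"))
    trial = swap_model(ASSIGNMENT, "search", "llm-c")
    print(call_subagent("search", "find tables about orders", trial))
```

Keeping the assignment as plain data is the design point: trying a different model for one sub-agent is a one-entry change that leaves the other sub-agents untouched.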

Why no single LLM is best across all sub-tasks¶

The Databricks post observes: "different LLMs are good at complementary capabilities... different LLMs result in very different latency and cost characteristics." Concretely:

Sub-agent	What it needs	Some LLMs excel at
Planning	Multi-step reasoning, decomposition, tool-call orchestration	Frontier reasoning models (Opus, GPT, Gemini)
Search	Fast pattern matching, schema understanding	Smaller / faster models with retrieval tuning
Code generation	Strong SQL synthesis, dialect awareness, schema grounding	Code-specialised models or larger general models
Judging	Calibrated quality assessment, accurate disagreement detection	High-precision evaluator models

A single-LLM agent forces one model to do all four. It either pays the planner's reasoning cost on simple search calls, or carries a speed-tuned search model's weaker reasoning into the planning call.

Three-axis simultaneous improvement¶

The post claims Multi-LLM (combined with parallel thinking and specialised knowledge search) drives:

Axis	Direction
Accuracy	↑ (32% → >90% vs leading coding agent baseline)
Cost	↓ (significantly reduced)
Latency	↓ (significantly reduced)

This is unusual. More sophisticated agent designs are usually assumed to trade cost and latency for accuracy, since more model calls mean more cost. Multi-LLM routing sidesteps the trade-off by:

Using expensive frontier models only where they pay off (planning, judging) — not across the whole pipeline.
Using fast, cheap, narrowly-tuned models for the high-volume sub-tasks (search, simple retrieval).
Combining with GEPA prompt optimisation, which closes the accuracy gaps left by smaller, cheaper models on their assigned sub-tasks.
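The cost and latency side of this argument is simple arithmetic, sketched below. Every number here is an invented placeholder (call counts per query, relative cost units, latencies in milliseconds); none comes from the post, and the sketch says nothing about accuracy.

```python
from typing import Dict, Tuple

# Hypothetical number of model calls each sub-agent makes per user query.
CALLS_PER_QUERY: Dict[str, int] = {"planning": 1, "search": 6, "codegen": 2, "judge": 1}

# Hypothetical (cost units per call, latency in ms per call) for three model tiers.
MODEL_PROFILE: Dict[str, Tuple[int, int]] = {
    "frontier": (10, 4000),
    "code": (4, 2000),
    "small": (1, 500),
}


def totals(assignment: Dict[str, str]) -> Tuple[int, int]:
    """Total cost units and serial latency (ms) per query for a role -> tier assignment."""
    cost = sum(CALLS_PER_QUERY[r] * MODEL_PROFILE[m][0] for r, m in assignment.items())
    latency = sum(CALLS_PER_QUERY[r] * MODEL_PROFILE[m][1] for r, m in assignment.items())
    return cost, latency


# Single-LLM agent: the frontier model handles every call.
SINGLE = {role: "frontier" for role in CALLS_PER_QUERY}

# Routed agent: frontier only for planning and judging.
ROUTED = {"planning": "frontier", "search": "small", "codegen": "code", "judge": "frontier"}

if __name__ == "__main__":
    print("single:", totals(SINGLE))  # (100, 40000)
    print("routed:", totals(ROUTED))  # (34, 15000)
```

Under these made-up figures the saving comes almost entirely from the high-volume search calls, which is why the volume of each sub-task matters as much as the per-call price.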

GEPA's role¶

The post explicitly references GEPA in the context of table-search sub-agents: "the corresponding accuracy and cost can be further optimized using methods like GEPA." GEPA is a prompt-optimisation method; here it closes the gap between choosing the best LLM for a sub-task and getting the best out of that LLM with a tuned prompt. The combination of (a) per-sub-agent model selection and (b) per-sub-agent prompt optimisation is the shape that delivers the simultaneous improvement on all three axes.
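To illustrate why model selection and prompt selection are worth doing jointly, the sketch below scores every (model, prompt) pair for one sub-agent on a small evaluation set and keeps the most accurate pair, breaking ties by lower cost. This is a plain exhaustive search, not GEPA, which is a more sophisticated optimiser. The stub models, prompts, costs and evaluation set are all invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]


def large(prompt: str) -> str:
    """Toy 'large' model: always upper-cases the last word of the prompt."""
    return prompt.split()[-1].upper()


def small(prompt: str) -> str:
    """Toy 'small' model: only upper-cases when the prompt spells the task out."""
    word = prompt.split()[-1]
    return word.upper() if "table name" in prompt else word


def pick_best(
    models: Dict[str, ModelFn],
    prompts: List[str],
    eval_set: List[Tuple[str, str]],
    cost: Dict[str, int],
) -> Tuple[str, str, float]:
    """Return (model, prompt template, accuracy), preferring accuracy, then lower cost."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    best = None
    for (name, fn), template in product(models.items(), prompts):
        correct = sum(fn(template.format(q=q)) == expected for q, expected in eval_set)
        accuracy = correct / len(eval_set)
        key = (accuracy, -cost[name])
        if best is None or key > best[0]:
            best = (key, name, template, accuracy)
    return best[1], best[2], best[3]


MODELS = {"large": large, "small": small}
COST = {"large": 10, "small": 1}  # hypothetical relative cost per call
PROMPTS = ["{q}", "Return the table name in upper case for: {q}"]
EVAL_SET = [("orders", "ORDERS"), ("customers", "CUSTOMERS")]

if __name__ == "__main__":
    print(pick_best(MODELS, PROMPTS, EVAL_SET, COST))
```

In this toy setup the small model fails with the bare prompt but matches the large model once the prompt is explicit, so the search selects the cheaper model with the better prompt.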

Shape	Distinguishing property
Multi-LLM sub-agent routing (this concept)	Different LLMs for different sub-tasks within one agent system; per-sub-agent prompt optimisation
concepts/llm-cascade	Same task, escalation chain — try cheap model first, escalate to expensive only on failure
concepts/multi-llm-debate (if it exists in wiki)	Multiple LLMs argue the same task — adversarial / consensus seeking
Mixture-of-experts (model-internal)	Within a single model, different experts activate per token; not multi-model
concepts/objective-abstraction (model-serving)	Routing layer abstraction that lets clients pick a model — not internal sub-agent decomposition

The distinguishing axis: Multi-LLM sub-agent routing is internal to one agent's design, across its sub-tasks; the others operate at different altitudes (escalation, debate, model-internal, client-facing).
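The difference between a cascade and sub-agent routing is easiest to see in control flow. The sketch below is illustrative only: the stub models cheap and strong and the accept check are hypothetical stand-ins, not real model clients or a real verifier.

```python
from typing import Callable, Dict, Sequence, Tuple

ModelFn = Callable[[str], str]


def cascade(
    task: str,
    models: Sequence[Tuple[str, ModelFn]],
    accept: Callable[[str, str], bool],
) -> Tuple[str, str]:
    """LLM cascade: one task, models tried cheapest-first, escalating on rejection."""
    if not models:
        raise ValueError("cascade needs at least one model")
    for name, fn in models:
        answer = fn(task)
        if accept(task, answer):
            break
    # If nothing was accepted, the last (most capable) model's answer is returned.
    return name, answer


def route(
    subtasks: Dict[str, str],
    assignment: Dict[str, str],
    models: Dict[str, ModelFn],
) -> Dict[str, str]:
    """Sub-agent routing: each sub-task goes straight to its assigned model, no escalation."""
    return {role: models[assignment[role]](prompt) for role, prompt in subtasks.items()}


def cheap(task: str) -> str:
    return f"cheap({task})"


def strong(task: str) -> str:
    return f"strong({task})"


def accept(task: str, answer: str) -> bool:
    """Hypothetical check: cheap answers are rejected for tasks marked 'hard'."""
    return answer.startswith("strong") or "hard" not in task


if __name__ == "__main__":
    ladder = [("cheap", cheap), ("strong", strong)]
    print(cascade("easy lookup", ladder, accept))
    print(cascade("hard join", ladder, accept))
    print(route(
        {"planning": "hard join", "search": "find tables"},
        {"planning": "strong", "search": "cheap"},
        {"cheap": cheap, "strong": strong},
    ))
```

The cascade decides the model at run time per request, based on whether an answer was accepted; routing fixes the model per sub-task at design time.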

When this fits / doesn't¶

Fits:

Agent has clearly separable sub-tasks with different capability profiles.
Platform makes model swapping cheap (e.g., Databricks' AI Gateway, unified inference plane).
High-volume queries make the per-call cost optimisation worth the engineering investment.
Prompt-optimisation tooling available (GEPA or similar).

Doesn't fit:

Sub-tasks are too tightly coupled to separate cleanly.
No infrastructure for swapping models — every model swap is a multi-week deployment exercise.
Low-volume agent — the engineering cost of per-sub-agent tuning exceeds the cost saved.
Single LLM is dominantly best across all sub-tasks (rare in practice).
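The volume criterion above reduces to a break-even calculation, sketched here with hypothetical figures. Both inputs are invented placeholders expressed in integer cents; the function name break_even_queries is illustrative.

```python
def break_even_queries(engineering_cost_cents: int, saving_per_query_cents: int) -> int:
    """Queries needed before per-query savings repay the one-off tuning cost."""
    if saving_per_query_cents <= 0:
        raise ValueError("routing must save something per query to ever break even")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-engineering_cost_cents // saving_per_query_cents)


if __name__ == "__main__":
    # Hypothetical: 50,000 dollars of engineering effort, 2 cents saved per query.
    print(break_even_queries(5_000_000, 2))  # 2500000
```

An agent that serves fewer queries than the break-even figure over its expected lifetime falls on the "doesn't fit" side of the list.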

concepts/data-agent-unique-challenges is the problem class driving the search for accuracy gains.
concepts/parallel-thinking-trajectory-sampling introduces cost; Multi-LLM recovers it — they compose.
systems/gepa-prompt-optimizer is the per-sub-agent prompt optimisation tool referenced.
patterns/llm-per-subagent-with-optimized-prompts is the pattern that operationalises this concept.

Seen in¶

sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie — canonical first wiki disclosure of Multi-LLM sub-agent routing as a named architectural advance. Genie uses different LLMs per sub-agent (planning / search / code-gen / judges); platform makes this seamless across Opus / GPT / Gemini / OSS / custom; combined with GEPA prompt optimisation, accuracy + cost + latency improve simultaneously (Figure 1 end-state). Positioned as the architectural response to the no-single-LLM-is-best-across-all-sub-tasks property.