LLM per sub-agent with optimized prompts¶
LLM per sub-agent with optimised prompts is an agent-design pattern in which a single agent system uses different LLMs for different internal sub-agents (planning, search, code generation, judging), each with per-sub-agent prompt optimisation (e.g., via GEPA). The pattern is the structural mechanism through which Multi-LLM sub-agent routing delivers simultaneous improvement on accuracy + cost + latency.
Canonicalised in the 2026-05-08 Databricks post on Genie as one of the three architectural advances enabling Genie's accuracy lead over a "leading coding agent" baseline (32% → over 90% on Databricks' internal benchmark).
The pattern¶
Agent system
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  ┌─────────────────────┐                                     │
│  │ Planning sub-agent  │◄── LLM A + prompt P_A               │
│  └──────────┬──────────┘    (frontier reasoning)             │
│             │                                                │
│  ┌──────────▼──────────┐                                     │
│  │ Search sub-agent    │◄── LLM B + prompt P_B               │
│  └──────────┬──────────┘    (fast retrieval-tuned)           │
│             │                                                │
│  ┌──────────▼──────────┐                                     │
│  │ Code-gen sub-agent  │◄── LLM C + prompt P_C               │
│  └──────────┬──────────┘    (SQL synthesis)                  │
│             │                                                │
│  ┌──────────▼──────────┐                                     │
│  │ Judge sub-agent     │◄── LLM D + prompt P_D               │
│  └─────────────────────┘    (quality evaluation)             │
│                                                              │
└──────────────────────────────────────────────────────────────┘
Each prompt P_i is GEPA-optimised for its (LLM, sub-task) pair.
The two distinct moves:
- Per-sub-agent LLM assignment — pick best-of-class for each slice of the agent's work.
- Per-sub-agent prompt optimisation — for the chosen LLM and its sub-task, optimise the prompt (e.g., with GEPA) so that smaller / cheaper models can recover accuracy that frontier models would have provided with a generic prompt.
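As a minimal sketch of the two moves, the assignment can be held as a registry keyed by sub-agent, where each entry pairs a model with the prompt tuned for that pair. The model identifiers, prompt texts, and the `build_request` helper are hypothetical; the source does not disclose Genie's actual assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAgentConfig:
    """One (LLM, prompt) assignment for a single sub-agent."""
    model: str   # hypothetical model identifier
    prompt: str  # prompt tuned for this specific (model, sub-task) pair


# Hypothetical assignments for illustration only.
REGISTRY = {
    "planning": SubAgentConfig("frontier-reasoning-model",
                               "Plan the steps needed to answer: {question}"),
    "search": SubAgentConfig("small-fast-model",
                             "List the tables relevant to: {question}"),
    "codegen": SubAgentConfig("sql-tuned-model",
                              "Write SQL that answers: {question}"),
    "judge": SubAgentConfig("calibrated-judge-model",
                            "Rate the answer given for: {question}"),
}


def build_request(sub_agent: str, question: str) -> dict:
    """Resolve a sub-agent name to the model and rendered prompt to send."""
    cfg = REGISTRY[sub_agent]
    return {"model": cfg.model, "prompt": cfg.prompt.format(question=question)}
```

Keeping the model and its prompt in the same record makes it harder to swap one without revisiting the other.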
Why both moves are necessary¶
| Configuration | Result |
|---|---|
| Multi-LLM only (no prompt optimisation) | Smaller models on cheap sub-tasks underperform; gain is small |
| Prompt optimisation only (single LLM) | Stuck with one model's capability profile across all sub-tasks |
| Both combined | Each (LLM, sub-task) pair runs at its optimised operating point; accuracy, cost, and latency all gain |
The Databricks post points to this combination as the mechanism, describing "how different LLMs perform on table search tasks and how the corresponding accuracy and cost can be further optimized using methods like GEPA."
Sub-agent decomposition¶
The pattern requires identifying clearly separable sub-tasks with distinct capability profiles. Genie's disclosed decomposition:
| Sub-agent | Capability needed | Volume per query | Cost sensitivity |
|---|---|---|---|
| Planning | Multi-step reasoning, tool-call orchestration | Low (1 / query) | Low (one expensive call OK) |
| Search | Asset retrieval / matching | High (many calls) | High (per-call cost matters) |
| Code generation | SQL synthesis, schema grounding | Medium | Medium |
| Judges | Calibrated quality evaluation | Medium-low (1 per N trajectories) | Low |
The pattern's value comes from mismatched profiles — if all sub-agents had identical profiles, single-LLM would tie. The complementary capabilities observation is what makes the pattern worth the engineering investment.
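A rough arithmetic sketch shows why mismatched volume profiles matter: when one sub-agent accounts for most calls, moving only that sub-agent to a cheaper model removes most of the per-query cost. Every number below (call counts and per-call prices) is invented for illustration and does not come from the source.

```python
# Hypothetical call volumes per query and per-call prices in USD.
CALLS_PER_QUERY = {"planning": 1, "search": 20, "codegen": 3, "judge": 1}
PRICE_PER_CALL = {"frontier": 0.02, "small": 0.002}


def query_cost(assignment: dict[str, str]) -> float:
    """Sum calls x price over all sub-agents for one query."""
    return sum(
        CALLS_PER_QUERY[agent] * PRICE_PER_CALL[assignment[agent]]
        for agent in CALLS_PER_QUERY
    )


all_frontier = {agent: "frontier" for agent in CALLS_PER_QUERY}
mixed = {**all_frontier, "search": "small"}  # only the high-volume slice moves

print(f"all frontier: ${query_cost(all_frontier):.2f} per query")
print(f"search on small model: ${query_cost(mixed):.2f} per query")
```

With these made-up figures the cost drops from roughly $0.50 to roughly $0.14 per query, while planning and judging stay on the frontier model.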
Disclosed example: Table search¶
Figure 6 of the source post specifically shows table-search sub-agents running on different LLMs, with GEPA optimising the corresponding prompts. Disclosed property: "how different LLMs perform on table search tasks and how the corresponding accuracy and cost can be further optimized using methods like GEPA." Specific numbers not disclosed.
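Since the post gives no figures, the following is only a sketch of how such a comparison could be used: evaluate each (model, prompt) candidate for the table-search sub-task, then take the cheapest one that clears an accuracy floor. The `Candidate` type, the `cheapest_meeting` helper, and all accuracy and cost values are placeholders, not disclosed results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    model: str
    prompt_version: str
    accuracy: float       # would come from a held-out eval set
    cost_per_call: float  # USD


def cheapest_meeting(candidates: list[Candidate],
                     min_accuracy: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate at or above the accuracy floor."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.cost_per_call) if eligible else None


# Placeholder measurements, invented for this example.
CANDIDATES = [
    Candidate("frontier", "generic", 0.90, 0.020),
    Candidate("small", "generic", 0.78, 0.002),
    Candidate("small", "optimised", 0.89, 0.002),
]

choice = cheapest_meeting(CANDIDATES, min_accuracy=0.85)
```

In this invented data the small model fails the floor with a generic prompt and passes it with an optimised one, which is the effect the pattern relies on.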
Composition with parallel thinking¶
| Without parallel thinking | With parallel thinking |
|---|---|
| One trajectory, multi-LLM per sub-agent | N trajectories, each with multi-LLM per sub-agent |
| Single sample per sub-task | Multiple samples; aggregator picks |
These compose naturally — Genie does both. Multi-LLM per sub-agent + parallel trajectory sampling is double diversity: across trajectory boundaries (sampling) and across sub-agent boundaries (model variety).
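The composition can be sketched as an outer loop over trajectories with an aggregator on top, where each trajectory internally runs the multi-LLM sub-agent pipeline. The source does not say how Genie's aggregator picks a result; majority vote is used here purely as an assumption, and `fake_trajectory` is a stub standing in for a full planning, search, and code-gen pass.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sample_and_aggregate(question: str,
                         run_trajectory: Callable[[str, int], str],
                         n: int) -> str:
    """Run n trajectories concurrently and return the most common answer."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda i: run_trajectory(question, i), range(n)))
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def fake_trajectory(question: str, i: int) -> str:
    """Stub trajectory: a minority of samples disagree with the rest."""
    return "SELECT 2" if i % 3 == 1 else "SELECT 1"


result = sample_and_aggregate("revenue by region", fake_trajectory, n=5)
```

A real aggregator could instead use the judge sub-agent to score candidates; the outer structure would stay the same.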
Operationalising: what infrastructure is needed¶
| Component | Purpose |
|---|---|
| Unified inference plane | Allow any LLM to be invoked from any sub-agent without per-model integration cost |
| Prompt versioning + management | Each (LLM, sub-task) pair has its own optimised prompt; manage as code |
| Prompt optimisation tooling | GEPA or equivalent — feedback loop on prompt quality |
| Per-sub-agent telemetry | Measure accuracy + cost + latency at the sub-agent altitude (not just end-to-end) — the engineering decision needs per-slice data |
| Cost guardrails | Frontier models on planning sub-agent can be expensive; need per-call budget controls |
Databricks' platform property, that it is "seamless to try out any of the frontier models (including Opus, GPT, and Gemini), open-source models, as well as custom trained models", is what makes the pattern tractable; without it, each per-sub-agent assignment becomes a multi-week-per-swap exercise.
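Two of the components above, per-sub-agent telemetry and cost guardrails, can be sketched together as a thin wrapper around each model call. The `SubAgentMeter` class, its method names, and the budget values are hypothetical and do not describe any Databricks API; the cost is assumed to be an estimate supplied by the caller.

```python
import time
from collections import defaultdict
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a call's estimated cost is above the sub-agent's cap."""


class SubAgentMeter:
    def __init__(self, max_cost_per_call: dict):
        self.max_cost_per_call = max_cost_per_call  # sub-agent -> USD cap
        self.stats = defaultdict(
            lambda: {"calls": 0, "cost_usd": 0.0, "latency_s": 0.0}
        )

    def call(self, sub_agent: str, estimated_cost_usd: float,
             fn: Callable[[], object]):
        """Enforce the per-call cap, then run fn and record cost and latency."""
        cap = self.max_cost_per_call.get(sub_agent)
        if cap is not None and estimated_cost_usd > cap:
            raise BudgetExceeded(f"{sub_agent}: {estimated_cost_usd} > {cap}")
        start = time.perf_counter()
        result = fn()
        row = self.stats[sub_agent]
        row["calls"] += 1
        row["cost_usd"] += estimated_cost_usd
        row["latency_s"] += time.perf_counter() - start
        return result
```

Because the statistics are keyed by sub-agent rather than by request, they answer the per-slice question (which sub-agent is slow or expensive) directly.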
When this fits / doesn't¶
Fits:
- Agent has clearly separable sub-tasks with different capability profiles.
- Inference platform makes per-model swapping cheap.
- Prompt-optimisation tooling available.
- High-volume agent — engineering investment amortised.
- Clear telemetry per sub-agent for tuning.
Doesn't fit:
- Sub-tasks are too tightly coupled to separate cleanly.
- No infrastructure for swapping models — every swap is a major deployment.
- Low-volume agent — engineering cost > savings.
- Single LLM is dominantly best across all sub-tasks (rare in practice).
Anti-patterns¶
- Pick best per sub-agent, no prompt optimisation — leaves significant accuracy on the table; smaller models underperform on generic prompts.
- Optimise prompts only on flagship LLM — fails to adapt prompts to the smaller / faster model assigned to high-volume sub-tasks.
- Same prompt across LLMs — different models respond differently to the same prompt; what's optimal for Opus isn't for Gemini.
- No per-sub-agent telemetry — can't tell where the bottleneck is; blind tuning.
- Frontier model on every sub-agent — defeats the cost benefit; reserve frontier for sub-tasks that pay off (planning, judging).
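Some of these anti-patterns can be caught mechanically. The sketch below checks an assignment table for two of them: one prompt shared across different models, and a frontier model on every sub-agent. The `lint_assignments` function, the prompt identifiers, and the model names are hypothetical.

```python
def lint_assignments(assignments: dict, frontier_models: set) -> list:
    """assignments maps sub-agent -> (model, prompt_id); returns warnings."""
    warnings = []

    models_used = {model for model, _prompt in assignments.values()}
    if models_used and models_used <= frontier_models:
        warnings.append("frontier model on every sub-agent")

    # Group models by the prompt they were given.
    models_by_prompt = {}
    for model, prompt_id in assignments.values():
        models_by_prompt.setdefault(prompt_id, set()).add(model)
    for prompt_id, models in sorted(models_by_prompt.items()):
        if len(models) > 1:
            warnings.append(
                f"prompt {prompt_id!r} shared across models {sorted(models)}"
            )
    return warnings


bad = {
    "planning": ("frontier-a", "generic-v1"),
    "search": ("frontier-b", "generic-v1"),
}
issues = lint_assignments(bad, frontier_models={"frontier-a", "frontier-b"})
```

Here `issues` contains both warnings; a check like this could run wherever the assignments are versioned.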
Relationship to related patterns¶
- patterns/parallel-trajectory-sampling-and-aggregation — composes naturally; parallel trajectories each use the multi-LLM per-sub-agent pattern.
- patterns/four-phase-data-agent-trajectory — each phase has different LLM assignments per the multi-LLM pattern.
- concepts/multi-llm-sub-agent-routing — the underlying concept; this is the operational shape.
Seen in¶
- sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie: canonical first wiki disclosure of LLM-per-sub-agent + GEPA-optimised prompts as a named pattern. Genie uses different LLMs for planning / search / code-gen / judge sub-agents; GEPA optimises the corresponding prompts; the combined effect is simultaneous improvement on accuracy + cost + latency (Figure 1 end-state). Specific (LLM, sub-task) assignments are not disclosed publicly.