PATTERN Cited by 1 source

LLM per sub-agent with optimized prompts¶

LLM per sub-agent with optimised prompts is an agent-design pattern in which a single agent system uses different LLMs for different internal sub-agents (planning, search, code generation, judging), each with per-sub-agent prompt optimisation (e.g., via GEPA). The pattern is the structural mechanism through which Multi-LLM sub-agent routing delivers simultaneous improvement on accuracy + cost + latency.

The pattern was canonicalised in the 2026-05-08 Databricks post on Genie as one of the three architectural advances behind Genie's accuracy lead over a "leading coding agent" baseline (32% → over 90% on Databricks' internal benchmark).

The pattern¶

                       Agent system
  ┌──────────────────────────────────────────────────────────┐
  │                                                          │
  │   ┌───────────────────────┐                              │
  │   │  Planning sub-agent   │◄── LLM A + prompt P_A        │
  │   └──────────┬────────────┘    (frontier reasoning)      │
  │              │                                           │
  │   ┌──────────▼─────────┐                                 │
  │   │  Search sub-agent  │◄── LLM B + prompt P_B           │
  │   └──────────┬─────────┘    (fast retrieval-tuned)       │
  │              │                                           │
  │   ┌──────────▼──────────┐                                │
  │   │ Code-gen sub-agent  │◄── LLM C + prompt P_C          │
  │   └──────────┬──────────┘    (SQL synthesis)             │
  │              │                                           │
  │   ┌──────────▼──────────┐                                │
  │   │   Judge sub-agent   │◄── LLM D + prompt P_D          │
  │   └─────────────────────┘    (quality evaluation)        │
  │                                                          │
  └──────────────────────────────────────────────────────────┘

  Each prompt P_i is GEPA-optimised for its (LLM, sub-task) pair.

The two distinct moves:

1. Per-sub-agent LLM assignment: pick best-of-class for each slice of the agent's work.
2. Per-sub-agent prompt optimisation: for the chosen LLM and its sub-task, optimise the prompt (e.g., with GEPA) so that smaller / cheaper models can recover accuracy that frontier models would have provided with a generic prompt.
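A minimal sketch of how the two moves might be represented together, assuming a simple in-process routing table. The model identifiers (`llm-a` to `llm-d`), the prompt texts, and the names `SubAgentConfig`, `ROUTING`, and `build_request` are all hypothetical; the source does not disclose Genie's actual assignments or prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAgentConfig:
    model: str   # hypothetical model identifier
    prompt: str  # prompt tuned for this specific (model, sub-task) pair


# Hypothetical assignments for illustration only.
ROUTING = {
    "planning": SubAgentConfig("llm-a", "P_A: plan the steps needed to answer: {task}"),
    "search": SubAgentConfig("llm-b", "P_B: list tables relevant to: {task}"),
    "codegen": SubAgentConfig("llm-c", "P_C: write SQL for: {task}"),
    "judge": SubAgentConfig("llm-d", "P_D: rate this answer: {task}"),
}


def build_request(sub_agent: str, task: str) -> tuple[str, str]:
    """Return the (model, rendered prompt) pair for one sub-agent call."""
    cfg = ROUTING[sub_agent]
    return cfg.model, cfg.prompt.format(task=task)
```

Keeping the model and its prompt in one record makes it hard to swap a model without also revisiting the prompt that was optimised for it.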

Why both moves are necessary¶

Move	Without the other	With both
Multi-LLM only (no prompt opt)	Smaller models on cheap sub-tasks underperform; gain is small	n/a
Prompt opt only (single LLM)	Stuck with one model's capability profile across all sub-tasks	n/a
Both combined	n/a	Each (LLM, sub-task) pair runs at its optimised operating point — accuracy + cost + latency all gain

The Databricks post explicitly references this combination as the mechanism: "how different LLMs perform on table search tasks and how the corresponding accuracy and cost can be further optimized using methods like GEPA."
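The per-pair optimisation loop can be illustrated with a deliberately naive stand-in. The sketch below is not GEPA and does not use any GEPA API; it only scores a fixed list of candidate prompts against labelled examples for one (LLM, sub-task) pair and keeps the best. `select_prompt`, `run_model`, and `score` are hypothetical names, and `run_model` is assumed to wrap whichever LLM is assigned to the sub-agent.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input, expected output)


def select_prompt(
    candidates: Sequence[str],
    examples: Sequence[Example],
    run_model: Callable[[str, str], str],   # (prompt, input) -> model output
    score: Callable[[str, str], float],     # (output, expected) -> score
) -> tuple[str, float]:
    """Return the candidate prompt with the highest mean score, and that score."""
    if not candidates or not examples:
        raise ValueError("need at least one candidate prompt and one example")
    best_prompt, best_mean = candidates[0], float("-inf")
    for prompt in candidates:
        total = sum(score(run_model(prompt, x), y) for x, y in examples)
        mean = total / len(examples)
        if mean > best_mean:
            best_prompt, best_mean = prompt, mean
    return best_prompt, best_mean
```

Running this once per (LLM, sub-task) pair, rather than once for the whole agent, is the point of the second move: the winning prompt for one model is not assumed to carry over to another.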

Sub-agent decomposition¶

The pattern requires identifying clearly separable sub-tasks with distinct capability profiles. Genie's disclosed decomposition:

Sub-agent	Capability needed	Volume per query	Cost sensitivity
Planning	Multi-step reasoning, tool-call orchestration	Low (1 / query)	Low (one expensive call OK)
Search	Asset retrieval / matching	High (many calls)	High (per-call cost matters)
Code generation	SQL synthesis, schema grounding	Medium	Medium
Judges	Calibrated quality evaluation	Medium-low (1 per N trajectories)	Low

The pattern's value comes from mismatched profiles: if every sub-agent had the same capability, volume, and cost profile, a single LLM would do just as well. The complementary-capabilities observation is what makes the pattern worth the engineering investment.
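A small worked calculation shows why the volume column matters. All numbers below are hypothetical (call counts, price tiers, and the cost unit are invented for illustration; the source discloses none of them). The sketch only demonstrates that moving the high-volume sub-agents to a cheaper tier dominates the per-query cost.

```python
# Hypothetical calls per query, loosely following the volume column above.
CALLS_PER_QUERY = {"planning": 1, "search": 20, "codegen": 4, "judge": 1}

# Hypothetical price per call, in arbitrary integer cost units.
COST_PER_CALL = {"frontier": 50, "small": 2}


def cost_per_query(assignment: dict[str, str]) -> int:
    """Sum calls x price over all sub-agents for a given tier assignment."""
    return sum(
        CALLS_PER_QUERY[sub_agent] * COST_PER_CALL[assignment[sub_agent]]
        for sub_agent in CALLS_PER_QUERY
    )


all_frontier = {sub_agent: "frontier" for sub_agent in CALLS_PER_QUERY}
mixed = {
    "planning": "frontier",  # low volume, one expensive call is acceptable
    "search": "small",       # high volume, per-call cost matters
    "codegen": "small",
    "judge": "frontier",
}

print(cost_per_query(all_frontier))  # 1300
print(cost_per_query(mixed))         # 148
```

Under these assumed numbers, search alone accounts for 1000 of the 1300 units in the all-frontier case, which is why the search sub-agent is the natural first candidate for a cheaper model with an optimised prompt.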

Disclosed example: Table search¶

Figure 6 of the source post specifically shows table-search sub-agents running on different LLMs, with GEPA optimising the corresponding prompts. Disclosed property: "how different LLMs perform on table search tasks and how the corresponding accuracy and cost can be further optimized using methods like GEPA." Specific numbers not disclosed.

Composition with parallel thinking¶

Without parallel thinking	With parallel thinking
One trajectory, multi-LLM per sub-agent	N trajectories, each with multi-LLM per sub-agent
Single sample per sub-task	Multiple samples; aggregator picks

These compose naturally — Genie does both. Multi-LLM per sub-agent + parallel trajectory sampling is double diversity: across trajectory boundaries (sampling) and across sub-agent boundaries (model variety).
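The composition can be sketched as N independent trajectories followed by an aggregation step. The source says only that an aggregator picks among samples; the majority vote used here is an assumption, as are the names `sample_and_aggregate` and `run_trajectory`. Each trajectory is assumed to use the multi-LLM per-sub-agent routing internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sample_and_aggregate(run_trajectory: Callable[[int], str], n: int) -> str:
    """Run n trajectories concurrently and return the most common answer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each seed stands for one independently sampled trajectory.
        answers = list(pool.map(run_trajectory, range(n)))
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

A judge sub-agent could replace the majority vote by scoring each answer; the structure of the outer loop would stay the same.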

Operationalising: what infrastructure is needed¶

Component	Purpose
Unified inference plane	Allow any LLM to be invoked from any sub-agent without per-model integration cost
Prompt versioning + management	Each (LLM, sub-task) pair has its own optimised prompt; manage as code
Prompt optimisation tooling	GEPA or equivalent — feedback loop on prompt quality
Per-sub-agent telemetry	Measure accuracy + cost + latency at the sub-agent level (not just end-to-end); model-assignment decisions need per-slice data
Cost guardrails	Frontier models on planning sub-agent can be expensive; need per-call budget controls

Databricks' platform property "seamless to try out any of the frontier models (including Opus, GPT, and Gemini), open-source models, as well as custom trained models" is what makes the pattern tractable; without that the per-sub-agent assignment is a multi-week-per-swap exercise.
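Two of the components above, per-sub-agent telemetry and cost guardrails, can be sketched as a thin wrapper around each sub-agent call. This is an illustrative sketch, not Databricks' implementation: `SubAgentMeter`, `BudgetExceeded`, and the idea of passing an estimated cost per call are assumptions, and accuracy tracking is left out for brevity.

```python
import time
from collections import defaultdict
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a call's estimated cost is above the sub-agent's budget."""


class SubAgentMeter:
    def __init__(self, per_call_budget: dict[str, float]):
        self.per_call_budget = per_call_budget
        # Running totals per sub-agent: call count, cost, wall-clock seconds.
        self.stats = defaultdict(lambda: {"calls": 0, "cost": 0.0, "seconds": 0.0})

    def call(self, sub_agent: str, fn: Callable[[], str], est_cost: float) -> str:
        """Run fn for a sub-agent, enforcing its budget and recording telemetry."""
        limit = self.per_call_budget.get(sub_agent, float("inf"))
        if est_cost > limit:
            raise BudgetExceeded(f"{sub_agent}: {est_cost} exceeds budget {limit}")
        start = time.perf_counter()
        result = fn()
        entry = self.stats[sub_agent]
        entry["calls"] += 1
        entry["cost"] += est_cost
        entry["seconds"] += time.perf_counter() - start
        return result
```

Because totals are keyed by sub-agent name, the same meter can show which slice dominates cost or latency before any model is swapped.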

When this fits / doesn't¶

Fits:

Agent has clearly separable sub-tasks with different capability profiles.
Inference platform makes per-model swapping cheap.
Prompt-optimisation tooling available.
High-volume agent — engineering investment amortised.
Clear telemetry per sub-agent for tuning.

Doesn't fit:

Sub-tasks are too tightly coupled to separate cleanly.
No infrastructure for swapping models — every swap is a major deployment.
Low-volume agent — engineering cost > savings.
Single LLM is dominantly best across all sub-tasks (rare in practice).
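The last "doesn't fit" condition can be checked mechanically once per-sub-task evaluation scores exist. The sketch below assumes a hypothetical score table of the form `scores[sub_task][model]`; the function names and the example values are invented for illustration.

```python
def best_per_subtask(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each sub-task to the model with the highest score on it."""
    return {task: max(by_model, key=by_model.get) for task, by_model in scores.items()}


def single_model_dominates(scores: dict[str, dict[str, float]]) -> bool:
    """True when one model is the top scorer on every sub-task."""
    return len(set(best_per_subtask(scores).values())) == 1


# Hypothetical evaluation results, not taken from the source.
example_scores = {
    "planning": {"llm-a": 0.91, "llm-b": 0.74},
    "search": {"llm-a": 0.80, "llm-b": 0.86},
}
```

This compares accuracy only. A model that wins every sub-task on accuracy may still be worth replacing on high-volume sub-tasks if a cheaper model comes close after prompt optimisation.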

Anti-patterns¶

Pick best per sub-agent, no prompt optimisation — leaves significant accuracy on the table; smaller models underperform on generic prompts.
Optimise prompts only on flagship LLM — fails to adapt prompts to the smaller / faster model assigned to high-volume sub-tasks.
Same prompt across LLMs — different models respond differently to the same prompt; what's optimal for Opus isn't for Gemini.
No per-sub-agent telemetry — can't tell where the bottleneck is; blind tuning.
Frontier model on every sub-agent — defeats the cost benefit; reserve frontier for sub-tasks that pay off (planning, judging).

Related¶

patterns/parallel-trajectory-sampling-and-aggregation — composes naturally; parallel trajectories each use the multi-LLM per-sub-agent pattern.
patterns/four-phase-data-agent-trajectory — each phase has different LLM assignments per the multi-LLM pattern.
concepts/multi-llm-sub-agent-routing — the underlying concept; this is the operational shape.

Seen in¶

sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie: canonical first wiki disclosure of LLM-per-sub-agent + GEPA-optimised-prompts as a named pattern. Genie uses different LLMs for planning / search / code-gen / judge sub-agents; GEPA optimises the corresponding prompts; combined effect is simultaneous improvement on accuracy + cost + latency (Figure 1 end-state). Specific (LLM, sub-task) assignments not disclosed publicly.