PATTERN
One model invocation per task¶
Intent¶
Replace a single mega-prompt trying to drive a complex multi-step process with a sequence of separate model invocations, each with:
- a single well-defined purpose,
- its own task-specific structured-output schema, and
- its own (small, focused) prompt.
The application, not the prompt, owns sequencing, state, context propagation, and control flow.
Canonicalised by Slack's Security Engineering team as the central architectural response to their single-prompt prototype's "wildly variable" quality (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).
Canonical statement¶
"Our solution was to break down the complex investigation process we'd described in the prompt of our prototype into a sequence of model invocations, each with a single, well-defined purpose and output structure. These simple tasks are chained together by our application."
(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
Mechanism¶
1. Identify the sub-tasks hiding in the mega-prompt¶
A mega-prompt like "You are a security analyst. Investigate this alert. First, gather evidence from all data sources. Then evaluate each finding. Then decide what to investigate next. Then produce a report." hides four sub-tasks:
- Gather evidence (N tool calls, per-data-source).
- Evaluate each finding (scoring pass).
- Decide what to investigate next (phase progression).
- Produce the report (synthesis).
Each deserves its own invocation.
2. Assign each sub-task its own schema¶
Each sub-task produces a structured output with a schema enforced at the invocation boundary:
- Gather evidence: `{ findings: [{ source, observation, evidence_refs }] }`
- Evaluate findings: `{ scored_findings: [{ finding_id, credibility_score, notes }] }`
- Decide next step: `{ phase: "discovery|trace|conclude", rationale }`
- Produce report: `{ classification, summary, recommended_actions }`
Schemas turn the prompt's implicit shape contracts into explicit, parseable artifacts. See concepts/structured-output-reliability.
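As a sketch, those shapes can be pinned down as dataclasses in application code. The field names mirror the list above and are illustrative, not Slack's actual schema definitions:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One observation from a single data source (gather-evidence output)."""
    source: str
    observation: str
    evidence_refs: list[str] = field(default_factory=list)

@dataclass
class ScoredFinding:
    """One finding after the scoring pass (evaluate-findings output)."""
    finding_id: str
    credibility_score: float
    notes: str

@dataclass
class PhaseDecision:
    """Phase-progression output: which phase to enter next and why."""
    phase: str  # "discovery" | "trace" | "conclude"
    rationale: str

@dataclass
class Report:
    """Final synthesis output."""
    classification: str
    summary: str
    recommended_actions: list[str]
```

Each invocation boundary then has one small, checkable contract instead of a single implicit mega-schema.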
3. Let the application code sequence them¶
The application runs:
```python
findings = gather_evidence(question)
scored = evaluate_findings(findings)
phase = decide_next_phase(scored, journal)
if phase == "conclude":
    report = produce_report(journal + scored)
else:
    # back to gather_evidence with a new question
    ...
```
Control flow (what invocation happens next), state (journal), and error handling (retry on schema fail) all live in application code. The prompt is tactical; the app is strategic.
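A minimal sketch of that invocation boundary, including the retry-on-schema-fail behaviour. The `invoke` helper, the key-presence check, and the stub model are all hypothetical; a real system would call an LLM API and validate against a full schema:

```python
import json

def invoke(prompt: str, schema_keys: set[str], call_model, max_retries: int = 2) -> dict:
    """Run one model invocation; on a schema failure, retry this task alone."""
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            out = json.loads(raw)
        except json.JSONDecodeError:
            continue  # structural failure: retry just this invocation
        if schema_keys <= out.keys():  # minimal structural check
            return out
    raise RuntimeError(f"invocation failed schema check after {max_retries + 1} attempts")

# Stub model for illustration only; returns a well-formed gather-evidence payload.
fake_model = lambda prompt: json.dumps({"findings": []})
result = invoke("Gather evidence for alert 123", {"findings"}, fake_model)
```

Because validation happens at each boundary, a malformed output costs one single-task retry, not a whole-investigation rewind.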
4. Keep per-task prompts small¶
Because each invocation has a single purpose, its prompt stays small — role definition, this-task guidance, maybe a few-shot example. Mega-prompt failure modes (context pollution, instruction fatigue, instruction ordering effects) disappear because the prompts were never that big to begin with.
Why it works¶
- Addresses prompt-is-not-control. Control is in code, not in instruction bullets. See concepts/prompt-is-not-control.
- Schemas catch structural failures early. If an invocation's output doesn't parse, it's a single-task retry, not a whole-investigation rewind.
- Small prompts are more reliable. The model only has to hold the instructions for this one task in mind, not a 10-step methodology.
- Per-task model/parameter selection. Each invocation can use a different model tier, token budget, tool surface. See concepts/knowledge-pyramid-model-tiering and patterns/phase-gated-investigation-progression.
- Debuggable. Each invocation is a separate record: inputs, output, parse status, model identity, timing. Dashboards can replay any single invocation.
Costs Slack explicitly names¶
"Using structured outputs isn't 'free'; if the output format is too complicated for the model, the execution can fail. Structured outputs are also subject to the usual problems of cheating and hallucination."
(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
The trade is worth it because each schema is small enough to stay within the model's competence. The design move is: take the one complex schema the mega-prompt was implicitly trying to fulfil and split it into N small schemas, one per invocation.
When to reach for it¶
- Task has a natural multi-step shape the mega-prompt is trying to encode as narrative.
- Behaviour is non-deterministic across runs. Same input, different sequences of steps — a signature of control-in-prompt.
- Prompt tuning has stopped producing monotonic gains. Adding instructions no longer fixes failures; new failures appear in previously-working paths.
- You need different model tiers / parameters at different steps. Impossible with one invocation.
- You need to debug / replay individual decisions. One invocation = one opaque blob; separate invocations = replay granularity.
When not to reach for it¶
- Task genuinely fits in one step. Summarise this paragraph. Translate this sentence. Don't shatter single-task work into multiple invocations.
- Latency budget is razor-thin. N invocations in series have N times the model-call latency of one invocation. Mitigatable with parallelism on fan-out steps, but still a cost.
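The parallelism mitigation can be sketched with a thread pool; the per-source gather function here is a hypothetical stand-in for one real model invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def gather_from_source(source: str) -> dict:
    # Stand-in for one per-data-source model invocation.
    return {"source": source, "findings": []}

# Fan-out step: the N per-source gathers are independent, so they run
# concurrently instead of adding N serial model-call latencies.
sources = ["auth_logs", "endpoint", "network", "email"]
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    results = list(pool.map(gather_from_source, sources))
```

Only fan-out steps parallelise this way; steps that consume a prior step's output (evaluate, decide, report) remain serial.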
Composes with¶
- patterns/director-expert-critic-investigation-loop — natural apex pattern that uses per-task invocations across three personas + phases.
- patterns/phase-gated-investigation-progression — phase transitions become their own invocation (the meta-phase).
- patterns/specialized-agent-decomposition — specialised agents are the per-domain sibling; one-invocation-per-task is the per-step sibling. Both reject mega-prompt architectures.
- patterns/tool-call-loop-minimal-agent — within a single task invocation, a tool-call loop is fine; the pattern governs task boundaries, not within-task tool use.
Contrasts¶
- vs. ReAct / mega-prompt agents — ReAct-style agents encode think → act → observe in one prompt with many turns. One-invocation-per-task pulls the planning ("think") and the outcome-consumption ("observe + next step") into separate invocations.
- vs. patterns/context-segregated-sub-agents — sub-agents have separate context windows because their tasks differ structurally; one-invocation-per-task applies even within a single agent role to decompose its work.
Tradeoffs¶
- More application code — sequencing, state propagation, schema validation, error handling all live in code.
- Per-invocation latency — more serial model calls; can be mitigated by parallelising fan-out steps.
- Schema-design effort — each task needs a schema that's complete enough to drive the next step and simple enough for the model to reliably produce.
- Harder to reason about end-to-end behaviour — one mega-prompt has one failure mode (the prompt is wrong); N invocations have N failure modes (any one can fail). Mitigated by per-invocation logging + replay.
Seen in¶
- systems/slack-spear — canonical first wiki instance. The 300-word single-prompt prototype's "wildly variable" output quality drove the rewrite into sequences of per-task invocations across Director, four Experts, and a Critic, each with task-specific structured-output schemas. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
Related¶
- concepts/prompt-is-not-control
- concepts/structured-output-reliability
- concepts/investigation-phase-progression
- concepts/knowledge-pyramid-model-tiering
- patterns/director-expert-critic-investigation-loop
- patterns/phase-gated-investigation-progression
- patterns/specialized-agent-decomposition
- patterns/tool-call-loop-minimal-agent
- patterns/context-segregated-sub-agents