PATTERN Cited by 1 source
Specialized workflow router with LLM intent detection¶
Pattern shape¶
An LLM classifies inbound user requests into one of N pre-defined workflows; each workflow has its own specialised handler independently designed to do the work. The handlers need not all use the LLM — they can be deterministic UIs, form-flow guides, templated canned responses, RAG-driven LLM generation, or any mix. Workflow buckets are picked along two axes: frequency of inbound requests AND risk class (legal, financial, churn, safety). High-risk request shapes get deterministic / templated handlers that bypass LLM generation entirely.
Structure¶
                       ┌────────────────────────┐
                       │       User query       │
                       └───────────┬────────────┘
                                   │
                                   ▼
                       ┌────────────────────────┐
                       │   LLM intent router    │  ← classifies into 1 of N workflows
                       └───────────┬────────────┘
                                   │
         ┌────────────┬────────────┼────────────┬────────────┐
         ▼            ▼            ▼            ▼            ▼
    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
    │    QA    │ │ Billing  │ │  Refund  │ │  Cancel  │ │  Review  │
    │   RAG    │ │ Det. UI  │ │   Form   │ │ Template │ │ Template │
    │   LLM    │ │  no-LLM  │ │  guide   │ │  no-LLM  │ │  no-LLM  │
    └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
         │            │            │            │            │
         │            └────────────┴─────┬──────┴────────────┘
         │                               ▼
         │               ┌────────────────────────────────┐
         │               │         Output to user         │
         │               │  (Billing / Form / Templated)  │
         │               └────────────────────────────────┘
         ▼
    ┌──────────────────────────┐
    │     Validation gate      │  ← LLM-generated content only
    │ (T&S / URL / char-limit) │
    └────┬─────────────────────┘
         ▼
    ┌──────────────────────────┐
    │      Output to user      │
    └──────────────────────────┘
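The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Yelp's implementation: the `Workflow` dataclass, the `route` function, the placeholder replies, and the keyword-based `stub_classify` (a stand-in for the real LLM router call) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Workflow:
    name: str
    handler: Callable[[str], str]  # produces the reply for a query
    uses_llm_generation: bool      # True means the reply must pass the gate


def route(
    query: str,
    classify: Callable[[str], str],
    workflows: dict[str, Workflow],
    gate: Callable[[str], bool],
    default: str = "qa",
    fallback_reply: str = "FALLBACK",
) -> str:
    """Classify the query, run exactly one handler, gate only LLM output."""
    workflow = workflows.get(classify(query), workflows[default])
    reply = workflow.handler(query)
    # Templated and deterministic handlers skip the validation gate.
    if workflow.uses_llm_generation and not gate(reply):
        return fallback_reply
    return reply


def stub_classify(query: str) -> str:
    """Keyword stand-in for the LLM router (assumption, for illustration)."""
    lowered = query.lower()
    for label in ("cancel", "refund", "billing", "review"):
        if label in lowered:
            return label
    return "qa"


# Hypothetical handlers; only QA is marked as LLM-generated.
WORKFLOWS = {
    "qa": Workflow("qa", lambda q: f"[generated answer for: {q}]", True),
    "billing": Workflow("billing", lambda q: "BILLING_UI", False),
    "refund": Workflow("refund", lambda q: "REFUND_FORM_GUIDE", False),
    "cancel": Workflow("cancel", lambda q: "CANCEL_TEMPLATE", False),
    "review": Workflow("review", lambda q: "REVIEW_TEMPLATE", False),
}
```

With a gate that rejects everything, `route("Please cancel my ads", stub_classify, WORKFLOWS, gate=lambda r: False)` still returns the cancel template, because that handler never reaches the gate; a QA query under the same gate returns the fallback reply.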
Canonical instance — Yelp CS Chatbot (2026-05-27)¶
Yelp's LLM-Assisted Customer Success Chatbot routes inbound queries into five named workflows:
| Workflow | Handler shape | Risk class |
|---|---|---|
| Question/Answering (QA) | RAG-driven LLM generation | Standard |
| Billing | Deterministic UI (subscribed services + balances) | Low |
| Refund | Form-submission guide | Standard |
| Cancel | Templated response (no LLM generation) | High financial / legal |
| Review | Templated response (no LLM generation) | High financial / legal |
Only ONE of the five workflows (QA) has the LLM actually generate free-form text. Verbatim from the post:
"We bucketed the workflows based on the frequency of inbound requests along with the potential risks of the queries (e.g. churn risk, legal risk, and financial risk). When a user submits a query, the system uses the LLM to intelligently detect which workflow the query should follow." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)
A/B-test outcome vs the legacy menu-tree+fixed-phrase chatbot: doubled resolution rate.
Three structural pieces¶
- LLM intent classifier — short prompt asking the LLM to pick one of N workflows for a given user query. Prompt includes workflow names, brief descriptions, and ideally a default fallback.
- Workflow-specific handlers — one per workflow. Independently designed; some are deterministic, some template-driven, some RAG-driven. Handlers don't share prompts, error handling, or output format constraints.
- Workflow-specific output gate — applies only when the handler itself uses LLM generation. Templated / deterministic handlers skip the gate. Yelp's QA gate is three-axis: trust & safety / valid URL / character limit.
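The classifier and gate pieces above could look roughly like the following sketch. Everything specific here is an assumption made for illustration: the workflow descriptions, the 600-character limit, the allowlisted host `help.example.com`, and the blocked-term list standing in for a trust & safety check. The source names the three gate axes but not how they are implemented; in particular, "valid URL" is interpreted here as "host is on an allowlist".

```python
import re
from urllib.parse import urlparse

# Hypothetical workflow descriptions that go into the router prompt.
WORKFLOWS = {
    "qa": "General product questions answered from help content.",
    "billing": "Questions about subscribed services and balances.",
    "refund": "Requests to get money back for a charge.",
    "cancel": "Requests to cancel a subscription or service.",
    "review": "Questions or complaints about reviews.",
}
DEFAULT_WORKFLOW = "qa"

MAX_CHARS = 600                        # assumed character limit
ALLOWED_HOSTS = {"help.example.com"}   # assumed URL allowlist
BLOCKED_TERMS = ("guaranteed refund",) # placeholder for a real T&S check
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def build_router_prompt(query: str) -> str:
    """Short prompt: workflow names, brief descriptions, default fallback."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in WORKFLOWS.items())
    return (
        "Classify the user query into exactly one workflow.\n"
        f"Workflows:\n{options}\n"
        f"If none fits, answer {DEFAULT_WORKFLOW}.\n"
        "Reply with the workflow name only.\n"
        f"Query: {query}"
    )


def parse_router_reply(reply: str) -> str:
    """Normalise the model's reply; anything unrecognised falls back."""
    label = reply.strip().lower().strip(".\"'` ")
    return label if label in WORKFLOWS else DEFAULT_WORKFLOW


def passes_output_gate(text: str) -> bool:
    """Three checks on LLM-generated text: length, URLs, blocked terms."""
    if len(text) > MAX_CHARS:
        return False
    for url in URL_RE.findall(text):
        host = urlparse(url.rstrip(".,;:!?")).hostname
        if host not in ALLOWED_HOSTS:
            return False
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)
```

Parsing the reply against the known workflow names, rather than trusting it verbatim, means a malformed router reply falls back to the default workflow instead of failing the request.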
When to apply¶
Use this pattern when:
- Inbound queries span distinct request shapes with different cost / risk / SLA properties.
- Some request shapes have legal / financial / churn risk that makes LLM generation unsafe; bypassing LLM generation for those is desirable.
- Some request shapes are deterministic (Billing details, account balances) — LLM-generated answers there add risk without value.
- The LLM's classification capability is more reliable than its generation capability for the domain.
Don't use this pattern when:
- Queries are uniformly free-form and can't be cleanly bucketed.
- Handler diversity adds operational overhead that exceeds the per-workflow specialisation benefit.
- The cost of mis-routing a query exceeds the cost of letting a single monolithic LLM answer everything (rare, but possible in adversarial domains).
Trade-offs¶
- Operational complexity ↑ — N handlers to maintain. Mitigated by handler independence (changes to one don't regress others).
- Latency ↑ — every query pays an LLM-classification cost before the handler runs. Mitigated by using a small fast model (e.g. GPT-4o-mini) for the router and reserving larger models for QA generation.
- Risk surface area ↓↓ — high-risk workflows entirely bypass LLM generation. The blast radius of an LLM hallucination is bounded to the QA workflow only.
- Engineering velocity per handler ↑ — handler teams own their slice; can iterate independently.
- User experience — uniform front (chat interface) over heterogeneous back (handlers). User doesn't know the request was routed.
Risks¶
- Mis-routing. The router classifies a Cancel intent as QA, and the QA RAG handler generates a free-form LLM response where a templated risk-mitigation message was required. Mitigation: keep router prompts conservative and bias toward the safer, templated workflow when the router is uncertain.
- Coverage gaps. N workflows can't cover everything; the default fallback (typically QA) absorbs the residual. Mitigation: monitor router output distribution; add workflows when out-of-distribution queries cluster.
- Workflow ambiguity. Queries that span workflows ("I want to cancel and get a refund") need an explicit disambiguation policy.
- Router accuracy regression. Re-prompting the router (or upgrading the underlying model) can shift the workflow distribution. Track the per-workflow routing rate as an SLI and compare it before and after any prompt or model change.
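Two of the mitigations above lend themselves to a short sketch: biasing toward safer workflows under uncertainty, and tracking the per-workflow routing rate. The code below is one possible shape, not something described in the source. It assumes the router can emit a score per workflow, and the `SAFETY_RANK` ordering, the 0.2 margin, and the 0.05 drift tolerance are illustrative values.

```python
from collections import Counter

# Assumed ordering: a lower rank means a safer (templated) handler.
SAFETY_RANK = {"cancel": 0, "review": 0, "billing": 1, "refund": 1, "qa": 2}


def conservative_pick(scores: dict[str, float], margin: float = 0.2) -> str:
    """Take the top-scoring workflow unless a safer one is within `margin`."""
    best = max(scores, key=scores.get)
    contenders = [w for w, s in scores.items() if scores[best] - s <= margin]
    # Among near-ties, prefer the safest handler, then the higher score.
    return min(contenders, key=lambda w: (SAFETY_RANK[w], -scores[w]))


def routing_rates(labels: list[str]) -> dict[str, float]:
    """Share of traffic routed to each workflow over a window of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {w: (counts[w] / total if total else 0.0) for w in SAFETY_RANK}


def drifted(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Workflows whose routing rate moved more than `tolerance` from baseline."""
    return sorted(
        w for w in baseline
        if abs(current.get(w, 0.0) - baseline[w]) > tolerance
    )
```

For example, `conservative_pick({"qa": 0.55, "cancel": 0.40})` returns `"cancel"`, because the templated workflow is within the margin of the top score. The same rule gives one possible tie-break for queries that span workflows, though a real disambiguation policy may instead ask the user to choose.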
Related¶
- concepts/llm-workflow-router — the underlying concept.
- patterns/intent-domain-decomposer-agentic-router — decomposer-router sibling (Databricks World Bank Group). Decomposes a query into multiple sub-queries routed to different domain agents; Yelp routes each query to exactly one handler.
- patterns/multi-agent-supervisor-routing — supervisor routing in multi-turn agent flows (different time horizon).
- concepts/retrieval-augmented-generation — the QA handler shape.
- concepts/llm-hallucination — the failure mode that motivates removing LLM from high-risk workflows.
- systems/yelp-cs-chatbot — canonical wiki instance.
Seen in¶
- sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot — canonical: 5 workflows, two-axis bucketing, doubled resolution rate vs legacy menu-tree chatbot.