CONCEPT Cited by 1 source
LLM workflow router¶
Definition¶
An LLM workflow router is an architectural primitive in which a language model classifies an inbound user request into one of N pre-defined workflows, and a specialised handler for each workflow does the actual work. The LLM is the router; each handler is designed independently and can use deterministic logic, templates, RAG, form flows, or further LLM generation. Not every handler needs to call an LLM at all.
This is structurally distinct from monolithic conversational LLM designs where a single LLM tries to handle everything in one prompt-response loop.
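The classify-then-dispatch shape can be sketched as follows. This is a minimal illustration, not Yelp's implementation: the prompt wording, the `Classifier` and `Handler` signatures, and the `fake_llm` stub are all assumptions made for the example. Only the five workflow labels and the QA default come from the source.

```python
from typing import Callable, Dict

# The five labels come from the Yelp post; everything else here is illustrative.
WORKFLOWS = ("qa", "billing", "refund", "cancel", "review")
DEFAULT_WORKFLOW = "qa"  # QA-RAG is described as the default workflow

Classifier = Callable[[str], str]  # hypothetical: prompt text in, raw LLM text out
Handler = Callable[[str], str]     # hypothetical: user query in, reply out


def build_router_prompt(query: str) -> str:
    """Build a closed-label classification prompt (wording is an assumption)."""
    labels = ", ".join(WORKFLOWS)
    return (
        "Classify the customer request into exactly one workflow.\n"
        f"Allowed labels: {labels}\n"
        f"Request: {query}\n"
        "Answer with the label only."
    )


def route(query: str, classify: Classifier) -> str:
    """Return a known workflow label; unknown model output falls back to QA."""
    raw = classify(build_router_prompt(query)).strip().lower()
    return raw if raw in WORKFLOWS else DEFAULT_WORKFLOW


def handle(query: str, classify: Classifier, handlers: Dict[str, Handler]) -> str:
    """The LLM only picks the handler; the handler does the actual work."""
    return handlers[route(query, classify)](query)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    request = prompt.split("Request: ", 1)[1].split("\n", 1)[0].lower()
    if "cancel" in request:
        return "Cancel"
    if "invoice" in request:
        return "billing"
    return "something unexpected"
```

Because `route` only ever returns one of the five known labels, the handlers stay independent of how the model phrases its answer.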
Canonical wiki disclosure: Yelp's 2026-05-27 LLM-Assisted CS Chatbot post.
"We realized that a single conversational model could struggle to handle the large volume and diverse nature of inbound customer requests. To manage this complexity efficiently and ensure proper guidance for specific actions (e.g. refunds), we designed the new LLM-Assisted CS Chatbot to route queries into five distinct, specialized workflows. We bucketed the workflows based on the frequency of inbound requests along with the potential risks of the queries (e.g. churn risk, legal risk, and financial risk). When a user submits a query, the system uses the LLM to intelligently detect which workflow the query should follow." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)
Two-axis bucketing¶
Yelp picks the workflow buckets along two axes:
- Frequency of inbound requests: common request shapes get their own workflow.
- Risk class (churn, legal, financial): high-risk request shapes get deterministic or templated handlers that bypass LLM generation entirely.
The five Yelp buckets:
| Workflow | Handler shape | Risk class |
|---|---|---|
| Question/Answering (QA) | RAG-driven LLM generation | Standard |
| Billing | Deterministic UI (subscribed services + balances) | Low |
| Refund | Form-submission guide | Standard |
| Cancel | Templated response (no LLM gen) | High financial / legal |
| Review | Templated response (no LLM gen) | High financial / legal |
Only one of the five workflows, QA, has the LLM generate free-form text. Cancel and Review return canned text "due to high financial/legal risk". Billing returns a deterministic UI. Refund guides the user through a form. The LLM-as-router primitive confines LLM generation to the workflow where it adds the most value and keeps it out of the workflows where mistakes are most expensive.
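One way to make the bounded generative surface checkable is to encode the table as data. The sketch below is an assumption-laden illustration: the `HandlerShape` enum, the `Workflow` dataclass, and the lowercase risk strings are invented names that mirror the table above, not anything Yelp has published.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class HandlerShape(Enum):
    RAG_GENERATION = "rag_generation"      # the only shape with free-form LLM text
    DETERMINISTIC_UI = "deterministic_ui"
    FORM_GUIDE = "form_guide"
    TEMPLATE = "template"


@dataclass(frozen=True)
class Workflow:
    name: str
    shape: HandlerShape
    risk: str  # "low", "standard", or "high", following the table


# Rows transcribed from the table; the field names are hypothetical.
WORKFLOW_TABLE = [
    Workflow("qa", HandlerShape.RAG_GENERATION, "standard"),
    Workflow("billing", HandlerShape.DETERMINISTIC_UI, "low"),
    Workflow("refund", HandlerShape.FORM_GUIDE, "standard"),
    Workflow("cancel", HandlerShape.TEMPLATE, "high"),
    Workflow("review", HandlerShape.TEMPLATE, "high"),
]


def generative_workflows(table: List[Workflow]) -> List[str]:
    """Names of workflows in which the LLM writes free-form text."""
    return [w.name for w in table if w.shape is HandlerShape.RAG_GENERATION]


def high_risk_violations(table: List[Workflow]) -> List[str]:
    """High-risk workflows that would let the LLM improvise (should be empty)."""
    return [
        w.name
        for w in table
        if w.risk == "high" and w.shape is HandlerShape.RAG_GENERATION
    ]
```

A check like `high_risk_violations` could run in CI so that a configuration change never silently moves a high-risk workflow onto a generative handler.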
Why this beats monolithic conversational LLM¶
The Yelp post enumerates the structural reasons:
- High-volume / diverse inbound — a single conversational model "could struggle to handle the large volume and diverse nature of inbound customer requests."
- Specific-action guidance — "ensure proper guidance for specific actions (e.g. refunds)" — refund flows have precise required fields that a free-form chatbot would mishandle.
- Risk minimisation — "high financial/legal risk" workflows shouldn't have the LLM improvise.
- Operational simplicity — each handler can be developed, tested, and updated independently. Cancel-templated-response changes don't risk regressing QA-RAG retrieval quality.
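The specific-action point can be illustrated with a form check that the Refund handler owns and that never touches the LLM. The field names below are hypothetical; the post does not list which fields Yelp's refund form requires.

```python
from typing import Dict, List, Optional

# Hypothetical required fields; the source does not disclose the real ones.
REQUIRED_REFUND_FIELDS = ("account_id", "charge_date", "reason")


def missing_refund_fields(form: Dict[str, Optional[str]]) -> List[str]:
    """Return required fields that are absent, None, or blank, in declared order."""
    missing = []
    for name in REQUIRED_REFUND_FIELDS:
        value = form.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Because the check is plain deterministic code, it can be unit-tested and changed without touching the router prompt or the QA retrieval path.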
Failure modes¶
- Mis-routing. The LLM router can classify a Cancel intent as QA, so the user receives a free-form LLM answer where a templated risk-mitigation message was required. Mitigation: conservative router prompts, plus templated or canned content whenever the classification is borderline.
- Workflow ambiguity. Some queries genuinely span multiple workflows ("I want to cancel my subscription and get a refund for the last month"). The router needs a disambiguation policy.
- Coverage gaps. Five workflows don't cover everything; the default fallback (QA-RAG) absorbs the residual but may underperform on out-of-distribution queries.
- Router accuracy is not disclosed. Yelp does not publish per-workflow router classification accuracy or the cost of false positives per workflow.
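One possible disambiguation policy for the mis-routing and ambiguity cases is "when several workflows are plausible, prefer the riskiest one". The sketch below assumes a router that returns a score per label, which the Yelp post does not describe; the `RISK_RANK` ordering is read off the table above and the threshold value is arbitrary.

```python
from typing import Dict

# Ranks derived from the table: high = 2, standard = 1, low = 0.
RISK_RANK = {"cancel": 2, "review": 2, "refund": 1, "qa": 1, "billing": 0}
DEFAULT_WORKFLOW = "qa"


def resolve(candidates: Dict[str, float], threshold: float = 0.5) -> str:
    """Pick one workflow from hypothetical router scores.

    Among known labels scoring at or above the threshold, the highest-risk
    label wins; ties on risk are broken by score. With no plausible label,
    fall back to the default QA workflow.
    """
    plausible = [
        label
        for label, score in candidates.items()
        if label in RISK_RANK and score >= threshold
    ]
    if not plausible:
        return DEFAULT_WORKFLOW
    return max(plausible, key=lambda label: (RISK_RANK[label], candidates[label]))
```

Under this policy, the "cancel and refund" example resolves to Cancel, so the user sees the templated message and never a generated answer. Whether that is the right trade-off depends on the relative cost of each error, which the source does not quantify.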
Comparison with related routing primitives¶
- vs patterns/intent-domain-decomposer-agentic-router (Databricks World Bank Group, 2026-05-22): the World Bank system decomposes a user query into multiple sub-queries, routes them to different domain agents in parallel, and composes the results. Yelp's CS Chatbot routes each query to exactly one workflow handler. The World Bank design is a decomposer; Yelp's is a classifier. Both share the LLM-as-router primitive.
- vs patterns/multi-agent-supervisor-routing: supervisor agents in long-running multi-turn flows route "who acts next" across an agent network, whereas Yelp routes "which handler answers" for each single inbound query. The two operate on different time horizons.
- vs deterministic intent classification (e.g. legacy menu-tree plus fixed-phrase matching, the system Yelp replaced): deterministic classifiers handle a fixed vocabulary, whereas an LLM workflow router handles paraphrase diversity by leveraging the LLM's semantic understanding.
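The contrasts above can be made concrete with type signatures and a toy fixed-phrase matcher. Everything here is illustrative: the `Classifier` and `Decomposer` aliases and the phrase list are assumptions, and the matcher is a stand-in for the general idea of fixed-vocabulary matching, not a reconstruction of Yelp's legacy system.

```python
from typing import Callable, List, Optional, Tuple

# Classifier shape (Yelp-style): one query in, exactly one workflow label out.
Classifier = Callable[[str], str]
# Decomposer shape: one query in, several (domain, sub-query) pairs out.
Decomposer = Callable[[str], List[Tuple[str, str]]]

# Hypothetical fixed vocabulary for a deterministic matcher.
FIXED_PHRASES = {
    "cancel my subscription": "cancel",
    "request a refund": "refund",
}


def phrase_match(query: str) -> Optional[str]:
    """Deterministic matching: succeeds only when a known phrase appears verbatim."""
    lowered = query.lower()
    for phrase, label in FIXED_PHRASES.items():
        if phrase in lowered:
            return label
    return None
```

A paraphrase such as "Please stop charging me every month" contains none of the fixed phrases, so `phrase_match` returns `None`. That gap is what an LLM classifier is meant to close.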
Caveats¶
- Single-source canonical on the wiki. Yelp's 2026-05-27 post is the wiki's first explicit canonicalisation; the pattern is widely observed in production chatbots but rarely architecturally disclosed in this clean form.
- Workflow detection latency cost. Every inbound query pays an LLM-classification cost before the handler runs. It is unclear whether Yelp uses a small fast model for routing and a large slow model for QA generation, a common production optimisation that the post does not disclose.
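The small-router, large-generator split mentioned above can be sketched as follows. This is a hypothetical arrangement, since the post does not say which models Yelp uses; `small_classify`, `large_generate`, and the canned replies are placeholders supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CallLog:
    """Counts model invocations so the cost split is visible."""
    small_calls: int = 0
    large_calls: int = 0


def answer(
    query: str,
    small_classify: Callable[[str], str],  # hypothetical cheap routing model
    large_generate: Callable[[str], str],  # hypothetical expensive QA model
    canned: Dict[str, str],                # label -> templated reply
    log: CallLog,
) -> str:
    """Every query pays for routing; only non-templated ones pay for generation."""
    log.small_calls += 1
    label = small_classify(query).strip().lower()
    if label in canned:
        return canned[label]
    log.large_calls += 1
    return large_generate(query)
```

With this shape, the large model's cost scales with the share of traffic that lands in QA and not with total volume, while the router's cost is paid on every query.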
Seen in¶
- sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot — canonical: 5 workflows (QA/Billing/Refund/Cancel/Review), two-axis bucketing (frequency × risk), templated responses for high-risk, RAG for default QA.
Related¶
- concepts/retrieval-augmented-generation — the QA-handler shape.
- concepts/llm-hallucination — the failure mode that motivates removing LLM generation from high-risk workflows.
- patterns/specialized-workflow-router-with-llm-intent-detection — the canonical wiki pattern.
- patterns/intent-domain-decomposer-agentic-router — decomposer-router sibling.
- patterns/multi-agent-supervisor-routing — agent-routing sibling at multi-turn altitude.