
PATTERN Cited by 1 source

Specialized workflow router with LLM intent detection

Pattern shape

An LLM classifies each inbound user request into one of N pre-defined workflows, and each workflow has its own independently designed, specialised handler. The handlers need not all use the LLM: they can be deterministic UIs, form-flow guides, templated canned responses, RAG-driven LLM generation, or any mix. Workflow buckets are chosen along two axes: frequency of inbound requests and risk class (legal, financial, churn, safety). High-risk request shapes get deterministic or templated handlers that bypass LLM generation entirely.
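The classify-then-dispatch shape described above can be sketched in a few lines of Python. This is a minimal illustration, not Yelp's implementation: the handler functions, the workflow labels, and the keyword-based stand-in classifier are all hypothetical, and a real router would call an LLM where `keyword_classifier` is used.

```python
from typing import Callable, Dict

# Hypothetical handlers. Only `answer_with_rag` would call an LLM in a real system.
def answer_with_rag(query: str) -> str:
    return f"[LLM-generated answer for: {query}]"

def show_billing_ui(query: str) -> str:
    return "[deterministic billing view]"

def cancel_template(query: str) -> str:
    return "[fixed cancellation message]"

# Workflow label -> handler. The labels are illustrative assumptions.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "qa": answer_with_rag,
    "billing": show_billing_ui,
    "cancel": cancel_template,
}

def route(query: str, classify: Callable[[str], str], default: str = "qa") -> str:
    """Classify the query, then hand it to that workflow's handler."""
    label = classify(query).strip().lower()
    # Unknown labels fall back to the default workflow.
    handler = HANDLERS.get(label, HANDLERS[default])
    return handler(query)

# Stand-in classifier for illustration only; a real router would call an LLM here.
def keyword_classifier(query: str) -> str:
    q = query.lower()
    if "cancel" in q:
        return "cancel"
    if "bill" in q or "invoice" in q:
        return "billing"
    return "qa"
```

Because `route` takes the classifier as a parameter, the LLM call can be swapped for a stub in tests, and each handler can change without touching the others.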

Structure

┌────────────────────────┐
│   User query           │
└──────────┬─────────────┘
           ▼
┌────────────────────────┐
│   LLM intent router    │ ← classifies into 1 of N workflows
└──────────┬─────────────┘
   ┌───────┴─────┬─────────────┬────────────┬───────────┐
   ▼             ▼             ▼            ▼           ▼
┌──────┐    ┌──────────┐  ┌──────────┐ ┌─────────┐ ┌─────────┐
│ QA   │    │ Billing  │  │ Refund   │ │ Cancel  │ │ Review  │
│ RAG  │    │ Det. UI  │  │ Form     │ │ Template│ │ Template│
│ LLM  │    │ no-LLM   │  │ guide    │ │ no-LLM  │ │ no-LLM  │
└──┬───┘    └────┬─────┘  └────┬─────┘ └────┬────┘ └────┬────┘
   │             │             ▼            ▼           ▼
   │             │       ┌────────────────────────────────┐
   │             └──────▶│  Output to user                │
   │                     │  (Billing / Form / Templated)  │
   │                     └────────────────────────────────┘
   ▼
┌──────────────────────────┐
│  Validation gate         │ ← LLM-generated content only
│  (T&S / URL / char-limit)│
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│  Output to user          │
└──────────────────────────┘

Canonical instance — Yelp CS Chatbot (2026-05-27)

Yelp's LLM-Assisted Customer Success Chatbot routes inbound queries into five named workflows:

Workflow                   Handler shape                                       Risk class
-----------------------    -------------------------------------------------   ----------------------
Question/Answering (QA)    RAG-driven LLM generation                           Standard
Billing                    Deterministic UI (subscribed services + balances)   Low
Refund                     Form-submission guide                               Standard
Cancel                     Templated response (no LLM generation)              High financial / legal
Review                     Templated response (no LLM generation)              High financial / legal

Only ONE of the five workflows (QA) has the LLM actually generate free-form text. Verbatim from the post:

"We bucketed the workflows based on the frequency of inbound requests along with the potential risks of the queries (e.g. churn risk, legal risk, and financial risk). When a user submits a query, the system uses the LLM to intelligently detect which workflow the query should follow." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)

In an A/B test against the legacy menu-tree, fixed-phrase chatbot, the routed chatbot doubled the resolution rate.

Three structural pieces

  1. LLM intent classifier — short prompt asking the LLM to pick one of N workflows for a given user query. Prompt includes workflow names, brief descriptions, and ideally a default fallback.
  2. Workflow-specific handlers — one per workflow. Independently designed; some are deterministic, some template-driven, some RAG-driven. Handlers don't share prompts, error handling, or output format constraints.
  3. Workflow-specific output gate — applies only when the handler itself uses LLM generation. Templated / deterministic handlers skip the gate. Yelp's QA gate is three-axis: trust & safety / valid URL / character limit.
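The classifier prompt and the output gate from the list above might look roughly like the sketch below. Everything here is an assumption made for illustration: the workflow descriptions, the prompt wording, the reading of "valid URL" as a host allow-list, and the `is_safe` callback standing in for a trust-and-safety check. The source does not describe how Yelp implements any of these.

```python
import re
from typing import Callable, Set
from urllib.parse import urlparse

# Hypothetical workflow descriptions; the source names the workflows only.
WORKFLOWS = {
    "qa": "General product questions answered from help content.",
    "billing": "Questions about subscribed services and balances.",
    "refund": "Requests to get money back.",
    "cancel": "Requests to cancel a service.",
    "review": "Questions or complaints about reviews.",
}
DEFAULT_WORKFLOW = "qa"

def build_router_prompt(query: str) -> str:
    """Short prompt: workflow names, brief descriptions, and a default fallback."""
    lines = [f"- {name}: {desc}" for name, desc in WORKFLOWS.items()]
    return (
        "Classify the user query into exactly one workflow.\n"
        + "\n".join(lines)
        + f"\nIf none fits, answer {DEFAULT_WORKFLOW}.\n"
        + "Reply with the workflow name only.\n"
        + f"Query: {query}"
    )

def parse_router_reply(reply: str) -> str:
    """Map the raw model reply to a known workflow, else the default."""
    label = reply.strip().lower()
    return label if label in WORKFLOWS else DEFAULT_WORKFLOW

URL_RE = re.compile(r"https?://[^\s)>\]]+")

def passes_gate(text: str, allowed_hosts: Set[str], max_chars: int,
                is_safe: Callable[[str], bool]) -> bool:
    """Three checks on LLM-generated text: length, URL hosts, safety."""
    if len(text) > max_chars:
        return False
    for url in URL_RE.findall(text):
        # Strip trailing sentence punctuation before parsing the host.
        host = urlparse(url.rstrip(".,;")).hostname
        if host not in allowed_hosts:
            return False
    return is_safe(text)
```

Only the QA path would call `passes_gate`; templated and deterministic handlers return fixed content and have nothing to validate.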

When to apply

Use this pattern when:

  • Inbound queries span distinct request shapes with different cost / risk / SLA properties.
  • Some request shapes have legal / financial / churn risk that makes LLM generation unsafe; bypassing LLM generation for those is desirable.
  • Some request shapes are deterministic (Billing details, account balances) — LLM-generated answers there add risk without value.
  • The LLM's classification capability is more reliable than its generation capability for the domain.

Don't use this pattern when:

  • Queries are uniformly free-form and can't be cleanly bucketed.
  • Handler diversity adds operational overhead that exceeds the per-workflow specialisation benefit.
  • Workflow-mis-routing cost is higher than monolithic-LLM generation cost (rare, but possible in adversarial domains).

Trade-offs

  • Operational complexity ↑ — N handlers to maintain. Mitigated by handler independence (changes to one don't regress others).
  • Latency ↑ — every query pays an LLM-classification cost before the handler runs. Mitigated by using a small fast model (e.g. GPT-4o-mini) for the router and reserving larger models for QA generation.
  • Risk surface area ↓↓ — high-risk workflows entirely bypass LLM generation. The blast radius of an LLM hallucination is bounded to the QA workflow only.
  • Engineering velocity per handler ↑ — handler teams own their slice; can iterate independently.
  • User experience — uniform front (chat interface) over heterogeneous back (handlers). User doesn't know the request was routed.

Risks

  • Mis-routing. Router classifies Cancel intent as QA; QA-RAG generates an LLM response that should have been a templated risk-mitigation message. Mitigation: conservative router prompts; bias toward routing into safer/templated workflows when uncertain.
  • Coverage gaps. N workflows can't cover everything; the default fallback (typically QA) absorbs the residual. Mitigation: monitor router output distribution; add workflows when out-of-distribution queries cluster.
  • Workflow ambiguity. Queries that span workflows ("I want to cancel and get a refund") need an explicit disambiguation policy.
  • Router accuracy regression. Re-prompting the router (or upgrading the underlying model) can shift the workflow distribution. Track per-workflow rate as an SLI.
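Two of the mitigations above lend themselves to a short sketch: tracking the per-workflow rate as an SLI, and biasing toward the safer workflow when the router is uncertain. The code below is one possible approach under stated assumptions. It assumes the router can return a score per workflow (the source does not say it does), and `SAFETY_RANK`, the `margin` default, and the choice of total variation distance as the drift measure are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def workflow_rates(labels: Iterable[str]) -> Dict[str, float]:
    """Share of traffic routed to each workflow."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}

def total_variation(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Distance in [0, 1] between two routing distributions; 0 means identical."""
    names = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(n, 0.0) - current.get(n, 0.0)) for n in names)

# Hypothetical ordering: a lower rank means a safer (templated/deterministic) handler.
SAFETY_RANK = {"cancel": 0, "review": 0, "billing": 1, "refund": 2, "qa": 3}

def pick_conservative(scores: Dict[str, float], margin: float = 0.2) -> str:
    """Among workflows scoring within `margin` of the best, prefer the safest."""
    best = max(scores.values())
    close = [n for n, s in scores.items() if best - s <= margin]
    return min(close, key=lambda n: (SAFETY_RANK.get(n, 99), -scores[n]))
```

A drift check could compare `workflow_rates` over a recent window against a baseline taken before a router prompt or model change, and alert when `total_variation` exceeds a threshold chosen for the deployment.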

