Dynamic routing — use LLMs selectively¶
Pattern¶
Not every input needs an LLM. Dynamic routing is the architectural pattern of using a streaming filter / routing layer to:
- Scan the full data stream with cheaper mechanisms — heuristics, rules, or traditional ML models.
- Route only anomalies / interesting cases to an LLM-backed agent for deeper investigation.
- Route normal traffic to the existing cheap path without touching an LLM.
The pattern acknowledges that LLMs are expensive, compute-heavy, and slow relative to hand-tuned ML and heuristics; reserving them for the small fraction of inputs where they actually add value keeps the system viable at scale, both economically and within latency budgets.
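The routing decision itself is small: a cheap first-pass scorer, a threshold, and two paths. A minimal Python sketch (the scorer, threshold value, and path names here are hypothetical stand-ins, not from the source):

```python
# Minimal dynamic-routing sketch: a cheap first pass decides which
# events ever reach the LLM path. All names below are illustrative.

ANOMALY_THRESHOLD = 0.9  # assumed tuning knob


def cheap_score(txn: dict) -> float:
    """Stand-in for an ML model / heuristic: flag unusually large amounts."""
    return 1.0 if txn["amount"] > 10_000 else 0.0


def route(txn: dict) -> str:
    """Return which path handles this transaction."""
    if cheap_score(txn) >= ANOMALY_THRESHOLD:
        return "llm_agent"        # minority path: deep investigation
    return "existing_pipeline"    # majority path: no LLM involved


txns = [{"amount": 42}, {"amount": 50_000}, {"amount": 7}]
print([route(t) for t in txns])
# → ['existing_pipeline', 'llm_agent', 'existing_pipeline']
```

In a real deployment the scorer would be a trained fraud model and the threshold a jointly tuned precision/recall knob, but the shape is the same: the expensive path is behind a predicate.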
Canonical statement¶
From Tyler Akidau's 2026-02-10 Redpanda post (Source: sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise):
"LLMs have their uses, but they're not a fit for every problem — because they're also expensive, require a lot of compute, and aren't very fast. This is where dynamic routing becomes beneficial. You can use AI when it makes sense and continue to rely on your other systems when it doesn't."
"Take fraud detection, for example. Machine learning (ML) models and heuristics are cheaper and make more sense to scan most of your data (since fraud will likely only make up a small percentage). Once those systems identify an anomaly, a trained fraud detection agent can help you investigate further."
"If you're building on a streaming architecture, you get this ability to filter or route subsets of data to an agent."
Worked example: fraud detection¶
Transactions ──► [ML model + heuristics] ──► majority (normal, ~99%)   ──► existing fraud pipeline
                                         ──► minority (anomalous, ~1%) ──► [fraud agent]
                                                                           (LLM + tools for investigation)
- Majority path: ML fraud model scans all transactions. The ~99% of transactions that don't trigger anomalies flow through the normal processing path. No LLM involved.
- Minority path: Transactions flagged as anomalous are routed to an LLM-backed fraud-detection agent that can pull related customer data, merchant history, transaction graphs, and generate a case-level investigation.
The result: LLM cost + latency is bounded by the anomaly rate, not the transaction rate. A 1% anomaly rate means 100× fewer LLM invocations than the naïve "every transaction through an agent" shape.
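The cost bound can be made concrete with back-of-envelope arithmetic (volumes and per-call price below are hypothetical, chosen only to show the scaling):

```python
# With dynamic routing, LLM spend scales with the anomaly rate,
# not the transaction rate. Numbers here are illustrative.

tx_per_day = 10_000_000
anomaly_rate = 0.01          # assumed 1% of transactions flagged
llm_cost_per_call = 0.02     # hypothetical dollars per agent invocation

naive_cost = tx_per_day * llm_cost_per_call                  # every txn → agent
routed_cost = tx_per_day * anomaly_rate * llm_cost_per_call  # only anomalies

print(f"naive:  ${naive_cost:,.0f}/day")
print(f"routed: ${routed_cost:,.0f}/day")
print(f"ratio:  {naive_cost / routed_cost:.0f}x")
# → naive $200,000/day, routed $2,000/day, ratio 100x
```

Note the same arithmetic runs in reverse during a rate excursion: if the anomaly rate triples, so does the routed cost (see the cost-accounting caveat below).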
Why streaming is the natural substrate¶
Three properties of a streaming broker make dynamic routing natural:
- Filter + route primitives. Kafka / Redpanda topics + consumer-groups + stream-processing DSLs (ksqlDB, Flink, Redpanda Connect, etc.) provide filter / route / fork operations as first-class primitives. A routing predicate becomes a single consumer with a filter.
- Producer/consumer decoupling. The ML classifier and the agent-investigation consumer don't have to know about each other; both subscribe to topics and emit events. Adding a new agent path is a new subscriber, not a write-path rewrite.
- Back-pressure and elasticity. LLM calls are slow and expensive; routing through a topic means the LLM consumer can scale independently, back-pressure via consumer lag, and be temporarily suspended without losing inputs.
Without streaming, the naïve pattern is synchronous: every event goes through the router, which calls the classifier, which decides whether to call the LLM inline — coupling latency + cost + availability of all three together.
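The decoupled shape can be sketched with stdlib queues standing in for broker topics (a real system would use Kafka/Redpanda topics; the predicate and field names are hypothetical):

```python
# Sketch of topic-based routing: the classifier consumer forwards flagged
# events to a second "topic" that the LLM agent drains at its own pace.
import queue

transactions = queue.Queue()   # stand-in for the input topic
anomalies = queue.Queue()      # stand-in for the anomalies topic


def classifier_consumer() -> int:
    """Cheap first pass: fork anomalies to the agent topic, count normals."""
    normal = 0
    while not transactions.empty():
        txn = transactions.get()
        if txn["amount"] > 10_000:   # heuristic routing predicate
            anomalies.put(txn)       # route to the agent's topic
        else:
            normal += 1              # normal path, no LLM involved
    return normal


for amount in [5, 25_000, 12, 99_000]:
    transactions.put({"amount": amount})

normal_count = classifier_consumer()
print(normal_count, anomalies.qsize())  # → 2 2
```

The point of the intermediate topic is that the LLM consumer subscribes to `anomalies` independently: it can scale, lag, or pause without touching the classifier or losing inputs, which is exactly the back-pressure property the bullet above describes.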
Trade-offs¶
Wins
- Bounded LLM cost — LLM invocations scale with anomaly rate, not request rate.
- Latency budget preserved for majority path — normal transactions don't pay the LLM latency tax.
- Independent scaling — classifier fleet and LLM-agent fleet scale on different signals.
- Model-swap decoupling — upgrading from GPT-4 to GPT-5 (or switching LLM providers) affects only the minority-path consumer, not the hot path.
Costs
- Two-tier model maintenance. You now run and maintain both an ML classifier and an LLM-agent pipeline. The ML model's accuracy sets the LLM's input distribution; drift in the classifier silently changes agent workload.
- False-negative risk. If the ML classifier misses an anomaly, the LLM never sees it. The agent can't investigate what the classifier didn't flag. The LLM's catch-rate is conditional on classifier recall.
- Decision-boundary coupling. Tuning the classifier's threshold (precision vs recall) becomes a joint decision with agent cost — tighter threshold = fewer agent calls but more missed cases; looser threshold = higher agent cost.
- No whole-population learning. The agent only sees flagged cases, so it can't learn what "normal" looks like. Ground truth for re-training must come from the ML pipeline, not the agent.
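The decision-boundary coupling is easy to demonstrate with synthetic numbers (scores and labels below are made up for illustration):

```python
# Decision-boundary coupling: the flagging threshold jointly sets
# agent cost (calls) and missed-fraud count. Synthetic data only.

def outcomes(threshold: float, scores: list[float], labels: list[bool]):
    """Return (agent_calls, missed_fraud) at a given flagging threshold."""
    flagged = [s >= threshold for s in scores]
    agent_calls = sum(flagged)
    missed = sum(1 for f, fraud in zip(flagged, labels) if fraud and not f)
    return agent_calls, missed


scores = [0.2, 0.95, 0.6, 0.85, 0.1, 0.7]        # classifier anomaly scores
labels = [False, True, True, True, False, False]  # synthetic ground truth

for t in (0.5, 0.8):
    calls, missed = outcomes(t, scores, labels)
    print(f"threshold={t}: agent_calls={calls}, missed_fraud={missed}")
# → threshold=0.5: agent_calls=4, missed_fraud=0
# → threshold=0.8: agent_calls=2, missed_fraud=1
```

Raising the threshold halves agent cost but lets a fraud case through untouched: the trade cannot be tuned for the classifier in isolation.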
Cross-domain applicability¶
Fraud detection is the worked example, but the pattern generalises:
- Customer support — bots + FAQ + search for common cases; LLM-agent for complex / novel cases.
- Content moderation — classifier for clear-cut content; LLM for ambiguous / context-dependent judgement calls.
- Anomaly investigation (logs, metrics, traces) — rule-based alerting for known patterns; LLM agent for root-cause analysis on novel patterns.
- Document triage — keyword / classifier routing for common shapes; LLM for ambiguous / out-of-distribution documents.
- Code review automation — linters + static analysis for deterministic checks; LLM-reviewer for architectural judgement.
In every case the structure is the same: cheap first-pass, LLM for the long tail of hard cases.
Caveats¶
- Classifier as gatekeeper. The classifier becomes a single-point decision about what the LLM sees. Adversarial inputs that evade the classifier never get LLM scrutiny.
- Latency asymmetry. Minority-path latency (seconds to minutes) is much higher than majority-path latency (milliseconds). Use cases that require uniform latency across all inputs aren't a fit.
- Cost accounting is probabilistic. LLM cost scales with anomaly rate; if the business workload's anomaly rate shifts (e.g. a fraud surge), LLM cost can blow up unexpectedly. Cost envelopes should plan for rate-excursion scenarios.
- Determinism gap. The ML path is deterministic; the LLM path isn't. Whether this matters depends on use case (regulated decisions may require deterministic audit trails on all paths, including the LLM route — see patterns/durable-event-log-as-agent-audit-envelope).
- Classifier training data freshness. Dynamic routing depends on the classifier staying current; the same CDC / fan-out that feeds the agent also has to feed the classifier's training loop.
Related shapes on the wiki¶
- concepts/agentic-ai-infrastructure-challenges — axis 7 (routing) of Akidau's eight-axis checklist.
- concepts/streaming-as-agile-data-platform-backbone — the structural framing that makes dynamic routing cheap to implement.
- patterns/cdc-fanout-single-stream-to-many-consumers — peer pattern at fan-out altitude; dynamic routing is fan-out plus a condition-gated subscriber.
Seen in¶
- sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise — canonical wiki introduction. Akidau names dynamic routing as axis 7 of his eight-axis enterprise-agent-infrastructure checklist, with fraud detection as the worked example.