CONCEPT Cited by 2 sources

Workload-aware routing¶

Definition¶

Workload-aware routing is the architectural pattern of making load-balancer / gateway routing decisions based on the shape of the incoming request (query content, target tables, source application, payload characteristics), rather than treating all backends as interchangeable and round-robining among them.

It assumes the backend fleet is deliberately heterogeneous — different clusters tuned for different workload shapes — and the router's job is to match request shape to cluster shape.

Why it matters¶

A shape-agnostic LB (round-robin, least-connections, random) treats every backend as interchangeable. This breaks when the backend fleet is intentionally not uniform:

A cluster tuned for few, large, long-running queries has high per-query memory, a low concurrency ceiling, and tuned GC settings.
A cluster tuned for many, small, fast queries has low per-query memory, a high concurrency ceiling, and aggressive result-caching.
A cluster tuned for metadata-only queries runs on a single node with little memory and responds extremely quickly to select version() / show catalogs.

If BI dashboards (many small queries) and a nightly ETL job (few huge queries) land on the same cluster, both lose: BI tail latency blows up whenever ETL runs, and ETL is underprovisioned for its true memory needs. A shape-agnostic LB cannot fix this, because the fix is to send each query to the right cluster, not to spread queries evenly among clusters.
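
The difference can be seen in a few lines of Python. This is a minimal sketch, not taken from either source: the cluster names, the `est_memory_gb` field, and the 50 GB threshold are all hypothetical, chosen only to show how round-robin mixes shapes while a shape-matched router keeps them apart.

```python
from itertools import cycle

# Hypothetical cluster names and memory threshold, for illustration only.
CLUSTERS = ["etl-heavy", "bi-fast"]
HEAVY_MEMORY_GB = 50

def round_robin(queries):
    """Shape-agnostic: alternate between clusters regardless of query shape."""
    targets = cycle(CLUSTERS)
    return {q["name"]: next(targets) for q in queries}

def shape_matched(queries):
    """Workload-aware: send each query to the cluster tuned for its shape."""
    return {
        q["name"]: "etl-heavy" if q["est_memory_gb"] >= HEAVY_MEMORY_GB else "bi-fast"
        for q in queries
    }

queries = [
    {"name": "dashboard-1", "est_memory_gb": 1},
    {"name": "nightly-etl", "est_memory_gb": 200},
    {"name": "dashboard-2", "est_memory_gb": 2},
]
```

With this input, `round_robin` places the nightly ETL job on the BI cluster and both dashboards on the ETL cluster, while `shape_matched` sends each query to the cluster built for it.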

Required inputs¶

A workload-aware router inspects the request payload at the application-protocol level (L7 load balancing) and extracts the features that routing rules can match on. Typical features in a SQL-query gateway context (from sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway):

Tables referenced. trinoQueryProperties.getTables() — route queries against specific large tables to heavy-workload clusters.
Query body text. trinoQueryProperties.getBody() — detect metadata queries such as select version() or show catalogs and route them to the metadata cluster.
Source application (header-based). request.getHeader("X-Trino-Source") — route Tableau / Looker / Mode queries to BI clusters.
User / team identity (implicit, via auth).
Approximate query complexity / estimated cost.
Time of day, cluster utilization, SLO budget — secondary inputs sometimes used to bias among eligible clusters.
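
The first three features can be sketched as a small extraction step. This is a Python illustration under stated assumptions, not the gateway's implementation: `RequestFeatures` and `extract_features` are hypothetical names, and the regular expression is a deliberately naive stand-in for real SQL parsing (it only sees identifiers that follow FROM or JOIN). A plain dict lookup is also case-sensitive, unlike real HTTP header handling.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestFeatures:
    tables: frozenset  # lower-cased table names found in the query text
    body: str          # query text with surrounding whitespace removed
    source: str        # value of the X-Trino-Source header, or ""

# Naive stand-in for a SQL parser: capture the identifier after FROM or JOIN.
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def extract_features(body: str, headers: dict) -> RequestFeatures:
    """Pull routing features out of the query text and request headers."""
    tables = frozenset(name.lower() for name in _TABLE_RE.findall(body))
    source = headers.get("X-Trino-Source", "")
    return RequestFeatures(tables=tables, body=body.strip(), source=source)
```

For example, `extract_features("SELECT * FROM sales.orders", {"X-Trino-Source": "Tableau"})` yields a feature record with one table and a source of `Tableau`, which routing rules can then match on.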

Production examples¶

Trino Gateway (Expedia)¶

The canonical workload-aware router for a SQL-engine fleet. It uses three named routing-rule shapes:

Large-table isolation — queries touching named large tables → heavy-workload cluster.
Metadata offload — select version() / show catalogs → lightweight metadata cluster (single-node) so dashboard extract-failure rates drop.
BI-source routing — X-Trino-Source contains "Tableau" / "Looker" → BI-optimised cluster.

Rules are hot-editable, UI-managed, and evaluated per query.
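
The three rule shapes and their per-query evaluation can be sketched as follows. This is a Python stand-in, not the gateway's own rule syntax: the `Query` and `Rule` classes, the priority values, the table name in `LARGE_TABLES`, and the `adhoc` default group are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    body: str
    tables: set
    source: str

@dataclass
class Rule:
    name: str
    priority: int                      # higher wins (assumed ordering)
    condition: Callable[[Query], bool]
    routing_group: str

LARGE_TABLES = {"warehouse.bookings_fact"}            # hypothetical table name
METADATA_QUERIES = {"select version()", "show catalogs"}

def _is_metadata(q: Query) -> bool:
    return q.body.strip().rstrip(";").lower() in METADATA_QUERIES

RULES = [
    Rule("metadata-offload", 30, _is_metadata, "metadata"),
    Rule("large-table-isolation", 20, lambda q: bool(q.tables & LARGE_TABLES), "heavy"),
    Rule("bi-source", 10,
         lambda q: any(tool in q.source for tool in ("Tableau", "Looker")), "bi"),
]

def route(query: Query, rules=RULES, default: str = "adhoc") -> str:
    """Evaluate rules per query, highest priority first; fall back to default."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.condition(query):
            return rule.routing_group
    return default
```

Because `route` takes the rule list as an argument, swapping in an edited list changes routing for the next query without restarting anything, which is the property that hot-editable rules rely on.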

Contrast with other LB strategies¶

Round-robin / least-connections / random. Shape-agnostic. Correct when backends are identical and interchangeable.
Consistent hashing / affinity-based routing. Takes a shape input (the key) but aims for stickiness (same key → same backend), not match-quality. Useful for cache locality, not workload-fit.
PID-feedback LB (Dropbox Robinhood). Adjusts per-endpoint weights based on observed utilization; still agnostic about the shape of each request. Orthogonal to workload-aware routing: a system can do both.
Envoy / kube-proxy. Envoy's L7 route matching can express workload-aware policies, but the operator has to write the policy. kube-proxy balances at L4 and never sees the request payload, so it cannot match on request shape.
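
Because the strategies are orthogonal, they compose naturally in two stages: a workload-aware step picks the eligible group, then a shape-agnostic step picks a backend inside it. The sketch below assumes that layering; the function names, group names, and connection counts are hypothetical and not drawn from any of the systems above.

```python
def classify(source: str) -> str:
    """Workload-aware step: map the request's source to a routing group."""
    bi_tools = ("Tableau", "Looker")
    return "bi" if any(tool in source for tool in bi_tools) else "adhoc"

def least_connections(candidates, active):
    """Shape-agnostic step: pick the least-loaded backend; ties break by name."""
    return min(candidates, key=lambda b: (active.get(b, 0), b))

def pick_backend(source, backends_by_group, active):
    """Match the request to a group first, then balance within that group."""
    group = classify(source)
    return least_connections(backends_by_group[group], active)

backends = {"bi": ["bi-1", "bi-2"], "adhoc": ["adhoc-1"]}
active = {"bi-1": 12, "bi-2": 3, "adhoc-1": 40}
```

Here a Tableau request is confined to the `bi` group and then lands on `bi-2`, the less loaded of the two BI backends. Load never pushes it onto a cluster tuned for a different shape.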

Seen in¶

sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — canonical SQL-query-fleet instance via Trino Gateway; Adhoc / ETL / BI cluster segregation; routing by tables, body text, and source header.
sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — LLM-inference specialisation: the AWS SageMaker HyperPod Inference Operator ships three configurable routing strategies: prefix-aware (route requests sharing a common prompt prefix to the same replica, so that subsequent requests hit the prefix's KV cache), KV-aware (use live per-replica cache-occupancy telemetry to pick the replica with the hottest matching cache), and round-robin (the cache-agnostic baseline). The workload shape being routed on is the prompt-prefix cache-occupancy signal on the backends, which is specific to transformer-decoder inference. See concepts/prefix-aware-routing for the LLM-serving specialisation.
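
The two cache-oriented strategies can be sketched in Python to show what "routing on the prompt prefix" means mechanically. This is an illustration under assumptions, not the HyperPod operator's implementation: hashing a fixed-length character prefix and representing cache telemetry as a dict of cached prefix strings are both simplifications, and every name here is hypothetical.

```python
import hashlib

def prefix_aware_replica(prompt: str, replicas: list, prefix_chars: int = 64) -> str:
    """Send prompts that share their first prefix_chars characters to one replica."""
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

def kv_aware_replica(prompt: str, cached_prefixes: dict) -> str:
    """Pick the replica whose cache holds the longest prefix of this prompt.

    cached_prefixes maps replica name -> list of prefix strings currently
    cached there (a stand-in for live cache-occupancy telemetry).
    """
    def best_match(replica: str) -> int:
        return max(
            (len(p) for p in cached_prefixes[replica] if prompt.startswith(p)),
            default=0,
        )
    # Iterating in sorted order makes ties resolve to the first name alphabetically.
    return max(sorted(cached_prefixes), key=best_match)
```

The prefix-aware function needs no feedback from the backends, only the request. The KV-aware function depends on the replicas reporting what they hold, which is why the source describes it as using live telemetry.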

Related¶

concepts/layer-7-load-balancing — the mechanical prerequisite (payload-level inspection).
concepts/single-endpoint-abstraction — the companion client-facing property (one URL for the fleet).
patterns/query-gateway — the general SQL-engine-fleet realisation.
patterns/workload-segregated-clusters — the backend-shape presupposition workload-aware routing requires.
patterns/routing-rules-as-config — how workload-aware routing is usually expressed.
systems/trino-gateway — canonical instance.