CONCEPT Cited by 2 sources
Workload-aware routing¶
Definition¶
Workload-aware routing is the architectural pattern of making load-balancer / gateway routing decisions based on the shape of the incoming request (query content, target tables, source application, payload characteristics), rather than treating all backends as interchangeable and round-robining among them.
It assumes the backend fleet is deliberately heterogeneous — different clusters tuned for different workload shapes — and the router's job is to match request shape to cluster shape.
Why it matters¶
A shape-agnostic LB (round-robin, least-connections, random) treats every backend as interchangeable. This breaks when the backend fleet is intentionally not uniform:
- A cluster tuned for few, large, long-running queries has high per-query memory, low concurrency ceiling, tuned GC settings.
- A cluster tuned for many, small, fast queries has low per-query memory, high concurrency ceiling, aggressive result-caching.
- A cluster tuned for metadata-only queries runs a single node, small memory, extremely fast response to `select version()` / `show catalogs`.
If BI dashboards (many small queries) and a nightly ETL job (few huge queries) land on the same cluster, both suffer: BI tail latency blows up whenever ETL runs, and ETL runs on a cluster underprovisioned for its true memory needs. A shape-agnostic LB cannot fix this, because the fix is to send each query to the right cluster, not to spread queries evenly among clusters.
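The failure mode can be shown with a toy placement comparison. This is a minimal sketch that is not taken from either source: the cluster names, the query mix, and the `shape` labels are all hypothetical, and a real router would infer the shape from the request rather than receive it as a label.

```python
from itertools import cycle

# Hypothetical fleet: one cluster tuned per workload shape.
CLUSTERS = ["etl-heavy", "bi-fast"]

# Which cluster each (hypothetical) workload shape is tuned for.
SHAPE_TO_CLUSTER = {"etl": "etl-heavy", "bi": "bi-fast"}

# Hypothetical query mix: (query id, workload shape).
QUERIES = [
    ("q1", "bi"), ("q2", "bi"), ("q3", "etl"),
    ("q4", "bi"), ("q5", "bi"), ("q6", "bi"),
]


def round_robin(queries):
    """Shape-agnostic: alternate clusters regardless of query shape."""
    ring = cycle(CLUSTERS)
    return {qid: next(ring) for qid, _shape in queries}


def workload_aware(queries):
    """Shape-aware: send each query to the cluster tuned for its shape."""
    return {qid: SHAPE_TO_CLUSTER[shape] for qid, shape in queries}


def misplaced(assignment, queries):
    """Count queries that landed on a cluster not tuned for their shape."""
    return sum(
        1 for qid, shape in queries
        if assignment[qid] != SHAPE_TO_CLUSTER[shape]
    )
```

With this mix, round-robin places two BI queries on the ETL-tuned cluster, while the shape-aware assignment misplaces none. Round-robin balances request counts evenly, but that is not the property the fleet needs.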
Required inputs¶
A workload-aware router inspects the request payload at the application-protocol level (L7 load balancing) and extracts features that routing rules can match on. Typical features in a SQL-query gateway context (from sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway):
- Tables referenced. `trinoQueryProperties.getTables()`: route queries against specific large tables to heavy-workload clusters.
- Query body text. `trinoQueryProperties.getBody()`: detect metadata queries like `select version()` and `show catalogs`, and route them to the metadata cluster.
- Source application (header-based). `request.getHeader("X-Trino-Source")`: route Tableau / Looker / Mode queries to BI clusters.
- User / team identity (implicit, via auth).
- Approximate query complexity / estimated cost.
- Time of day, cluster utilization, SLO budget — secondary inputs sometimes used to bias among eligible clusters.
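The first three features in the list can be pulled out of a request roughly as follows. This is a minimal sketch under stated assumptions: `RequestFeatures` and `extract_features` are hypothetical names, the regex is a deliberately naive stand-in for a real SQL parser (it only sees identifiers that follow `FROM` or `JOIN`), and the headers are assumed to arrive as a plain dict keyed with the exact header casing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestFeatures:
    """Hypothetical container for the features routing rules match on."""
    tables: frozenset
    body: str
    source: str


# Naive table extraction: identifiers that directly follow FROM or JOIN.
# A production gateway would use a real SQL parser instead.
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_features(body: str, headers: dict) -> RequestFeatures:
    """Build routing features from the query text and request headers."""
    tables = frozenset(name.lower() for name in _TABLE_RE.findall(body))
    return RequestFeatures(
        tables=tables,
        # Collapse whitespace and lowercase so text rules match reliably.
        body=" ".join(body.split()).lower(),
        # Missing header becomes an empty string rather than an error.
        source=headers.get("X-Trino-Source", ""),
    )
```

For example, `extract_features("SELECT * FROM sales.orders", {"X-Trino-Source": "Tableau"})` yields a table set containing `sales.orders` and a source of `Tableau`. Identity, cost estimates, and utilization would come from other subsystems and are left out here.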
Production examples¶
Trino Gateway (Expedia)¶
The canonical SQL-engine-fleet workload-aware router. Three named routing-rule shapes:
- Large-table isolation: queries touching named large tables → heavy-workload cluster.
- Metadata offload: `select version()` / `show catalogs` → lightweight metadata cluster (single-node) so dashboard extract-failure rates drop.
- BI-source routing: `X-Trino-Source` contains "Tableau" / "Looker" → BI-optimised cluster.
Rules are hot-editable, UI-managed, and evaluated per query.
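The three rule shapes can be modelled as an ordered list evaluated per query. The sketch below is illustrative only and is not Trino Gateway's rule syntax or engine: the `Query` and `Rule` types, the routing-group names, the `bookings.events` table, the first-match-wins ordering, and the `adhoc` default are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Query:
    """Hypothetical pre-extracted features of one incoming query."""
    tables: frozenset
    body: str
    source: str


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Query], bool]
    routing_group: str


LARGE_TABLES = frozenset({"bookings.events"})  # hypothetical table name
METADATA_QUERIES = ("select version()", "show catalogs")
BI_SOURCES = ("tableau", "looker")

# Rules live in a plain list, so swapping the list changes routing without
# touching the evaluation code. Order here is an assumption: first match wins.
RULES = [
    Rule(
        "metadata-offload",
        lambda q: q.body.strip().lower().rstrip(";") in METADATA_QUERIES,
        "metadata",
    ),
    Rule(
        "large-table-isolation",
        lambda q: bool(q.tables & LARGE_TABLES),
        "heavy",
    ),
    Rule(
        "bi-source-routing",
        lambda q: any(s in q.source.lower() for s in BI_SOURCES),
        "bi",
    ),
]


def route(query: Query, rules=RULES, default: str = "adhoc") -> str:
    """Return the routing group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(query):
            return rule.routing_group
    return default
```

Because `route` takes the rule list as an argument, a management UI could replace the list between requests, which is one simple way to get per-query evaluation of hot-edited rules.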
Contrast with other LB strategies¶
- Round-robin / least-connections / random. Shape-agnostic. Correct when backends are identical and interchangeable.
- Consistent hashing / affinity-based routing. Takes a shape input (the key) but aims for stickiness (same key → same backend), not match-quality. Useful for cache locality, not workload-fit.
- PID-feedback LB (Dropbox Robinhood). Adjusts per-endpoint weights based on observed utilization; still shape-agnostic about the request. Orthogonal to workload-aware routing: a system can do both.
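- Envoy / kube-proxy. General-purpose proxies rather than routing policies. Envoy can express workload-aware policies at L7 (header and path matching, filters), but the policy itself has to be written by the operator; kube-proxy works at L3/L4 and cannot inspect request payloads.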
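The claim that workload-aware routing composes with utilization-based balancing can be sketched as a two-stage choice: the request's shape selects a pool of eligible backends, and a load signal then picks within that pool. This is an assumption-laden illustration, not Robinhood's PID controller: plain least-utilization stands in for the feedback stage, and `pick_backend`, the pool names, and the utilization figures are hypothetical.

```python
def pick_backend(routing_group: str, pools: dict, utilization: dict) -> str:
    """Stage 1: the routing group restricts choice to matching backends.
    Stage 2: choose the least-utilized backend inside that pool.

    pools maps routing group -> list of backend names.
    utilization maps backend name -> load in the range 0.0 to 1.0.
    """
    eligible = pools[routing_group]
    return min(eligible, key=lambda backend: utilization[backend])


# Hypothetical fleet state.
POOLS = {"bi": ["bi-1", "bi-2"], "heavy": ["heavy-1"]}
UTILIZATION = {"bi-1": 0.8, "bi-2": 0.3, "heavy-1": 0.1}
```

Here `pick_backend("bi", POOLS, UTILIZATION)` returns `bi-2`. The idle `heavy-1` backend is never considered for a BI query, because the load signal only breaks ties among backends that already fit the workload.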
Seen in¶
- sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — canonical SQL-query-fleet instance via Trino Gateway; Adhoc / ETL / BI cluster segregation; routing by tables, body text, and source header.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — LLM-inference specialisation: AWS SageMaker HyperPod Inference Operator ships three configurable routing strategies — prefix-aware (route requests sharing a common prompt prefix to the same replica so the prefix's KV cache hits on subsequent requests), KV-aware (use live cache-occupancy telemetry per replica to pick the replica with the hottest matching cache), round-robin (cache-agnostic baseline). Workload shape being routed on: the prompt-prefix cache-occupancy signal on the backends, specific to transformer-decoder inference. See concepts/prefix-aware-routing for the LLM-serving specialisation.
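The prefix-aware strategy mentioned in the second entry can be approximated by hashing a fixed-length prompt prefix to a replica, so requests that share the prefix land on the same replica. This is a generic sketch, not the HyperPod Inference Operator's algorithm: `prefix_aware_replica` is a hypothetical name, characters stand in for tokens, and the 64-character prefix length is an arbitrary assumption.

```python
import hashlib


def prefix_aware_replica(prompt: str, replicas: list, prefix_chars: int = 64) -> str:
    """Map prompts that share their first prefix_chars characters to one replica."""
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer key.
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

Two prompts that begin with the same long system prompt map to the same replica, which gives that replica's prefix cache a chance to hit. Plain modulo hashing remaps most prefixes when the replica count changes; consistent hashing would reduce that churn. A KV-aware strategy would replace the hash with live cache-occupancy telemetry.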
Related¶
- concepts/layer-7-load-balancing — the mechanical prerequisite (payload-level inspection).
- concepts/single-endpoint-abstraction — the companion client-facing property (one URL for the fleet).
- patterns/query-gateway — the general SQL-engine-fleet realisation.
- patterns/workload-segregated-clusters — the backend-shape presupposition workload-aware routing requires.
- patterns/routing-rules-as-config — how workload-aware routing is usually expressed.
- systems/trino-gateway — canonical instance.