CONCEPT Cited by 2 sources

Workload-aware routing

Definition

Workload-aware routing is the architectural pattern of making load-balancer / gateway routing decisions based on the shape of the incoming request (query content, target tables, source application, payload characteristics), rather than treating all backends as interchangeable and round-robining among them.

It assumes the backend fleet is deliberately heterogeneous — different clusters tuned for different workload shapes — and the router's job is to match request shape to cluster shape.

Why it matters

A shape-agnostic LB (round-robin, least-connections, random) treats every backend as interchangeable. This breaks when the backend fleet is intentionally not uniform:

  • A cluster tuned for few, large, long-running queries has high per-query memory, low concurrency ceiling, tuned GC settings.
  • A cluster tuned for many, small, fast queries has low per-query memory, high concurrency ceiling, aggressive result-caching.
  • A cluster tuned for metadata-only queries runs on a single node with little memory and responds extremely quickly to select version() / show catalogs.

If BI dashboards (many small queries) and a nightly ETL job (few huge queries) land on the same cluster, BI suffers (tail latency blows up when ETL runs) and ETL is underprovisioned for its true memory needs. Shape-agnostic LB cannot fix this because the fix is to send queries to the right cluster, not to pick evenly among clusters.
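The difference can be made concrete with a toy placement exercise. The following is a minimal sketch, not a model of any real load balancer: the cluster names, the "small" / "huge" labels, and the FIT mapping are all hypothetical and chosen only to illustrate the point.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical two-cluster fleet and a hand-written shape -> cluster mapping.
CLUSTERS = ["bi-cluster", "etl-cluster"]
FIT = {"small": "bi-cluster", "huge": "etl-cluster"}


def round_robin(queries):
    """Shape-agnostic: hand out clusters in rotation, ignoring the query."""
    placement = defaultdict(list)
    for query, cluster in zip(queries, cycle(CLUSTERS)):
        placement[cluster].append(query)
    return dict(placement)


def workload_aware(queries):
    """Shape-aware: send each query to the cluster tuned for its shape."""
    placement = defaultdict(list)
    for query in queries:
        placement[FIT[query]].append(query)
    return dict(placement)


# Mostly small dashboard queries, with one huge ETL query in the middle.
QUERIES = ["small", "small", "huge", "small", "small", "small"]
```

With this input, round_robin places the huge query on bi-cluster alongside dashboard traffic, and etl-cluster receives only small queries. workload_aware keeps the two shapes apart regardless of arrival order.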

Required inputs

A workload-aware router inspects the request payload at the application-protocol level (L7 load balancing) and extracts the features its routing rules match on. Typical features in a SQL-query gateway context (from sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway):

  • Tables referenced. trinoQueryProperties.getTables() — route queries against specific large tables to heavy-workload clusters.
  • Query body text. trinoQueryProperties.getBody() — detect metadata queries like select version(), show catalogs, route to metadata cluster.
  • Source application (header-based). request.getHeader("X-Trino-Source") — route Tableau / Looker / Mode queries to BI clusters.
  • User / team identity (implicit, via auth).
  • Approximate query complexity / estimated cost.
  • Time of day, cluster utilization, SLO budget — secondary inputs sometimes used to bias among eligible clusters.
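As an illustration of the first three features, here is a minimal sketch of feature extraction in Python. It is not how Trino Gateway implements trinoQueryProperties: the RequestFeatures type and extract_features function are hypothetical, and the table detection is a deliberately naive regex where a real gateway would use a proper SQL parser.

```python
import re
from dataclasses import dataclass

# Naive assumption: a table is the identifier right after FROM or JOIN.
# This misses subqueries, CTEs, quoted names, and more.
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


@dataclass(frozen=True)
class RequestFeatures:
    body: str          # normalized query text
    tables: frozenset  # table names referenced by the query
    source: str        # value of the X-Trino-Source header, or ""


def extract_features(sql: str, headers: dict) -> RequestFeatures:
    """Turn a raw request into the features routing rules can match on."""
    body = " ".join(sql.split()).lower()  # collapse whitespace, lowercase
    tables = frozenset(_TABLE_RE.findall(body))
    # Simplification: exact-case dict lookup; HTTP header names are
    # case-insensitive in practice.
    source = headers.get("X-Trino-Source", "")
    return RequestFeatures(body=body, tables=tables, source=source)
```

Identity, estimated cost, and utilization signals are left out here because they depend on systems outside the request itself (auth, a planner, cluster metrics).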

Production examples

Trino Gateway (Expedia)

The canonical SQL-engine-fleet workload-aware router. Three named routing-rule shapes:

  1. Large-table isolation — queries touching named large tables → heavy-workload cluster.
  2. Metadata offload — select version() / show catalogs → lightweight metadata cluster (single-node) so dashboard extract-failure rates drop.
  3. BI-source routing — X-Trino-Source contains "Tableau" / "Looker" → BI-optimised cluster.

Rules are hot-editable, UI-managed, and evaluated per query.
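The three rule shapes can be sketched as an ordered list of (name, condition, target) entries evaluated per query. This is an illustrative Python sketch, not Trino Gateway's actual rule syntax or engine. The Query type, the table name in LARGE_TABLES, the routing-group names, the "adhoc" default, and the first-match-wins ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    body: str          # raw SQL text
    tables: frozenset  # tables the query references
    source: str        # X-Trino-Source header value


# Hypothetical configuration values.
LARGE_TABLES = frozenset({"warehouse.events"})
METADATA_QUERIES = {"select version()", "show catalogs"}
BI_SOURCES = ("tableau", "looker")
DEFAULT_GROUP = "adhoc"


def _normalize(body: str) -> str:
    """Collapse whitespace, lowercase, and drop a trailing semicolon."""
    return " ".join(body.split()).lower().rstrip(";").strip()


# Ordered rules; the first matching condition decides the routing group.
RULES = [
    ("large-table-isolation",
     lambda q: bool(q.tables & LARGE_TABLES), "heavy"),
    ("metadata-offload",
     lambda q: _normalize(q.body) in METADATA_QUERIES, "metadata"),
    ("bi-source-routing",
     lambda q: any(s in q.source.lower() for s in BI_SOURCES), "bi"),
]


def route(query: Query) -> str:
    """Evaluate the rules in order for a single query."""
    for _name, condition, group in RULES:
        if condition(query):
            return group
    return DEFAULT_GROUP
```

Because the rules in this sketch are plain data rather than code paths, the list could in principle be swapped at runtime, which is one way to think about what hot-editable rules require of a router.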

Contrast with other LB strategies

  • Round-robin / least-connections / random. Shape-agnostic. Correct when backends are identical and interchangeable.
  • Consistent hashing / affinity-based routing. Takes a shape input (the key) but aims for stickiness (same key → same backend), not match-quality. Useful for cache locality, not workload-fit.
  • PID-feedback LB (Dropbox Robinhood). Adjusts per-endpoint weights based on observed utilization; still shape-agnostic about the request. Orthogonal to workload-aware routing: a system can do both.
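  • Envoy / kube-proxy. Envoy, as an L7 proxy, can be configured to express workload-aware policies, but the policy itself has to be written by the operator. kube-proxy operates at L4 and does not inspect request content, so it cannot match on request shape.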
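Since shape-agnostic balancing is orthogonal to workload-aware routing, the two can be layered: a shape-aware stage picks the cluster group, then a shape-agnostic stage picks a backend inside it. The sketch below assumes hypothetical group names, backend names, and in-flight counts, and uses least-connections as the second stage purely as an example.

```python
# Hypothetical fleet: routing group -> {backend name: in-flight queries}.
IN_FLIGHT = {
    "bi": {"bi-1": 12, "bi-2": 7},
    "heavy": {"heavy-1": 2, "heavy-2": 3},
}


def pick_group(source: str) -> str:
    """Stage 1, workload-aware: choose a group from the request's shape.

    A single crude rule (BI tool in the source header) stands in for a
    full rule set.
    """
    return "bi" if "tableau" in source.lower() else "heavy"


def pick_backend(group: str, in_flight: dict = IN_FLIGHT) -> str:
    """Stage 2, shape-agnostic: least-connections within the chosen group."""
    backends = in_flight[group]
    return min(backends, key=backends.get)


def route(source: str) -> str:
    """Compose both stages for one request."""
    return pick_backend(pick_group(source))
```

The first stage decides which kind of cluster fits the request. The second only spreads load among backends that are interchangeable with each other, which is the situation where shape-agnostic strategies are correct.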

Seen in

Last updated · 200 distilled / 1,178 read