
PATTERN Cited by 1 source

Multi-signal workload-aware gateway routing

Multi-signal workload-aware gateway routing is the pattern of placing requests, queries, or workloads at a gateway layer by combining multiple real-time signals into a single placement decision, with continuous re-evaluation as conditions shift. The signals span three classes: query-intrinsic (what the work is), capacity (where there's headroom), and user-intent (what the user wants).

Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

The Databricks Serverless Gateway routes each Spark workload using three signals:

"The Databricks gateway routes each workload by evaluating three real-time signals: estimated query size (derived from the logical plan), current utilization across the cluster pool, and latency profile: whether a session is interactive and latency-sensitive or a batch job optimized for throughput."

The three signal classes

Class            Source                            Example (Databricks)
Query-intrinsic  From the request itself (parsed)  Logical-plan-derived query size
Capacity         Live backend metrics              Current cluster utilization
User-intent      Session metadata / config         Interactive vs batch latency profile

Single-signal routers — e.g. pure round-robin, pure least-loaded, pure sticky-hash — work well within their signal's domain but fail when another signal matters. Multi-signal routing combines all three at decision time.
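A minimal sketch of decision-time combination. The cluster metrics, scoring weights, and thresholds below are invented for illustration; the Databricks post does not disclose its actual scoring function:

```python
def choose_cluster(est_size, clusters, interactive):
    """Combine three signals into one placement score per cluster.

    est_size:    query-intrinsic signal, e.g. logical-plan-derived size units
    clusters:    capacity signal, cluster id -> (utilization, heavy batch stages)
    interactive: user-intent signal for this session
    All weights and units are illustrative, not Databricks'.
    """
    def score(used, heavy_stages):
        headroom = 1.0 - used
        s = headroom                        # capacity signal
        if interactive:
            s -= heavy_stages               # heavy batch work degrades latency
        elif headroom * 100 < est_size:     # query-intrinsic: must fit the volume
            s -= 10
        return s

    return max(clusters, key=lambda c: score(*clusters[c]))

# B has more raw headroom but runs a heavy ETL stage, so the
# interactive session is routed to A instead.
print(choose_cluster(est_size=2,
                     clusters={"A": (0.5, 0), "B": (0.2, 1)},
                     interactive=True))  # A
```

A single-signal least-loaded router would collapse this to the capacity term alone and always pick B in the example above.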

Worked example

A naive least-loaded router would send a small exploratory query to whichever cluster has the most CPU headroom, but that cluster might already be processing a heavy ETL stage that will degrade the small query's interactivity. A multi-signal router knows:

  • Query intrinsic: this is a small query
  • Capacity: cluster A has headroom, cluster B has more
  • User intent: the session is interactive (low latency required)

and routes to the cluster that is lightly loaded and serving interactive workloads, not just the cluster with the most raw headroom.

Conversely, a large ETL job gets "directed to a cluster with available headroom for its data volume, or the autoscaler is signaled to provision one" — the capacity signal drives the placement, with the query-intrinsic signal informing the how-big requirement.
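The route-or-provision branch for a large job can be expressed as a best-fit check with an autoscaler fallback; `NEEDS_PROVISION` and the fit test are assumptions, not the post's API:

```python
NEEDS_PROVISION = "provision-new-cluster"

def place_large_job(est_size_gb, headroom_gb):
    """headroom_gb: cluster id -> free capacity available to the job.

    The query-intrinsic signal (estimated size) sets the requirement;
    the capacity signal decides placement. When nothing fits, the gateway
    signals the autoscaler rather than forcing a bad placement.
    """
    fitting = {c: h for c, h in headroom_gb.items() if h >= est_size_gb}
    if not fitting:
        return NEEDS_PROVISION              # capacity request to the autoscaler
    return max(fitting, key=fitting.get)    # most headroom among clusters that fit

print(place_large_job(200, {"A": 120, "B": 300}))  # B
print(place_large_job(500, {"A": 120, "B": 300}))  # provision-new-cluster
```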

Continuous re-evaluation

The pattern's distinguishing runtime property: placement is not one-shot. Canonical quote:

"When conditions shift (a cluster fills up, a long-running job finishes, a new cluster comes online), the gateway continuously re-evaluates placements and corrects routing without user intervention."

Contrast with traditional per-request placement that commits a query to a cluster at dispatch time and absorbs whatever happens after. Continuous re-evaluation adds a feedback loop:

  1. Place query on cluster A
  2. Cluster A saturates before query completes
  3. Gateway moves query (or future queries in the session) to cluster B
  4. New cluster C comes online from autoscaler → re-evaluate

The Databricks post doesn't disclose the specific re-evaluation mechanism (mid-query migration? per-session re-routing? queued-query re-prioritization?), but the stated property is that routing decisions are live, not frozen.
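The feedback loop above can be sketched as event-driven per-session re-routing, one of the possible mechanisms; the class, thresholds, and event shape are invented:

```python
class Gateway:
    """Toy continuous re-evaluation loop: placements are live, not frozen."""

    def __init__(self, clusters):
        self.clusters = dict(clusters)   # cluster id -> utilization fraction
        self.placements = {}             # session id -> cluster id

    def _least_loaded(self):
        return min(self.clusters, key=self.clusters.get)

    def place(self, session):
        self.placements[session] = self._least_loaded()
        return self.placements[session]

    def on_event(self, cluster, utilization=None, removed=False):
        """A cluster filled up, drained, joined, or left: re-evaluate."""
        if removed:
            self.clusters.pop(cluster, None)
        else:
            self.clusters[cluster] = utilization
        for session, placed_on in self.placements.items():
            # move sessions off saturated or vanished clusters
            if placed_on not in self.clusters or self.clusters[placed_on] > 0.9:
                self.placements[session] = self._least_loaded()

gw = Gateway({"A": 0.3, "B": 0.6})
gw.place("session-1")                 # lands on A, the least-loaded cluster
gw.on_event("A", utilization=0.95)    # A saturates mid-session
print(gw.placements["session-1"])     # B
```

A new cluster coming online arrives through the same `on_event` path, so autoscaler-driven capacity changes trigger the same re-evaluation as saturation.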

Load-bearing outcome: cross-workload isolation

"Workloads are insulated from each other. A runaway query on one cluster doesn't delay queries on another, and the system maintains high utilization without sacrificing predictability."

This resolves the utilization vs predictability tradeoff at the pool level: the pool is densely utilized (all clusters are busy), but per-workload predictability is maintained (no single workload starves another).

Preconditions for the pattern

  1. Structured queries at the gateway. The gateway needs to read the query to compute its intrinsic signal. If queries are opaque (raw bytes, encrypted payloads), only capacity-and-intent signals are available, losing much of the pattern's value. Databricks gets this from Spark Connect's gRPC split.

  2. Many shared backend clusters, not one-per-workload. The pattern only works if there's a pool to route across. Dedicated-cluster-per-workload architectures have no routing surface.

  3. Autoscaler signaling capacity back to the router. When no backend has headroom, the gateway needs a path to signal "provision more" — the Databricks Gateway signals the Serverless Autoscaler.

Sibling patterns at other altitudes

  • patterns/bounded-load-consistent-hashing — workload-aware at the per-partition altitude, using hash position + load cap
  • patterns/sticky-routing-for-aggregator-state — state-aware routing (Dicer/Telegraf) — single-signal (sticky-key) with feedback from rebalance
  • Kubernetes scheduler — multi-signal (node CPU, memory, taints, affinities, pod priorities) but one-shot at pod creation
  • Service mesh subsetting — capacity-aware but typically not query-intrinsic

Among siblings, Databricks' Gateway is distinctive for combining query-intrinsic + capacity + user-intent signals with continuous re-evaluation.

Anti-patterns

  • Single-signal routers applied to mixed workloads — least-loaded routing of a mix of small-interactive and big-batch queries produces bad latency on both ends
  • One-shot routing — placement frozen at dispatch time; can't recover from mid-job cluster saturation
  • User-supplied routing hints — pushes the decision back to the user, contradicting stability-as-system-property

Seen in
