PATTERN
Multi-signal workload-aware gateway routing¶
Multi-signal workload-aware gateway routing is the pattern of routing requests, queries, or workloads at a gateway layer using multiple real-time signals combined into a single placement decision, with continuous re-evaluation as conditions shift. Specifically, it combines query-intrinsic signals (what the work is), capacity signals (where there's headroom), and user-intent signals (what the user wants).
Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):
The Databricks Serverless Gateway routes each Spark workload using three signals:
"The Databricks gateway routes each workload by evaluating three real-time signals: estimated query size (derived from the logical plan), current utilization across the cluster pool, and latency profile: whether a session is interactive and latency-sensitive or a batch job optimized for throughput."
The three signal classes¶
| Class | Source | Example (Databricks) |
|---|---|---|
| Query-intrinsic | From the request itself (parsed) | Logical-plan-derived query size |
| Capacity | Live backend metrics | Current cluster utilisation |
| User-intent | Session metadata / config | Interactive vs batch latency profile |
Single-signal routers — e.g. pure round-robin, pure least-loaded, pure sticky-hash — work well within their signal's domain but fail when another signal matters. Multi-signal routing combines all three at decision time.
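As a sketch, the three signal classes could be gathered into a single per-request structure before the placement decision. The names and types here are illustrative, not Databricks' actual API:

```python
from dataclasses import dataclass
from enum import Enum

class LatencyProfile(Enum):
    INTERACTIVE = "interactive"   # latency-sensitive session
    BATCH = "batch"               # throughput-optimised job

@dataclass
class RoutingSignals:
    # Query-intrinsic: estimated size derived from the parsed logical plan
    estimated_size: float
    # Capacity: live utilisation per backend cluster, 0.0-1.0
    cluster_utilization: dict
    # User-intent: interactive vs batch, from session metadata / config
    latency_profile: LatencyProfile
```

The point of the single structure is that all three signals are present at the same decision point, rather than one router seeing load and another seeing session type.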
Worked example¶
A naive least-loaded router would send a small exploratory query to whichever cluster has the most CPU headroom, but that cluster might already be processing a heavy ETL stage that will degrade the small query's interactivity. A multi-signal router knows:
- Query intrinsic: this is a small query
- Capacity: cluster A has headroom, cluster B has more
- User intent: the session is interactive (low latency required)
And routes to the cluster that's "lightly loaded" and serving interactive workloads, not just the cluster with the most raw headroom.
Conversely, a large ETL job gets "directed to a cluster with available headroom for its data volume, or the autoscaler is signaled to provision one" — the capacity signal drives the placement, with the query-intrinsic signal informing the how-big requirement.
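The worked example can be sketched as a scoring function that consults all three signals at once. The fields and weights are hypothetical; the post doesn't disclose Databricks' actual scoring logic:

```python
def choose_cluster(query_size, interactive, clusters):
    """Pick a cluster using all three signals at decision time.

    clusters: list of dicts with hypothetical fields:
      name, utilization (0.0-1.0), headroom (capacity units),
      serving_interactive (whether the cluster hosts interactive sessions).
    A naive least-loaded router would just minimise utilization.
    """
    def score(c):
        # Capacity signal: the query must actually fit.
        if c["headroom"] < query_size:
            return float("-inf")   # ineligible; may trigger autoscaling instead
        s = -c["utilization"]      # prefer lighter load, all else equal
        # User-intent signal: keep interactive work with interactive peers,
        # away from clusters churning through heavy batch stages.
        if interactive:
            s += 0.5 if c["serving_interactive"] else -0.5
        return s

    best = max(clusters, key=score)
    return best["name"] if score(best) > float("-inf") else None  # None: no headroom anywhere
```

With cluster A lightly loaded and serving interactive sessions, and cluster B offering more raw headroom but running heavy ETL, a small interactive query lands on A while a large batch job lands on B, matching the worked example above.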
Continuous re-evaluation¶
The pattern's distinguishing runtime property: placement is not one-shot. Canonical quote:
"When conditions shift (a cluster fills up, a long-running job finishes, a new cluster comes online), the gateway continuously re-evaluates placements and corrects routing without user intervention."
Contrast with traditional per-request placement that commits a query to a cluster at dispatch time and absorbs whatever happens after. Continuous re-evaluation adds a feedback loop:
- Place query on cluster A
- Cluster A saturates before query completes
- Gateway moves query (or future queries in the session) to cluster B
- New cluster C comes online from autoscaler → re-evaluate
The Databricks post doesn't disclose the specific re-evaluation mechanism (mid-query migration? per-session re-routing? queued-query re-prioritisation?), but the stated property is that routing decisions are live, not frozen.
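One way the feedback loop could work is per-session re-routing: running queries finish where they are, but future queries in a session move when their cluster saturates. This is an assumption, since the post doesn't disclose the mechanism; a minimal sketch of a single re-evaluation pass:

```python
def reevaluate(placements, utilization, threshold=0.9):
    """One continuous-re-evaluation pass (hypothetical mechanism:
    per-session re-routing; running queries finish in place).

    placements:  session id -> assigned cluster name
    utilization: cluster name -> live load, 0.0-1.0 (refreshed capacity signal)
    """
    corrected = dict(placements)
    healthy = [c for c in utilization if utilization[c] <= threshold]
    for session, cluster in placements.items():
        # A cluster filled up: correct routing without user intervention.
        if utilization[cluster] > threshold and healthy:
            corrected[session] = min(healthy, key=utilization.get)
    return corrected
```

A new cluster provisioned by the autoscaler simply appears in `utilization` on the next pass and becomes eligible, covering the "new cluster C comes online" case above.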
Load-bearing outcome: cross-workload isolation¶
"Workloads are insulated from each other. A runaway query on one cluster doesn't delay queries on another, and the system maintains high utilization without sacrificing predictability."
This resolves the utilization vs predictability tradeoff at the pool level: the pool is densely utilised (all clusters are busy), but per-workload predictability is maintained (no single workload starves another).
Preconditions for the pattern¶
- Structured queries at the gateway. The gateway needs to read the query to compute its intrinsic signal. If queries are opaque (raw bytes, encrypted payloads), only capacity and user-intent signals are available, losing much of the pattern's value. Databricks gets this from Spark Connect's gRPC split.
- Many shared backend clusters, not one-per-workload. The pattern only works if there's a pool to route across. Dedicated-cluster-per-workload architectures have no routing surface.
- Autoscaler signalling capacity back to the router. When no backend has headroom, the gateway needs a path to signal "provision more" — the Databricks Gateway signals the Serverless Autoscaler.
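The third precondition can be sketched as a route-or-provision path at the gateway. `request_provision` here is a hypothetical callback standing in for the gateway-to-Serverless-Autoscaler channel, whose real interface isn't disclosed:

```python
def route_or_provision(query_size, clusters, request_provision):
    """Route if any backend fits the query; otherwise signal the
    autoscaler rather than overloading an existing cluster.

    clusters: list of dicts with hypothetical fields name, headroom, utilization.
    request_provision: callback for the "provision more" path (assumed interface).
    """
    eligible = [c for c in clusters if c["headroom"] >= query_size]
    if not eligible:
        request_provision(query_size)   # ask the autoscaler for capacity
        return None                     # query waits for a new cluster
    return min(eligible, key=lambda c: c["utilization"])["name"]
```

Without this escape hatch, a gateway with no eligible backend can only queue blindly or degrade an overloaded cluster further.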
Sibling patterns at other altitudes¶
- patterns/bounded-load-consistent-hashing — workload-aware at the per-partition altitude, using hash position + load cap
- patterns/sticky-routing-for-aggregator-state — state-aware routing (Dicer/Telegraf) — single-signal (sticky-key) with feedback from rebalance
- Kubernetes scheduler — multi-signal (node CPU, memory, taints, affinities, pod priorities) but one-shot at pod creation
- Service mesh subsetting — capacity-aware but typically not query-intrinsic
Among siblings, Databricks' Gateway is distinctive for combining query-intrinsic + capacity + user-intent signals with continuous re-evaluation.
Anti-patterns¶
- Single-signal routers applied to mixed workloads — least-loaded routing of a mix of small-interactive and big-batch queries produces bad latency on both ends
- One-shot routing — placement frozen at dispatch time; can't recover from mid-job cluster saturation
- User-supplied routing hints — pushes the decision back to the user, contradicting stability-as-system-property
Seen in¶
- sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance — First canonical wiki instance of multi-signal workload-aware gateway routing as a named pattern. The Databricks Serverless Gateway combines query-intrinsic (logical-plan-derived size) + capacity (cluster utilisation) + user-intent (interactive vs batch latency profile), with continuous re-evaluation. Architecturally enabled by Spark Connect's gRPC split exposing structured queries to the gateway. Resolves the utilization vs predictability tension at the pool level.
Related¶
- systems/databricks-serverless-gateway — canonical production instance
- concepts/query-size-from-logical-plan — the query-intrinsic signal primitive
- concepts/utilization-vs-predictability-tradeoff — the tension the pattern resolves
- patterns/grpc-decoupled-driver-client — the substrate that enables structured query inspection at the gateway
- patterns/bounded-load-consistent-hashing — a sibling at the partition-routing altitude
- patterns/sticky-routing-for-aggregator-state — a sibling with stateful-backend constraints