
PATTERN Cited by 1 source

Multi-signal workload-aware gateway routing

Multi-signal workload-aware gateway routing is the pattern of placing requests, queries, or workloads at a gateway layer by combining multiple real-time signals into a single placement decision, with continuous re-evaluation as conditions shift. The signals span three classes: query-intrinsic (what the work is), capacity (where there's headroom), and user-intent (what the user wants).

Canonical production instance (Source: sources/2026-05-06-databricks-rethinking-distributed-systems-for-serverless-performance):

The Databricks Serverless Gateway routes each Spark workload using three signals:

"The Databricks gateway routes each workload by evaluating three real-time signals: estimated query size (derived from the logical plan), current utilization across the cluster pool, and latency profile: whether a session is interactive and latency-sensitive or a batch job optimized for throughput."

The three signal classes

Class            Source                            Example (Databricks)
Query-intrinsic  From the request itself (parsed)  Logical-plan-derived query size
Capacity         Live backend metrics              Current cluster utilization
User-intent      Session metadata / config         Interactive vs batch latency profile

Single-signal routers — e.g. pure round-robin, pure least-loaded, pure sticky-hash — work well within their signal's domain but fail when another signal matters. Multi-signal routing combines all three at decision time.
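A minimal sketch of decision-time combination. The cluster metrics, scoring weights, and thresholds below are invented for illustration; the Databricks post does not disclose its actual scoring function:

```python
def choose_cluster(est_size, clusters, interactive):
    """Combine three signals into one placement score per cluster.

    est_size:    query-intrinsic signal, e.g. logical-plan-derived size units
    clusters:    capacity signal, cluster id -> (utilization, heavy batch stages)
    interactive: user-intent signal for this session
    All weights and units are illustrative, not Databricks'.
    """
    def score(used, heavy_stages):
        headroom = 1.0 - used
        s = headroom                        # capacity signal
        if interactive:
            s -= heavy_stages               # heavy batch work degrades latency
        elif headroom * 100 < est_size:     # query-intrinsic: must fit the volume
            s -= 10
        return s

    return max(clusters, key=lambda c: score(*clusters[c]))

# B has more raw headroom but runs a heavy ETL stage, so the
# interactive session is routed to A instead.
print(choose_cluster(est_size=2,
                     clusters={"A": (0.5, 0), "B": (0.2, 1)},
                     interactive=True))  # A
```

A single-signal least-loaded router would collapse this to the capacity term alone and always pick B in the example above.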

Worked example

A naive least-loaded router would send a small exploratory query to whichever cluster has the most CPU headroom, but that cluster might already be processing a heavy ETL stage that will degrade the small query's interactivity. A multi-signal router knows:

  • Query intrinsic: this is a small query
  • Capacity: cluster A has headroom, cluster B has more
  • User intent: the session is interactive (low latency required)

and routes to the cluster that is lightly loaded and serving interactive workloads, not just the cluster with the most raw headroom.

Conversely, a large ETL job gets "directed to a cluster with available headroom for its data volume, or the autoscaler is signaled to provision one" — the capacity signal drives the placement, with the query-intrinsic signal informing the how-big requirement.
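The route-or-provision branch for a large job can be expressed as a best-fit check with an autoscaler fallback; `NEEDS_PROVISION` and the fit test are assumptions, not the post's API:

```python
NEEDS_PROVISION = "provision-new-cluster"

def place_large_job(est_size_gb, headroom_gb):
    """headroom_gb: cluster id -> free capacity available to the job.

    The query-intrinsic signal (estimated size) sets the requirement;
    the capacity signal decides placement. When nothing fits, the gateway
    signals the autoscaler rather than forcing a bad placement.
    """
    fitting = {c: h for c, h in headroom_gb.items() if h >= est_size_gb}
    if not fitting:
        return NEEDS_PROVISION              # capacity request to the autoscaler
    return max(fitting, key=fitting.get)    # most headroom among clusters that fit

print(place_large_job(200, {"A": 120, "B": 300}))  # B
print(place_large_job(500, {"A": 120, "B": 300}))  # provision-new-cluster
```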

Continuous re-evaluation

The pattern's distinguishing runtime property: placement is not one-shot. Canonical quote:

"When conditions shift (a cluster fills up, a long-running job finishes, a new cluster comes online), the gateway continuously re-evaluates placements and corrects routing without user intervention."

Contrast with traditional per-request placement that commits a query to a cluster at dispatch time and absorbs whatever happens after. Continuous re-evaluation adds a feedback loop:

  1. Place query on cluster A
  2. Cluster A saturates before query completes
  3. Gateway moves query (or future queries in the session) to cluster B
  4. New cluster C comes online from autoscaler → re-evaluate

The Databricks post doesn't disclose the specific re-evaluation mechanism (mid-query migration? per-session re-routing? queued-query re-prioritization?), but the stated property is that routing decisions are live, not frozen.
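The feedback loop above can be sketched as event-driven per-session re-routing, one of the possible mechanisms; the class, thresholds, and event shape are invented:

```python
class Gateway:
    """Toy continuous re-evaluation loop: placements are live, not frozen."""

    def __init__(self, clusters):
        self.clusters = dict(clusters)   # cluster id -> utilization fraction
        self.placements = {}             # session id -> cluster id

    def _least_loaded(self):
        return min(self.clusters, key=self.clusters.get)

    def place(self, session):
        self.placements[session] = self._least_loaded()
        return self.placements[session]

    def on_event(self, cluster, utilization=None, removed=False):
        """A cluster filled up, drained, joined, or left: re-evaluate."""
        if removed:
            self.clusters.pop(cluster, None)
        else:
            self.clusters[cluster] = utilization
        for session, placed_on in self.placements.items():
            # move sessions off saturated or vanished clusters
            if placed_on not in self.clusters or self.clusters[placed_on] > 0.9:
                self.placements[session] = self._least_loaded()

gw = Gateway({"A": 0.3, "B": 0.6})
gw.place("session-1")                 # lands on A, the least-loaded cluster
gw.on_event("A", utilization=0.95)    # A saturates mid-session
print(gw.placements["session-1"])     # B
```

A new cluster coming online arrives through the same `on_event` path, so autoscaler-driven capacity changes trigger the same re-evaluation as saturation.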

Load-bearing outcome: cross-workload isolation

"Workloads are insulated from each other. A runaway query on one cluster doesn't delay queries on another, and the system maintains high utilization without sacrificing predictability."

This resolves the utilization vs predictability tradeoff at the pool level: the pool is densely utilized (all clusters are busy), but per-workload predictability is maintained (no single workload starves another).

Preconditions for the pattern

  1. Structured queries at the gateway. The gateway needs to read the query to compute its intrinsic signal. If queries are opaque (raw bytes, encrypted payloads), only capacity-and-intent signals are available, losing much of the pattern's value. Databricks gets this from Spark Connect's gRPC split.

  2. Many shared backend clusters, not one-per-workload. The pattern only works if there's a pool to route across. Dedicated-cluster-per-workload architectures have no routing surface.

  3. Autoscaler signaling capacity back to the router. When no backend has headroom, the gateway needs a path to signal "provision more" — the Databricks Gateway signals the Serverless Autoscaler.

Sibling patterns at other altitudes

  • patterns/bounded-load-consistent-hashing — workload-aware at the per-partition altitude, using hash position + load cap
  • patterns/sticky-routing-for-aggregator-state — state-aware routing (Dicer/Telegraf) — single-signal (sticky-key) with feedback from rebalance
  • Kubernetes scheduler — multi-signal (node CPU, memory, taints, affinities, pod priorities) but one-shot at pod creation
  • Service mesh subsetting — capacity-aware but typically not query-intrinsic

Among siblings, Databricks' Gateway is distinctive for combining query-intrinsic + capacity + user-intent signals with continuous re-evaluation.

Anti-patterns

  • Single-signal routers applied to mixed workloads — least-loaded routing of a mix of small-interactive and big-batch queries produces bad latency on both ends
  • One-shot routing — placement frozen at dispatch time; can't recover from mid-job cluster saturation
  • User-supplied routing hints — pushes the decision back to the user, contradicting stability-as-system-property

Seen in
