PATTERN Cited by 1 source

Hot-routing autoscale¶

Definition¶

Hot-routing autoscale is the pattern of dynamically routing new incoming work units (streams, connections, requests) away from overloaded backend nodes using heuristic-based load signals, combined with graceful draining of existing work on nodes being removed. It enables elastic scaling in both directions, up and down, because the routing unit (stream or connection) is independent of any fixed infrastructure topology such as partitions or shards.

Mechanism¶

A pool of backend pods serves incoming streams.
A routing layer monitors pod load (CPU, memory, connection count, throughput).
Scale-up: when a pod is "running hot," new streams are routed to a different pod. Existing streams on the hot pod are unaffected (ordering preserved).
Scale-down: when demand drops, pods are marked for removal. New streams stop routing there; existing streams drain gracefully to completion. The pool shrinks.
The producer is unaware of routing decisions — its ordering guarantee is at the stream level, not the pod level.
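The steps above can be sketched as a small in-memory router. This is only an illustration under stated assumptions, not the routing layer of any particular system: the `Pod` and `Router` classes, the single `load` score, and the `HOT_THRESHOLD` value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field

# Assumed cutoff: a pod whose load score reaches this value counts as "running hot".
HOT_THRESHOLD = 0.8


@dataclass
class Pod:
    name: str
    load: float = 0.0        # hypothetical combined load score in [0, 1]
    draining: bool = False   # True once the pod is marked for removal
    streams: set = field(default_factory=set)


class Router:
    def __init__(self, pods):
        self.pods = {p.name: p for p in pods}
        # stream_id -> pod name; fixed for the lifetime of the stream
        self.assignment = {}

    def open_stream(self, stream_id):
        """Route a NEW stream; existing streams are never moved."""
        candidates = [p for p in self.pods.values() if not p.draining]
        if not candidates:
            raise RuntimeError("no pod available for new streams")
        cool = [p for p in candidates if p.load < HOT_THRESHOLD]
        # Prefer pods that are not hot; if every pod is hot, take the least loaded.
        pod = min(cool or candidates, key=lambda p: p.load)
        pod.streams.add(stream_id)
        self.assignment[stream_id] = pod.name
        return pod.name

    def mark_for_removal(self, name):
        """Scale-down: stop routing new streams to the pod, let existing ones finish."""
        pod = self.pods[name]
        pod.draining = True
        if not pod.streams:
            del self.pods[name]

    def close_stream(self, stream_id):
        """A stream completed; remove its pod if it was draining and is now empty."""
        pod = self.pods[self.assignment.pop(stream_id)]
        pod.streams.discard(stream_id)
        if pod.draining and not pod.streams:
            del self.pods[pod.name]  # the pool shrinks only after the drain completes
```

In this sketch a stream's pod is chosen once, in `open_stream`, and never changes afterwards, which is what keeps per-stream ordering independent of later routing decisions. A real routing layer would also need to refresh `load` from live metrics and handle pod failure, both of which are left out here.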

Why it's distinct from traditional load balancing¶

Traditional LB	Hot-routing autoscale
Routes individual requests	Routes long-lived streams/connections
Stateless per-request	Stateful — stream stays on assigned pod for its lifetime
Scale-down kills connections	Scale-down drains gracefully
Ordering depends on partition assignment	Ordering independent of pod assignment

Seen in¶

systems/zerobus-ingest — heuristic-based routing of gRPC streams across pod pool; enables 12 GB/s sustained with elastic compute utilization (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest)

Related¶

patterns/stream-connection-as-ordering-unit — the ordering model that makes hot-routing safe
concepts/fan-in-ingestion — the workload pattern that requires elastic routing
concepts/backpressure — the signal that triggers route-away decisions
