PATTERN Cited by 1 source
Hot-routing autoscale¶
Definition¶
Hot-routing autoscale is the pattern of routing new incoming work units (streams, connections, requests) away from overloaded backend nodes based on heuristic load signals, while gracefully draining existing work from nodes that are being removed. It enables elastic scaling in both directions (up and down) because the routing unit (stream/connection) is not tied to any fixed infrastructure topology such as partitions or shards.
Mechanism¶
- A pool of backend pods serves incoming streams.
- A routing layer monitors pod load (CPU, memory, connection count, throughput).
- Scale-up: when a pod is "running hot," new streams are routed to a different pod. Existing streams on the hot pod are unaffected (ordering preserved).
- Scale-down: when demand drops, pods are marked for removal. New streams are no longer routed to a marked pod; its existing streams run to completion, and the pod is removed once it is empty, shrinking the pool.
- The producer is unaware of routing decisions — its ordering guarantee is at the stream level, not the pod level.
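The steps above can be sketched as a small in-memory router. This is a minimal illustration under stated assumptions, not the implementation of any particular system: the `Pod` and `HotRouter` names, the single normalized `load` score, and the `hot_threshold` value of 0.8 are all hypothetical. A real routing layer would derive load from several signals (CPU, memory, connection count, throughput).

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    load: float = 0.0        # hypothetical normalized load score in [0, 1]
    draining: bool = False   # True once the pod is marked for removal
    streams: set = field(default_factory=set)


class HotRouter:
    """Illustrative router: assigns each new stream to a pod exactly once."""

    def __init__(self, pods, hot_threshold=0.8):
        self.pods = {p.name: p for p in pods}
        self.hot_threshold = hot_threshold  # assumed cutoff for "running hot"
        self.assignment = {}                # stream_id -> pod name

    def open_stream(self, stream_id):
        # Draining pods never receive new streams.
        candidates = [p for p in self.pods.values() if not p.draining]
        if not candidates:
            raise RuntimeError("no pods available for new streams")
        # Prefer pods below the hot threshold; if every pod is hot,
        # fall back to the least-loaded one rather than rejecting.
        cool = [p for p in candidates if p.load < self.hot_threshold]
        pod = min(cool or candidates, key=lambda p: (p.load, len(p.streams)))
        pod.streams.add(stream_id)
        self.assignment[stream_id] = pod.name
        return pod.name

    def close_stream(self, stream_id):
        pod = self.pods[self.assignment.pop(stream_id)]
        pod.streams.discard(stream_id)
        # A draining pod leaves the pool once its last stream completes.
        if pod.draining and not pod.streams:
            del self.pods[pod.name]

    def mark_for_removal(self, pod_name):
        pod = self.pods[pod_name]
        pod.draining = True
        if not pod.streams:
            del self.pods[pod_name]


router = HotRouter([Pod("a"), Pod("b")])
first = router.open_stream("s1")       # lands on "a" (first least-loaded pod)
router.pods["a"].load = 0.95           # "a" is now running hot
second = router.open_stream("s2")      # routed away from "a" to "b"
router.mark_for_removal("a")           # "a" drains; "s1" keeps running on it
router.close_stream("s1")              # last stream done, "a" leaves the pool
```

Note that the router never moves an existing stream: `s1` stays on pod `a` while it is hot and while it drains, which is what keeps per-stream ordering intact.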
Why it's distinct from traditional load balancing¶
| Traditional LB | Hot-routing autoscale |
|---|---|
| Routes individual requests | Routes long-lived streams/connections |
| Stateless per-request | Stateful — stream stays on assigned pod for its lifetime |
| Scale-down can terminate in-flight connections | Scale-down drains gracefully |
| Ordering depends on partition assignment | Ordering independent of pod assignment |
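The first two rows of the table can be illustrated with a toy comparison. This sketch assumes round-robin as a stand-in for a per-request balancer, and the pod names and helper functions are hypothetical. It shows only where messages land, not how any real load balancer is implemented.

```python
from itertools import cycle

PODS = ["pod-a", "pod-b", "pod-c"]  # hypothetical pool


def per_request_routing(messages):
    """Round-robin each message independently, as a per-request LB might."""
    rr = cycle(PODS)
    return [(next(rr), stream_id, seq) for stream_id, seq in messages]


def sticky_stream_routing(messages):
    """Pick a pod once per stream; every later message follows that choice."""
    rr = cycle(PODS)
    assignment = {}
    placed = []
    for stream_id, seq in messages:
        if stream_id not in assignment:
            assignment[stream_id] = next(rr)
        placed.append((assignment[stream_id], stream_id, seq))
    return placed


def pods_per_stream(placed):
    """Collect the set of pods that handled each stream."""
    seen = {}
    for pod, stream_id, _ in placed:
        seen.setdefault(stream_id, set()).add(pod)
    return seen


# Two interleaved streams, each message tagged with a sequence number.
messages = [("s1", 0), ("s2", 0), ("s1", 1), ("s2", 1), ("s1", 2)]

scattered = pods_per_stream(per_request_routing(messages))
pinned = pods_per_stream(sticky_stream_routing(messages))
```

With per-request routing, the messages of `s1` are spread over several pods, so any ordering guarantee would need cross-pod coordination. With sticky assignment, each stream is handled by exactly one pod for its lifetime, so ordering within the stream does not depend on which pod was chosen.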
Seen in¶
- systems/zerobus-ingest — heuristic-based routing of gRPC streams across pod pool; enables 12 GB/s sustained with elastic compute utilization (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest)
Related¶
- patterns/stream-connection-as-ordering-unit — the ordering model that makes hot-routing safe
- concepts/fan-in-ingestion — the workload pattern that requires elastic routing
- concepts/backpressure — the signal that triggers route-away decisions