PATTERN Cited by 1 source

Hot-routing autoscale

Definition

Hot-routing autoscale is the pattern of routing new incoming work units (streams, connections, requests) away from overloaded backend nodes based on heuristic load signals, while letting existing work drain gracefully on nodes that are being removed. It enables elastic scaling in both directions, up and down, because the routing unit (a stream or connection) is not tied to a fixed infrastructure topology such as partitions or shards.
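
One way to picture a heuristic load signal is a simple threshold check over a few utilization readings. The sketch below is illustrative only: the `LoadSample` type, the choice of signals, and every threshold value are assumptions made for this example, not details of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSample:
    """One load reading for a pod; every field is a utilization fraction in [0, 1]."""
    cpu: float
    memory: float
    connections: float  # open connections / assumed per-pod connection limit
    throughput: float   # current throughput / assumed per-pod throughput limit


# Hypothetical thresholds; real values would be tuned per workload.
HOT_THRESHOLDS = LoadSample(cpu=0.80, memory=0.85, connections=0.90, throughput=0.90)


def is_hot(sample: LoadSample, limits: LoadSample = HOT_THRESHOLDS) -> bool:
    """Return True if any single signal meets or exceeds its threshold."""
    return (
        sample.cpu >= limits.cpu
        or sample.memory >= limits.memory
        or sample.connections >= limits.connections
        or sample.throughput >= limits.throughput
    )
```

Treating a pod as hot when any one signal crosses its limit is a deliberately conservative choice; a weighted score or a moving average would be equally valid heuristics.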

Mechanism

  1. A pool of backend pods serves incoming streams.
  2. A routing layer monitors pod load (CPU, memory, connection count, throughput).
  3. Scale-up: when a pod is "running hot," new streams are routed to a different pod. Existing streams on the hot pod are unaffected (ordering preserved).
  4. Scale-down: when demand drops, pods are marked for removal. New streams stop routing to a marked pod; its existing streams drain gracefully to completion. Once a marked pod has no streams left it is removed, and the pool shrinks.
  5. The producer is unaware of routing decisions — its ordering guarantee is at the stream level, not the pod level.
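
The steps above can be sketched as a small in-memory router. This is a minimal illustration under stated assumptions, not a reference implementation: the `Pod` and `Router` names, the `hot` flag (assumed to be set by an external load monitor), and the least-streams tie-break are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    hot: bool = False        # assumed to be set by an external load monitor
    draining: bool = False   # marked for removal; accepts no new streams
    streams: set = field(default_factory=set)


class Router:
    def __init__(self, pods):
        self.pods = {p.name: p for p in pods}
        self.assignment = {}  # stream_id -> pod name, fixed for the stream's lifetime

    def open_stream(self, stream_id):
        """Place a new stream on the least-loaded pod that is neither hot nor draining."""
        candidates = [p for p in self.pods.values() if not p.draining and not p.hot]
        if not candidates:
            # Fall back to hot pods rather than rejecting the stream.
            candidates = [p for p in self.pods.values() if not p.draining]
        if not candidates:
            raise RuntimeError("no pod available for new streams")
        pod = min(candidates, key=lambda p: len(p.streams))
        pod.streams.add(stream_id)
        self.assignment[stream_id] = pod.name
        return pod.name

    def route(self, stream_id):
        """Existing streams always go to their assigned pod, so per-stream order holds."""
        return self.assignment[stream_id]

    def close_stream(self, stream_id):
        """Release a finished stream; remove its pod if it was draining and is now empty."""
        pod = self.pods[self.assignment.pop(stream_id)]
        pod.streams.discard(stream_id)
        if pod.draining and not pod.streams:
            del self.pods[pod.name]  # fully drained: the pool shrinks

    def mark_for_removal(self, name):
        """Scale-down: stop sending new streams to the pod and let existing ones finish."""
        pod = self.pods[name]
        pod.draining = True
        if not pod.streams:
            del self.pods[name]
```

In this sketch the producer only ever calls `open_stream`, `route`, and `close_stream` with a stream id; it never sees which pod was chosen, which mirrors the point that ordering is guaranteed per stream rather than per pod.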

Why it's distinct from traditional load balancing

Traditional LB                           | Hot-routing autoscale
-----------------------------------------+----------------------------------------------------------
Routes individual requests               | Routes long-lived streams/connections
Stateless per-request                    | Stateful: stream stays on assigned pod for its lifetime
Scale-down kills connections             | Scale-down drains gracefully
Ordering depends on partition assignment | Ordering independent of pod assignment

Seen in
