
PATTERN Cited by 2 sources

Warm standby deployment

Shape

A DR deployment tier in which the secondary environment runs the full stack at reduced scale. Unlike pilot light, the compute tier is not stopped; unlike active-active, the secondary does not take production traffic. The result is a minute-scale RTO at a higher steady-state cost than pilot light.

On failover: scale up the secondary, shift traffic. The stack is already live, so the failover step is capacity and traffic routing rather than provisioning from zero.
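The two failover steps can be sketched as below. This is a minimal illustration, not a real orchestration tool: `Environment`, `fail_over`, the capacity numbers, and the 0-100 traffic weights are all hypothetical names and values chosen for the example, and a real implementation would call the platform's scaling and routing APIs where this sketch only mutates in-memory state.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical stand-in for one DR environment."""
    name: str
    desired_capacity: int  # instances currently requested
    traffic_weight: int    # share of production traffic, 0-100


@dataclass
class FailoverPlan:
    """Ordered record of the actions taken during failover."""
    steps: list = field(default_factory=list)


def fail_over(primary: Environment, secondary: Environment,
              full_capacity: int) -> FailoverPlan:
    plan = FailoverPlan()

    # Step 1: scale up. The stack is already running at reduced scale,
    # so this is a capacity change, not provisioning from zero.
    if secondary.desired_capacity < full_capacity:
        plan.steps.append(
            f"scale {secondary.name} "
            f"{secondary.desired_capacity}->{full_capacity}"
        )
        secondary.desired_capacity = full_capacity

    # Step 2: shift traffic from the primary to the secondary.
    plan.steps.append(f"shift traffic {primary.name}->{secondary.name}")
    primary.traffic_weight = 0
    secondary.traffic_weight = 100
    return plan


primary = Environment("primary", desired_capacity=20, traffic_weight=100)
secondary = Environment("secondary", desired_capacity=4, traffic_weight=0)
plan = fail_over(primary, secondary, full_capacity=20)
```

Scaling before shifting is a deliberate ordering in this sketch: sending full production traffic to a reduced-scale secondary would risk overloading it.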

Cross-partition framing

"Finally, warm standby or multi-site active-active setups mainly differ in the need for more complex network synchronization across partitions." (Source: sources/2026-01-30-aws-sovereign-failover-design-digital-sovereignty)

The cross-partition version inherits the general warm-standby shape plus the four partition-aware surfaces:

  • Continuous data synchronization to the second partition, not just at failover time. This needs custom tooling, because S3 Cross-Region Replication and Transit Gateway inter-region peering don't cross partitions.
  • Always-running PKI: cross-signed CAs are in place before any request arrives.
  • Always-configured cross-partition IAM topology, so services in each partition can talk to their primary data tier immediately.
  • Pre-built traffic-shifting — no Route 53 cross-partition health checks, so traffic shifting has to be driven by an external control plane (DNS updates, CDN cutover, or client-side failover logic).
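For the last surface, the decision logic of an external control plane might look like the sketch below. It assumes the control plane probes the primary itself and triggers a cutover after a run of consecutive failures. The function names, the threshold of three, and the `shift_traffic` callback (standing in for a DNS update, CDN cutover, or similar) are assumptions for illustration, not part of the cited source.

```python
from typing import Callable, Iterable


def should_fail_over(probe_results: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive probes have failed.

    Each item in `probe_results` is True for a healthy probe and
    False for a failed one. A single healthy probe resets the count,
    which avoids failing over on isolated blips.
    """
    consecutive_failures = 0
    for healthy in probe_results:
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False


def run_control_plane(probe_results: Iterable[bool],
                      shift_traffic: Callable[[], None],
                      threshold: int = 3) -> bool:
    """Invoke the (hypothetical) traffic-shift action if probes warrant it."""
    if should_fail_over(probe_results, threshold):
        shift_traffic()
        return True
    return False


actions = []
shifted = run_control_plane(
    [True, False, False, False],
    shift_traffic=lambda: actions.append("update DNS to secondary partition"),
)
```

Requiring consecutive failures is one simple way to trade detection speed against false positives; the right threshold depends on probe interval and the RTO target.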

When warm standby is worth the cost over pilot light

  • RTO requirement is minutes, not hours. Pilot-light build-up from IaC is typically 15–60 minutes; warm standby is seconds to minutes.
  • The compute tier itself has a long warm-up time — JIT warmup, cache population, connection pools. For these, running the stack and just scaling it beats provisioning it from zero.
  • Regulatory audit requires continuous evidence that the failover environment is operating and controlled — e.g. certifications that a stopped stack can't produce.
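The three criteria above can be written down as a small checklist function. This is only a sketch: the function name and parameters are hypothetical, and the default of 15 minutes is taken from the lower end of the pilot-light build-up range quoted above, which is itself a typical figure rather than a guarantee.

```python
def warm_standby_reasons(rto_minutes: float,
                         slow_compute_warmup: bool,
                         needs_continuous_evidence: bool,
                         pilot_light_build_minutes: float = 15.0) -> list:
    """Return the reasons warm standby is justified over pilot light.

    An empty list means none of the criteria apply and pilot light
    is probably sufficient.
    """
    reasons = []
    if rto_minutes < pilot_light_build_minutes:
        reasons.append("RTO shorter than pilot-light build-up")
    if slow_compute_warmup:
        reasons.append("compute tier has a long warm-up time")
    if needs_continuous_evidence:
        reasons.append("audit requires a continuously running environment")
    return reasons


# Example: a 5-minute RTO with a JVM-style warm-up cost.
reasons = warm_standby_reasons(
    rto_minutes=5,
    slow_compute_warmup=True,
    needs_continuous_evidence=False,
)
```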

When it's worse than active-active

When RTO must be effectively zero, pay for active-active instead — the delta between warm standby and active-active is the network synchronization complexity, which is already the hard part. At that point the additional cost of serving real traffic from the second partition is marginal relative to the cost of keeping two full-footprint environments in lockstep.

Seen in
