PATTERN Cited by 2 sources

Pilot light deployment

Shape

A DR deployment tier in which the data tier in the secondary environment is running and replicated, but the compute tier is stopped (or minimally provisioned). Everything else is "built up when needed" on failover: IaC templates instantiate the compute and networking / DNS topology on demand, and services come up against the already-replicated data.

Sits between backup/restore (below) and warm standby (above) on the DR ladder: cheaper than warm standby, faster to recover than backup/restore.

Why it's the cross-partition default

The AWS Sovereign Failover post endorses pilot light for cross-partition specifically: "We can run an application pilot light in another partition. This greatly reduces the cost of the infrastructure required in the second partition because it will only be built up when needed." (Source: sources/2026-01-30-aws-sovereign-failover-design-digital-sovereignty)

Three reasons this tier fits the cross-partition pattern especially well:

  • Duplicate infrastructure cost dominates the cross-partition budget at higher tiers. Full warm standby in a second partition means paying for a second full production footprint in a second regulatory regime, often with twice the compliance cost too.
  • Cross-partition failover is rare and discrete. It is driven by sovereignty shifts, not minute-scale AZ failures, so an RTO in the hours range is usually acceptable.
  • Data-tier replication is where the real engineering work lives anyway. S3 Cross-Region Replication doesn't work across partitions, so custom data sync is the hard part; bringing up the compute tier via IaC is the easy part.
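
As a rough illustration of the custom data sync mentioned in the last point, the sketch below compares an object manifest from each partition and works out what to transfer. It is only one possible shape, not the tooling the source post describes: `plan_sync`, `SyncPlan`, and the key-to-checksum manifest format are hypothetical, and the code assumes each manifest was listed separately with credentials for its own partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_copy: list[str]    # keys missing or stale in the secondary
    to_delete: list[str]  # keys present only in the secondary


def plan_sync(source: dict[str, str], dest: dict[str, str]) -> SyncPlan:
    """Compare two {object_key: checksum} manifests and decide what to transfer.

    Hypothetical helper: nothing here calls a cloud API. Each manifest is
    assumed to come from a listing made inside its own partition.
    """
    to_copy = sorted(k for k, digest in source.items() if dest.get(k) != digest)
    to_delete = sorted(k for k in dest if k not in source)
    return SyncPlan(to_copy=to_copy, to_delete=to_delete)


# Made-up manifests for illustration.
primary = {"orders/1.json": "a1", "orders/2.json": "b2", "orders/3.json": "c3"}
secondary = {"orders/1.json": "a1", "orders/2.json": "OLD", "tmp/stale.json": "zz"}

plan = plan_sync(primary, secondary)
print(plan.to_copy)    # ['orders/2.json', 'orders/3.json']
print(plan.to_delete)  # ['tmp/stale.json']
```

Keeping the planning step free of cloud calls makes it easy to test; the actual transfer would still need a process that holds credentials for both partitions.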

What must be always-running

  • Data tier — replicated continuously; the subject of the custom cross-partition data-sync tooling.
  • Identity — roles, federation setup, PKI / cross-signed CAs; can't be provisioned during an incident.
  • DNS / network bootstrap — Route 53 zones, VPC / subnets / routing for the secondary, VPN / Direct Connect baseline.
  • Runbook + IaC templates — the actual "build up when needed" has to be reducible to a few commands that a human or automation can run.
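
The list above can be read as a set of preconditions for failover: the build-up should start only if every always-running component is healthy. A minimal sketch of that gate follows, assuming hypothetical check and step callables; `run_failover` and the lambdas are stand-ins, not part of any AWS tooling.

```python
from typing import Callable

Check = Callable[[], bool]
Step = Callable[[], None]


def run_failover(preconditions: dict[str, Check], steps: list[tuple[str, Step]]) -> list[str]:
    """Run the build-up steps only if every always-on component is healthy.

    Returns the names of the steps that ran. Raises RuntimeError, before any
    step executes, if a precondition fails.
    """
    failed = [name for name, check in preconditions.items() if not check()]
    if failed:
        raise RuntimeError(f"always-on components not ready: {', '.join(failed)}")

    completed = []
    for name, step in steps:
        step()  # a real step might apply an IaC template or update a DNS record
        completed.append(name)
    return completed


# Hypothetical stand-ins; real checks would inspect replication lag, assume a
# role in the secondary partition, resolve the hosted zone, and so on.
preconditions = {
    "data tier replicated": lambda: True,
    "identity usable": lambda: True,
    "dns / network baseline": lambda: True,
}
log: list[str] = []
steps = [
    ("deploy compute", lambda: log.append("compute up")),
    ("enable endpoints", lambda: log.append("endpoints up")),
    ("start consumers and jobs", lambda: log.append("workers up")),
]

print(run_failover(preconditions, steps))
# ['deploy compute', 'enable endpoints', 'start consumers and jobs']
```

Failing before any step runs keeps a half-built secondary from appearing when, for example, the identity setup has silently expired.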

What stays stopped until failover

  • Compute — EC2 / ECS / EKS / Lambda capacity scaled to zero.
  • Service endpoints — API Gateway / ALB / front ends, none accepting live traffic.
  • Caches, queue consumers, scheduled jobs.

Adjacent tiers on the DR ladder

  • Backup-only (below): nothing running in secondary; periodic backup copy only. RTO hours–days. Cross-partition-feasible if an out-of-partition backup bucket and manual restore tooling are built.
  • Warm standby (above): full stack running, smaller scale; faster RTO; higher steady-state cost. Post: "Warm standby or multi-site active-active setups mainly differ in the need for more complex network synchronization across partitions."
  • Multi-site active-active (top): parallel production; effectively zero RTO; most expensive and operationally complex across the partition boundary.
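
To make the trade-off between these tiers concrete, the sketch below picks the cheapest tier whose recovery time meets a required RTO. The RTO figures and cost ranks are illustrative placeholders chosen only to preserve the ordering described above; they are not taken from the source post, and `Tier`, `LADDER`, and `cheapest_tier` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rto_hours: float    # illustrative worst-case recovery time, not a measured value
    relative_cost: int  # ordinal rank only: 1 = cheapest steady state


# Ordered from the bottom to the top of the ladder. The numbers are placeholders.
LADDER = [
    Tier("backup-only", 48.0, 1),
    Tier("pilot light", 4.0, 2),
    Tier("warm standby", 0.5, 3),
    Tier("multi-site active-active", 0.0, 4),
]


def cheapest_tier(required_rto_hours: float) -> Tier:
    """Return the lowest-cost tier whose assumed RTO meets the requirement."""
    candidates = [t for t in LADDER if t.rto_hours <= required_rto_hours]
    if not candidates:
        raise ValueError("no tier meets a negative RTO")
    return min(candidates, key=lambda t: t.relative_cost)


print(cheapest_tier(8).name)   # pilot light
print(cheapest_tier(1).name)   # warm standby
print(cheapest_tier(72).name)  # backup-only
```

With an hours-range requirement the selection lands on pilot light, which matches the argument for treating it as the cross-partition default.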

Seen in
