
PATTERN Cited by 1 source

Three-account cyber-recovery topology

Pattern

Use three distinct AWS accounts inside one AWS Organization to host a cyber-resilient recovery design:

  1. Production Account(s): where workloads run during normal operation; isolated for investigation when a cyber event is confirmed.
  2. Recovery Account: owns the logically air-gapped vault and configures vault sharing, restore authorization, and MPA approvers. SCPs restrict the account to backup operations so a compromised production identity can't modify these controls.
  3. Isolated Recovery Environment (IRE): where backups are restored and validated, and where the new production environment is rebuilt before cutover. It has no trust relationship to Production, no VPC peering, and no internet-facing resources; AWS service APIs are reached over PrivateLink only.
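The IRE isolation rules in the list above can be expressed as a small checkable model. This is a minimal sketch, not an AWS API: the `Account` class, its fields, and `ire_violations` are hypothetical names introduced for illustration, and a real check would read trust policies and VPC configuration from the accounts themselves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Account:
    """Hypothetical, simplified description of one account in the topology."""
    name: str
    # Names of accounts whose identities are trusted by roles in this account.
    trusts: frozenset = field(default_factory=frozenset)
    # Names of accounts this account has VPC peering with.
    peered_with: frozenset = field(default_factory=frozenset)
    # True if the account hosts any internet-facing resource.
    internet_facing: bool = False


def ire_violations(ire: Account, production: str) -> list:
    """Return the IRE isolation rules that the given account description breaks."""
    problems = []
    if production in ire.trusts:
        problems.append("IRE trusts Production identities")
    if production in ire.peered_with:
        problems.append("IRE has VPC peering to Production")
    if ire.internet_facing:
        problems.append("IRE has internet-facing resources")
    return problems
```

An empty result means the description satisfies the three rules modelled here; anything else names the boundary that has been crossed.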

Verbatim from the canonicalising source:

"The recovery environment, including its identities, keys, and network paths, shouldn't share a trust boundary with the environment being recovered. If production identity is compromised, recovery must be able to proceed without depending on it. Most customers achieve this using separate AWS accounts inside an AWS Organization." (Source: sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events)

Why three accounts, not two

The wiki's existing clean-room recovery account, canonicalised on 2026-03-31, is a two-account design (Production + clean-room recovery). The 2026-05-20 cyber-resilience post extends it to three accounts by separating:

  • Storage governance (Recovery Account) from
  • Execution / rebuild (IRE)

The reason is further blast-radius reduction:

| Account  | Role                           | What's protected                                                        |
| Recovery | Stores backups, governs access | Even if IRE is compromised by a tainted restore, vault is unreachable   |
| IRE      | Restores + validates           | Tainted restores cannot reach Production or vault                       |

A two-account design conflates these: if the recovery account is also the rebuild account, a tainted restore could in principle delete or modify the vault. The three-account split prevents that.
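The blast-radius argument can be illustrated as a reachability question over a "can modify" graph. The sketch below is a toy model under stated assumptions: node names and edges are hypothetical, and an edge from A to B means only that a principal running in A could modify or delete B.

```python
from collections import deque


def reachable(edges: dict, start: str) -> set:
    """Breadth-first search: every node reachable from start via 'can modify' edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Two-account design (assumed model): the restore runs inside the account that owns the vault.
two_account = {
    "tainted-restore": ["recovery-account"],
    "recovery-account": ["vault"],
}

# Three-account design (assumed model): the restore runs in the IRE, which has no write path onward.
three_account = {
    "tainted-restore": ["ire"],
    "ire": [],
    "recovery-account": ["vault"],
}

print("vault" in reachable(two_account, "tainted-restore"))    # True
print("vault" in reachable(three_account, "tainted-restore"))  # False
```

In this model the split works because no edge leads out of the IRE, which is the property the SCPs and the absence of trust relationships are meant to enforce.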

Account responsibilities by stage

Each cyber-recovery workflow stage involves different accounts:

| Stage                       | Production Account                 | Recovery Account        | IRE                                          |
| Normal operation            | Workloads run                      | Receives backup pushes  | Pre-configured, idle                         |
| Stage 1 (timeline)          | Source of CloudTrail/VPC Flow Logs |                         |                                              |
| Stage 2 (validate)          |                                    | Authorises share to IRE | Restores candidates, runs validation pipeline |
| Stage 3 (approval)          |                                    | MPA approvers act       |                                              |
| Stage 4 (rebuild + restore) | Isolated for investigation         | Authorises restore      | Rebuilds IaC, restores approved data         |
| Stage 5 (cutover)           | Stays isolated                     |                         | Becomes new Production                       |

After cutover, the IRE becomes the new Production Account — and a fresh IRE has to be re-established for the next event. This is why the canonicalising source recommends pre-configuring the IRE in advance: incident time is not the right time to be building it.
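For runbook tooling, the stage table can be held as data so that each stage lists the accounts it touches. This is a sketch only: the stage keys and helper names are hypothetical, and the assignment of cells to accounts reflects one reading of the table (for example, that MPA approval belongs to the Recovery Account, as the Pattern section states).

```python
# Stage -> {account role: responsibility}. Accounts with no task in a stage are omitted.
STAGES = {
    "normal": {
        "production": "Workloads run",
        "recovery": "Receives backup pushes",
        "ire": "Pre-configured, idle",
    },
    "stage1_timeline": {
        "production": "Source of CloudTrail/VPC Flow Logs",
    },
    "stage2_validate": {
        "recovery": "Authorises share to IRE",
        "ire": "Restores candidates, runs validation pipeline",
    },
    "stage3_approval": {
        "recovery": "MPA approvers act",
    },
    "stage4_rebuild_restore": {
        "production": "Isolated for investigation",
        "recovery": "Authorises restore",
        "ire": "Rebuilds IaC, restores approved data",
    },
    "stage5_cutover": {
        "production": "Stays isolated",
        "ire": "Becomes new Production",
    },
}


def accounts_involved(stage: str) -> list:
    """Account roles that have a responsibility in the given stage."""
    return sorted(STAGES[stage])


def stages_for(account: str) -> list:
    """Stages, in workflow order, in which the given account role has a responsibility."""
    return [stage for stage, roles in STAGES.items() if account in roles]
```

Calling `stages_for("ire")` shows why the IRE has to exist before the incident: it is already needed at Stage 2.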

SCP-enforced isolation

The canonicalising source's checklist explicitly says "Use SCPs to enforce isolation." The SCPs cover:

  • Recovery Account: deny everything except backup operations (vault management, MPA configuration, sharing).
  • IRE: deny internet-routable resources (no IGW, no NAT GW), deny VPC peering to Production, deny IAM trust relationships pointing to Production.

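These are structural controls: an SCP caps the permissions available to every principal in the member accounts it applies to, so even if an admin's IAM permissions allow an action, the SCP attached at the OU level blocks it.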
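As an illustration of what such SCPs might look like, the sketch below builds two policy documents as Python dictionaries. It is not taken from the source: the statement IDs and function names are hypothetical, the IRE policy denies all VPC peering rather than only peering to Production, the IAM trust-relationship rule is left out, and the allowed-action list for the Recovery Account is an assumption that would need extending and testing against a real account.

```python
import json


def ire_isolation_scp() -> dict:
    """Deny creation of internet gateways, NAT gateways, and VPC peering in the IRE."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInternetRoutableResources",
                "Effect": "Deny",
                "Action": [
                    "ec2:CreateInternetGateway",
                    "ec2:AttachInternetGateway",
                    "ec2:CreateNatGateway",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyVpcPeering",
                "Effect": "Deny",
                "Action": [
                    "ec2:CreateVpcPeeringConnection",
                    "ec2:AcceptVpcPeeringConnection",
                ],
                "Resource": "*",
            },
        ],
    }


def recovery_account_scp(allowed_actions: list) -> dict:
    """Deny every action that is not in allowed_actions (a NotAction deny)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptBackupOperations",
                "Effect": "Deny",
                "NotAction": sorted(allowed_actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # "backup:*" alone is an assumed starting point, not a complete allow-list.
    print(json.dumps(recovery_account_scp(["backup:*"]), indent=2))
    print(json.dumps(ire_isolation_scp(), indent=2))
```

A NotAction deny fails closed: any service not explicitly listed is blocked, which matches the "deny everything except backup operations" intent but also means a missing entry breaks a legitimate operation, so the list should be exercised in a drill before an incident.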

When to use this pattern

Use this pattern when:

  • The workload is critical enough that ransomware / destructive event recovery is a real requirement, not a hypothetical.
  • Detection latency for security events is measured in days/weeks (so backups taken during the attack window can't be assumed clean).
  • The team can absorb the operational overhead of three pre-configured accounts with their own SCPs and monitoring.

Weaker fit when:

  • Routine DR (region failure, AZ failure) is the primary concern — the existing two-axis (cross-Region + cross-account) design from the 2026-03-31 streamlining-DR post is sufficient.
  • The workload's data has very short relevance (hours), so retention spanning detection windows isn't economical.
  • The team doesn't have the operational maturity to keep the IRE pre-configured (in which case routine DR drills are a higher-leverage starting point).
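The relationship between detection latency and retention in the criteria above comes down to simple date arithmetic: a backup is only a candidate if it is still retained at detection time and predates the estimated start of the compromise. The sketch below is illustrative; the function name and parameters are hypothetical, and the compromise estimate is treated as a single point in time.

```python
from datetime import datetime, timedelta


def clean_candidates(backup_times, detected_at, detection_latency, retention):
    """Backups still retained at detection time that predate the estimated compromise."""
    compromise_estimate = detected_at - detection_latency
    oldest_retained = detected_at - retention
    return sorted(
        t for t in backup_times
        if oldest_retained <= t < compromise_estimate
    )


detected = datetime(2026, 1, 31)
daily_backups = [detected - timedelta(days=d) for d in range(40)]

# 21 days of undetected activity, 30 days of retention: nine pre-compromise backups survive.
print(len(clean_candidates(daily_backups, detected, timedelta(days=21), timedelta(days=30))))  # 9

# Same latency with 14 days of retention: every retained backup falls inside the attack window.
print(len(clean_candidates(daily_backups, detected, timedelta(days=21), timedelta(days=14))))  # 0
```

Candidates found this way are only presumed clean; they still go through the validation stage before approval.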

Composition with other patterns

Failure modes

  1. IRE not pre-configured. Building the IRE at incident time takes hours-to-days; recovery MTTR is dominated by setup. Mitigation: provision IRE in advance with monitoring and alarms on drift.
  2. Cross-account references in Production not inventoried. Stage 5 cutover discovers IAM trust policies / KMS key grants / resource policies that point at the old Production Account ID; updates become an emergency. Mitigation: maintain an Access-Analyzer-driven inventory in the recovery runbook.
  3. Recovery Account's MPA approvers unreachable. Stage 3 stalls; the parallel work in Stages 1/2/4 produces nothing usable. Mitigation: predefine multiple MPA approvers across geos and timezones; document escalation paths.
  4. IaC source (separate from this topology) compromised. Stage 4 rebuild leg has nothing trusted to draw from. Mitigation: independent backup of the IaC repository; signed-artifact immutable mirrors.
  5. SCP-enforced isolation drifts. SCP changes in the Organization accidentally weaken IRE's posture. Mitigation: AWS Config rules to detect SCP drift; incident-response playbooks for SCP modifications.
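For failure mode 2, one low-tech complement to an Access Analyzer inventory is to scan exported policy documents for the old Production Account ID. The sketch below is a hypothetical helper, not an AWS API: it walks any JSON-like policy structure and reports the paths of string values that contain the given 12-digit account ID.

```python
import re

# A 12-digit run that is not part of a longer number.
ACCOUNT_ID = re.compile(r"(?<!\d)\d{12}(?!\d)")


def find_account_references(policy, account_id: str) -> list:
    """Return JSON-style paths of string values that mention account_id."""
    hits = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str) and account_id in ACCOUNT_ID.findall(node):
            hits.append(path)

    walk(policy, "$")
    return hits


# Example trust policy with a placeholder account ID.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:role/app"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(find_account_references(trust_policy, "111111111111"))
# ['$.Statement[0].Principal.AWS']
```

Running a scan like this over IAM trust policies, KMS key policies, and resource policies ahead of time turns the Stage 5 surprise into a list that can be reviewed in the runbook.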

Generalisation beyond AWS

The three-environment pattern is substrate-agnostic:

  • GCP — Production project + Recovery project (with Backup and DR Service vault) + IRE project (with VPC SC perimeter).
  • Azure — Production subscription + Recovery subscription (with Backup vault, immutable policies) + IRE subscription (with private endpoints, no peering).
  • On-prem — Production network + Recovery storage zone (offline or one-way replication) + isolated rebuild zone (air-gapped or one-way connectivity).

The structural property is three independent isolation boundaries with disjoint trust relationships, governed centrally for SCP-style policy inheritance.

Seen in
