PATTERN Cited by 1 source
Three-account cyber-recovery topology
Pattern
Use three distinct AWS accounts inside one AWS Organization to host a cyber-resilient recovery design:
- Production Account(s): where workloads run during normal operation; isolated for investigation once a cyber event is confirmed.
- Recovery Account: owns the logically air-gapped vault and configures vault sharing, restore authorization, and multi-party approval (MPA) approvers. SCPs restrict the account to backup operations so a compromised production identity can't modify these controls.
- Isolated Recovery Environment (IRE): where backups are restored and validated and the new production environment is rebuilt before cutover. It has no trust relationship to Production, no VPC peering, and no internet-facing resources; AWS service APIs are reached over PrivateLink only.
Verbatim from the canonicalising source:
"The recovery environment, including its identities, keys, and network paths, shouldn't share a trust boundary with the environment being recovered. If production identity is compromised, recovery must be able to proceed without depending on it. Most customers achieve this using separate AWS accounts inside an AWS Organization." (Source: sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events)
Why three accounts, not two
The wiki's existing clean-room recovery account design, canonicalised on 2026-03-31, uses two accounts (Production + clean-room recovery). The 2026-05-20 cyber-resilience post extends it to three by separating storage governance (Recovery Account) from execution and rebuild (IRE).
The reason is further blast-radius reduction:
| Account | Role | What's protected |
|---|---|---|
| Recovery | Stores backups, governs access | Even if IRE is compromised by a tainted restore, vault is unreachable |
| IRE | Restores + validates | Tainted restores cannot reach Production or vault |
A two-account design conflates these: if the recovery account is also the rebuild account, a tainted restore could in principle delete or modify the vault. The three-account split prevents that.
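The blast-radius argument can be checked mechanically by treating "an identity in A can modify resources in B" as a directed edge and asking what a compromised account can reach. The sketch below is illustrative only: the edge lists and the node names (`recovery+rebuild`, `vault`, `ire`) are assumptions made for the example, not an inventory taken from the source.

```python
from collections import deque


def reachable(modify_edges, start):
    """Return every node reachable from `start` by following modify edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in modify_edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical edges: "A -> B" means an identity in A can modify resources in B.
two_account = {
    "production": [],
    "recovery+rebuild": ["vault"],  # rebuild workloads share an account with the vault
}
three_account = {
    "production": [],
    "recovery": ["vault"],  # only the Recovery Account governs the vault
    "ire": [],              # restores run here with no modify path to vault or Production
}

print(reachable(two_account, "recovery+rebuild"))  # {'vault'}
print(reachable(three_account, "ire"))             # set()
```

In the two-account model a tainted restore inherits the account's path to the vault; in the three-account model the same compromise reaches nothing.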
Account responsibilities by stage
Each cyber-recovery workflow stage involves different accounts:
| Stage | Production Account | Recovery Account | IRE |
|---|---|---|---|
| Normal operation | Workloads run | Receives backup pushes | Pre-configured, idle |
| Stage 1 (timeline) | Source of CloudTrail/VPC Flow Logs | — | — |
| Stage 2 (validate) | — | Authorises share to IRE | Restores candidates, runs validation pipeline |
| Stage 3 (approval) | — | MPA approvers act | — |
| Stage 4 (rebuild + restore) | Isolated for investigation | Authorises restore | Rebuilds IaC, restores approved data |
| Stage 5 (cutover) | Stays isolated | — | Becomes new Production |
After cutover, the IRE becomes the new Production Account — and a fresh IRE has to be re-established for the next event. This is why the canonicalising source recommends pre-configuring the IRE in advance: incident time is not the right time to be building it.
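For runbook tooling, the stage table can be carried as plain data so that a script can answer "which accounts act in this stage?" and track the role rotation at cutover. This is a minimal sketch: the dictionary keys, the function names, and the twelve-digit account IDs are hypothetical placeholders, and `None` stands for a cell the table leaves empty.

```python
# Duties per stage, transcribed from the table above. None marks an empty cell.
STAGE_DUTIES = {
    "normal operation": {"production": "workloads run", "recovery": "receives backup pushes", "ire": "pre-configured, idle"},
    "stage 1 (timeline)": {"production": "source of CloudTrail/VPC Flow Logs", "recovery": None, "ire": None},
    "stage 2 (validate)": {"production": None, "recovery": "authorises share to IRE", "ire": "restores candidates, runs validation pipeline"},
    "stage 3 (approval)": {"production": None, "recovery": "MPA approvers act", "ire": None},
    "stage 4 (rebuild + restore)": {"production": "isolated for investigation", "recovery": "authorises restore", "ire": "rebuilds IaC, restores approved data"},
    "stage 5 (cutover)": {"production": "stays isolated", "recovery": None, "ire": "becomes new Production"},
}


def accounts_involved(stage):
    """List the account roles that have a duty in the given stage."""
    return [role for role, duty in STAGE_DUTIES[stage].items() if duty is not None]


def promote_ire(roles):
    """After cutover the IRE account takes the Production role.

    The IRE slot is left as None until a fresh IRE is provisioned.
    """
    return {"production": roles["ire"], "recovery": roles["recovery"], "ire": None}


# Hypothetical account IDs, for illustration only.
roles = {"production": "111111111111", "recovery": "222222222222", "ire": "333333333333"}
print(accounts_involved("stage 3 (approval)"))
print(promote_ire(roles))
```

The `None` left in the `ire` slot after `promote_ire` is the gap the pre-configuration recommendation is meant to close before the next event.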
SCP-enforced isolation
The canonicalising source's checklist explicitly says "Use SCPs to enforce isolation." The SCPs cover:
- Recovery Account: deny everything except backup operations (vault management, MPA configuration, sharing).
- IRE: deny internet-routable resources (no IGW, no NAT GW), deny VPC peering to Production, deny IAM trust relationships pointing to Production.
These are structural controls — even if an admin's IAM permissions allow an action, the SCP at the OU level blocks it.
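As a rough illustration of what such policies might look like, the sketch below builds two SCP documents as Python dictionaries. The source does not publish policy text, so everything here is an assumption: the statement IDs, the choice of EC2 actions, and in particular the allow-list of service prefixes for the Recovery Account, which would need to be checked against the services a real vault, sharing, and approval setup actually calls. The IRE policy denies all VPC peering rather than only peering to Production, which is stricter than the text requires, and it does not attempt to express the IAM trust-relationship restriction.

```python
import json


def ire_network_scp():
    """Deny creation of internet gateways, NAT gateways, and VPC peering in the IRE."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInternetAndPeering",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": [
                    "ec2:CreateInternetGateway",
                    "ec2:AttachInternetGateway",
                    "ec2:CreateNatGateway",
                    "ec2:CreateVpcPeeringConnection",
                    "ec2:AcceptVpcPeeringConnection",
                ],
                "Resource": "*",
            }
        ],
    }


def recovery_account_scp(allowed_prefixes=("backup", "backup-storage", "kms", "ram", "mpa")):
    """Deny every action outside an assumed allow-list of backup-related services."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptBackupOperations",  # hypothetical statement ID
                "Effect": "Deny",
                "NotAction": [f"{prefix}:*" for prefix in allowed_prefixes],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(ire_network_scp(), indent=2))
print(json.dumps(recovery_account_scp(), indent=2))
```

Both documents use explicit `Deny` statements, which is what makes them hold even when an identity's IAM policy allows the action.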
When to use this pattern
Use this pattern when:
- The workload is critical enough that ransomware / destructive event recovery is a real requirement, not a hypothetical.
- Detection latency for security events is measured in days/weeks (so backups taken during the attack window can't be assumed clean).
- The team can absorb the operational overhead of three pre-configured accounts with their own SCPs and monitoring.
Weaker fit when:
- Routine DR (region failure, AZ failure) is the primary concern — the existing two-axis (cross-Region + cross-account) design from the 2026-03-31 streamlining-DR post is sufficient.
- The workload's data has very short relevance (hours), so retention spanning detection windows isn't economical.
- The team doesn't have the operational maturity to keep IRE pre-configured (in which case routine DR drills are a higher-leverage starting point).
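The detection-latency and retention criteria above reduce to one comparison: backup retention has to reach back past the earliest plausible compromise, or no recovery point can be assumed clean. A minimal sketch follows; the function name, the seven-day safety margin, and the example figures are assumptions chosen for illustration, not values from the source.

```python
from datetime import timedelta


def retention_covers_detection(retention, detection_latency, margin=timedelta(days=7)):
    """True if some recovery point predates the attack window plus a safety margin."""
    return retention >= detection_latency + margin


# Hypothetical figures: 35-day retention against a 21-day detection latency.
print(retention_covers_detection(timedelta(days=35), timedelta(days=21)))  # True
# 14-day retention cannot reach back past a 21-day attack window.
print(retention_covers_detection(timedelta(days=14), timedelta(days=21)))  # False
```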
Composition with other patterns
- patterns/mpa-gated-restore-authorization — Stage 3 gate; approvers configured in the Recovery Account.
- patterns/parallel-investigation-validation-rebuild — the workflow shape that runs across the three accounts.
- patterns/event-boundary-driven-recovery-point-selection — the algorithm Stage 1+2 jointly execute.
- patterns/iac-rebuild-from-separate-version-control — the rebuild leg's source-of-truth requirement.
- concepts/cross-account-backup + concepts/cross-region-backup — orthogonal isolation axes the three-account topology composes with (each Production / Recovery / IRE can additionally span Regions).
Failure modes
- IRE not pre-configured. Building the IRE at incident time takes hours-to-days; recovery MTTR is dominated by setup. Mitigation: provision IRE in advance with monitoring and alarms on drift.
- Cross-account references in Production not inventoried. Stage 5 cutover discovers IAM trust policies / KMS key grants / resource policies that point at the old Production Account ID; updates become an emergency. Mitigation: maintain an Access-Analyzer-driven inventory in the recovery runbook.
- Recovery Account's MPA approvers unreachable. Stage 3 stalls; the parallel work in Stages 1/2/4 produces nothing usable. Mitigation: predefine multiple MPA approvers across geos and timezones; document escalation paths.
- IaC source (separate from this topology) compromised. Stage 4 rebuild leg has nothing trusted to draw from. Mitigation: independent backup of the IaC repository; signed-artifact immutable mirrors.
- SCP-enforced isolation drifts. SCP changes in the Organization accidentally weaken IRE's posture. Mitigation: AWS Config rules to detect SCP drift; incident-response playbooks for SCP modifications.
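For the cross-account reference inventory, one low-tech complement to an Access Analyzer-driven list is to scan exported policy documents for the old Production Account ID before cutover. The sketch below assumes the policies are already available as parsed JSON; the function name, the path notation, the account IDs, and the sample trust policy are all hypothetical.

```python
def find_account_references(policy, account_id):
    """Return (path, value) pairs for every string in `policy` that mentions `account_id`."""
    hits = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str) and account_id in node:
            hits.append((path, node))

    walk(policy, "$")
    return hits


OLD_PRODUCTION = "111111111111"  # hypothetical account ID

# Hypothetical IAM trust policy exported from a dependent account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111111111111:role/app",
                    "arn:aws:iam::222222222222:role/ci",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

for path, value in find_account_references(trust_policy, OLD_PRODUCTION):
    print(path, value)
```

The same walk applies to KMS key policies and resource policies, since it only assumes nested dictionaries, lists, and strings.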
Generalisation beyond AWS
The three-environment pattern is substrate-agnostic:
- GCP — Production project + Recovery project (with Backup and DR Service vault) + IRE project (with VPC SC perimeter).
- Azure — Production subscription + Recovery subscription (with Backup vault, immutable policies) + IRE subscription (with private endpoints, no peering).
- On-prem — Production network + Recovery storage zone (offline or one-way replication) + isolated rebuild zone (air-gapped or one-way connectivity).
The structural property is three independent isolation boundaries with disjoint trust relationships, governed centrally so that SCP-style policies are inherited rather than configured per environment.
Seen in
- sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events — canonical wiki reference; defines the three-account topology with explicit role-per-account; "Most customers achieve this using separate AWS accounts inside an AWS Organization"; SCP-enforced isolation; IRE-pre-configuration recommendation.
Related
- concepts/cyber-resilience — the parent posture.
- concepts/isolated-recovery-environment — the IRE concept.
- concepts/clean-room-recovery-account — the Recovery Account concept (parent).
- concepts/cross-account-backup — the storage-layer isolation axis.
- concepts/blast-radius — the underlying principle.
- concepts/service-control-policy — the enforcement mechanism.
- patterns/mpa-gated-restore-authorization — the Stage 3 gate.
- patterns/parallel-investigation-validation-rebuild — the workflow that runs across the topology.
- systems/aws-organizations — the multi-account container.
- systems/aws-backup-logically-air-gapped-vault — the Recovery Account's storage primitive.