
CONCEPT Cited by 1 source

Cyber resilience

Definition

Cyber resilience is the recovery leg of a three-leg security posture. Prevention keeps threat actors out, detection finds them quickly, and cyber resilience restores a trustworthy environment when the source environment itself can no longer be trusted.

Verbatim from the canonicalising source:

"Cyber resilience is the ability to recover workloads to a known-good state after an adversary has affected the environment. Prevention works to keep threat actors out and detection works to find them quickly. Cyber resilience focuses on recovery: restoring a trustworthy environment when backups, credentials, or parts of the infrastructure can no longer be assumed to be safe." (Source: sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events)

Why cyber resilience is structurally different from generic DR

Generic disaster recovery handles fault disasters — a region fails, a power grid drops, a fibre cut takes out connectivity. The recovery substrate (backups, secondary infra, control plane) is trusted — only the primary failed.

Cyber resilience handles adversary disasters where any of the following may have been compromised:

  • Production credentials — cannot be assumed clean; rotating every secret is a recovery requirement, not a routine task.
  • Production data — the most recent backup may carry the same malware, encrypted files, or modified configurations as production.
  • Production infrastructure configuration — the running configuration may have been modified by the attacker; rebuilding from version-controlled IaC is safer than restoring config from backup.
  • The recovery path itself — if recovery uses the same credentials, network paths, or accounts as production, it inherits the compromise.

The architectural consequence: recovery cannot trust anything from the source environment by default. This drives every primitive in a cyber-resilience design — separate accounts, deletion-protected vaults, multi-party approval gates, validation pipelines, IaC-driven rebuild, comprehensive credential rotation.

Core architectural primitives

Cyber-resilience designs assemble five primitives (see the canonicalising source for the full reference architecture):

  1. Account isolation: a three-account topology (Production / Recovery / IRE, the isolated recovery environment) so the recovery surface has no trust path back to the potentially compromised production surface.
  2. Service-enforced deletion protection: a logically air-gapped vault in Compliance mode (or S3 Object Lock), so that even a root user or a compromised admin cannot shorten retention or delete recovery points within the retention window.
  3. Multi-party approval: an MPA gate that must pass before any restore proceeds, with the approval recorded in CloudTrail.
  4. Multi-layer validation: a validation pipeline that proves the backup is safe to use, not just recoverable. It runs inside the IRE so a tainted restore stays contained.
  5. Rebuild-Restore-Rotate framework: a three-category sorting of what comes from where. Infrastructure is rebuilt from code, data is restored from backup, and credentials are issued new.
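The Rebuild-Restore-Rotate sorting can be pictured as a small classification step over an asset inventory. The following is a minimal sketch, not part of the canonicalising source: the asset kinds, the `Origin` enum, the `plan_recovery` function, and the inventory entries are all hypothetical names chosen for illustration.

```python
from enum import Enum


class Origin(Enum):
    REBUILD = "rebuild from version-controlled IaC"
    RESTORE = "restore from a validated backup"
    ROTATE = "issue new; never carry over"


# Hypothetical asset kinds, chosen for illustration only.
ORIGIN_BY_KIND = {
    "infrastructure": Origin.REBUILD,
    "data": Origin.RESTORE,
    "credential": Origin.ROTATE,
}


def plan_recovery(assets):
    """Group (name, kind) pairs by where the recovered copy must come from."""
    plan = {origin: [] for origin in Origin}
    for name, kind in assets:
        if kind not in ORIGIN_BY_KIND:
            # Refuse to guess: an unclassified asset needs a human decision.
            raise ValueError(f"unclassified asset kind {kind!r} for {name!r}")
        plan[ORIGIN_BY_KIND[kind]].append(name)
    return plan


# Hypothetical inventory entries.
inventory = [
    ("vpc-and-subnets", "infrastructure"),
    ("orders-database", "data"),
    ("orders-db-password", "credential"),
    ("iam-roles", "infrastructure"),
]
plan = plan_recovery(inventory)
```

Raising on an unknown kind is a deliberate choice in this sketch: anything that has not been sorted into one of the three categories in advance should stop the plan rather than default to being restored from backup.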

Recovery-point selection: the most-recent-working copy

Generic DR's heuristic ("restore the most recent backup") fails for cyber events because the most recent backup may already carry the adversary's payload. Cyber-resilience selection, known as compromise-boundary recovery-point (RP) selection, works in reverse chronological order from the event boundary: build an investigation timeline, start from the most recent candidate that predates the earliest indicator, validate it, and step further back if validation fails.
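The backwards walk described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the source's implementation: `RecoveryPoint`, `select_recovery_point`, and the `validate` callback are hypothetical names, and `validate` stands in for whatever multi-layer validation pipeline runs inside the isolated environment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    rp_id: str
    created_at: datetime


def select_recovery_point(
    points: Iterable[RecoveryPoint],
    earliest_indicator: datetime,
    validate: Callable[[RecoveryPoint], bool],
) -> Optional[RecoveryPoint]:
    """Return the newest recovery point that predates the earliest
    indicator and passes validation, or None if no candidate qualifies."""
    # Only points created strictly before the earliest indicator are candidates.
    candidates = sorted(
        (p for p in points if p.created_at < earliest_indicator),
        key=lambda p: p.created_at,
        reverse=True,  # newest candidate first
    )
    for point in candidates:
        if validate(point):  # assumed to run in an isolated environment
            return point
        # Validation failed: step further back in time.
    return None
```

A `None` result means the retained recovery points do not reach far enough back, which is the situation the retention sizing discussed next is meant to avoid.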

This is why retention windows for cyber resilience need to be sized based on detection latency, not just the routine RPO target — "backup retention should include recovery points that predate realistic detection windows in your organization".
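One way to turn that guidance into a number is a simple lower bound. The sketch below is an assumption of this page, not a formula from the source: it adds the detection latency, the time spent investigating (during which older points keep expiring), and a margin of earlier points to fall back on if validation fails. The function name, parameters, and example values are all hypothetical.

```python
from datetime import timedelta


def minimum_retention(detection_latency, investigation_time,
                      fallback_points, backup_interval):
    """Lower bound on retention so pre-compromise recovery points still exist.

    All four inputs are estimates an organization must supply for itself.
    """
    if fallback_points < 0:
        raise ValueError("fallback_points must be non-negative")
    return (
        detection_latency                    # compromise may predate detection by this much
        + investigation_time                 # points keep ageing out while the timeline is built
        + fallback_points * backup_interval  # room to step back if validation fails
    )


# Hypothetical example values: 30 days to detect, 3 days to investigate,
# 7 daily recovery points kept as fallback margin.
required = minimum_retention(
    detection_latency=timedelta(days=30),
    investigation_time=timedelta(days=3),
    fallback_points=7,
    backup_interval=timedelta(days=1),
)
```

With these example inputs the bound comes to 40 days, well beyond what a routine RPO target alone would suggest.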

Operational prerequisite: exercise the workflow

Cyber events are rare, so the recovery muscle memory has to come from drills, not real incidents. The canonicalising source's seven-step starting checklist ends with "Exercise the full workflow, including investigation, validation, rebuild, restore, and cutover, on a regular schedule" — the highest-leverage item, because the rest of the architecture is only useful to the extent it's been exercised.

This composes with concepts/drill-muscle-memory and concepts/chaos-engineering at the cyber-event altitude.

Relation to general DR

Cyber resilience extends generic DR rather than replacing it:

| Axis | Generic DR | Cyber resilience extension |
|---|---|---|
| Recovery target | Most recent backup | Most recent backup before event boundary |
| Backup trust | Trusted | Validated by multi-layer pipeline |
| Account topology | Production + secondary | Production + Recovery + IRE |
| Restore authorization | Standard IAM | MPA-gated |
| Credential handling | Restored | Rotated/re-issued |
| Config handling | Restored | Rebuilt from IaC |
| Rebuild surface | Same as source | Isolated environment |

The 2026-03-31 streamlining-DR post canonicalised the general-DR layer; the 2026-05-20 cyber-resilience post canonicalises this extension layer.

Adversary-modelling assumption: the IaC source itself

The Rebuild-Restore-Rotate framework's load-bearing assumption is that the IaC source (templates, pipelines, source repositories) wasn't itself the attack target. If it was, "recovery starts further upstream with a trusted copy of source before rebuild can begin" — which is why knowing where your known-good source of configuration lives, and how it is protected, is a recovery design decision worth making in advance, not at incident time.
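One possible way to check a candidate IaC source before rebuilding is to compare each template against a digest manifest recorded earlier. This is a hedged sketch and not something the source prescribes: the function names, file names, and contents are hypothetical, and the approach only helps if the manifest itself is stored somewhere the production environment cannot modify.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex SHA-256 digest of a template's bytes."""
    return hashlib.sha256(content).hexdigest()


def untrusted_templates(templates, trusted_digests):
    """Return sorted names whose content has no matching trusted digest."""
    return sorted(
        name
        for name, content in templates.items()
        if trusted_digests.get(name) != sha256_hex(content)
    )


# Hypothetical file names and contents, for illustration only.
known_good = {"network.yaml": b"vpc: 10.0.0.0/16\n", "app.yaml": b"replicas: 3\n"}
manifest = {name: sha256_hex(body) for name, body in known_good.items()}

candidate = dict(known_good)
candidate["app.yaml"] = b"replicas: 3\nbackdoor: true\n"  # tampered after the manifest was taken
candidate["extra.yaml"] = b"unexpected: true\n"           # never recorded in the manifest

suspect = untrusted_templates(candidate, manifest)
```

Here `suspect` contains both the modified template and the one that was never recorded, so neither would be used for rebuild until a trusted copy is located further upstream.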

Seen in

  • sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events
