CONCEPT Cited by 1 source

Compromise-boundary recovery point selection¶

Definition¶

Compromise-boundary recovery point selection is the recovery-time algorithm for choosing which backup to restore when a cyber event has been confirmed. It differs structurally from generic disaster-recovery selection (which picks the most recent backup) because the most recent backup may already carry the adversary's payload.

The algorithm walks reverse-chronologically from the most recent candidate that predates the event boundary — the timestamp of the earliest plausible indicator from the investigation timeline — and runs each candidate through the validation pipeline; if validation fails, it steps further back.

Verbatim from the canonicalising source:

"For most operational recoveries, the most recent backup is the right one. For cyber events and for data corruption more generally, the most recent working copy is often a better target. If an adversary was present in the environment before detection, backups taken during that window might carry the same issues." (Source: sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events)

The four-step algorithm¶

The canonicalising source's explicit selection algorithm:

1. Build an investigation timeline from CloudTrail, VPC Flow Logs, GuardDuty, Security Hub, and workload logs to identify the earliest plausible indicator of the event. This timestamp is the event boundary.
2. Evaluate candidates in reverse chronological order, starting from the most recent backup that predates the event window.
3. Run the validation pipeline against each candidate. If validation fails, step back to the next candidate.
4. Approve the chosen recovery point with documentation of the approver and rationale.

This composes with MPA-gated restore authorization: the approval is the gating step of the recovery workflow's Stage 3.
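The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the source's implementation: `RecoveryPoint`, `select_recovery_point`, and the `validate` callback are hypothetical names, the event boundary is assumed to have been computed already, and all timestamps are assumed to share one timezone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical record for one backup; not an AWS Backup type."""
    point_id: str
    created_at: datetime


def select_recovery_point(
    candidates: Iterable[RecoveryPoint],
    event_boundary: datetime,
    validate: Callable[[RecoveryPoint], bool],
) -> tuple[Optional[RecoveryPoint], list[RecoveryPoint]]:
    """Return (chosen point, rejected points), walking newest to oldest."""
    # Only backups taken strictly before the event boundary are eligible.
    eligible = [c for c in candidates if c.created_at < event_boundary]
    eligible.sort(key=lambda c: c.created_at, reverse=True)

    rejected: list[RecoveryPoint] = []
    for candidate in eligible:
        if validate(candidate):
            return candidate, rejected
        rejected.append(candidate)  # validation failed: step further back
    return None, rejected  # no clean pre-boundary point was found


# Illustrative data: four backups, boundary on 10 May, rp-09 fails validation.
points = [RecoveryPoint(f"rp-{d:02d}", datetime(2026, 5, d)) for d in (1, 5, 9, 13)]
chosen, rejected = select_recovery_point(
    points, datetime(2026, 5, 10), lambda p: p.point_id != "rp-09"
)
print(chosen.point_id, [r.point_id for r in rejected])  # rp-05 ['rp-09']
```

Returning the rejected candidates alongside the chosen one keeps the evidence needed for the step 4 rationale. A `None` result means no retained backup both predates the boundary and passes validation.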

The event boundary: defining "before"¶

The event boundary is the earliest plausible indicator, not the confirmed attack timestamp, because the conservative posture during recovery selection is "assume the adversary was present earlier than we currently believe".

This is why the investigation timeline draws from multiple sources:

CloudTrail — control-plane API activity; identity changes, IAM modifications, resource creation/deletion.
VPC Flow Logs — data-plane network activity; unexpected egress destinations, lateral movement patterns.
GuardDuty findings — threat-detection signals; the detected indicators that triggered the investigation.
Security Hub — aggregated security-finding view across services.
Workload logs — application-altitude indicators (anomalous user behaviour, unusual database queries).

The timeline's job is to push the event boundary as far back as the evidence supports: the recovery target has to be earlier than the boundary, and the boundary is set by the earliest indicator the investigation can plausibly tie to the event.
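As a rough sketch of how the boundary falls out of a merged timeline, the snippet below takes the minimum timestamp over indicators judged plausible. The `Indicator` type, its field names, and the `plausible` flag are assumptions made for illustration. Real findings from each log source would first need normalising into such a shape, with timestamps converted to a single timezone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Indicator:
    """Hypothetical normalised timeline entry; field names are assumptions."""
    source: str            # e.g. "cloudtrail", "vpc-flow-logs", "guardduty"
    observed_at: datetime  # assumed to be normalised to one timezone
    plausible: bool        # analyst judgement: could this relate to the event?


def event_boundary(indicators: Iterable[Indicator]) -> datetime:
    """Return the timestamp of the earliest plausible indicator."""
    plausible_times = [i.observed_at for i in indicators if i.plausible]
    if not plausible_times:
        raise ValueError("no plausible indicators; the boundary is undefined")
    return min(plausible_times)


timeline = [
    Indicator("guardduty", datetime(2026, 5, 12, 9, 0), True),
    Indicator("cloudtrail", datetime(2026, 5, 10, 3, 30), True),
    Indicator("workload-logs", datetime(2026, 5, 2, 14, 0), False),
]
print(event_boundary(timeline))  # 2026-05-10 03:30:00
```

In this example the GuardDuty finding triggered the investigation, but an earlier CloudTrail entry moves the boundary back by two days. The workload-log entry is ignored because it was judged unrelated.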

Reverse-chronological with validation as the per-candidate filter¶

The selection isn't just "pick the latest pre-boundary backup" — it's "pick the latest pre-boundary backup that also passes validation". This is critical because:

The event boundary is built from detected indicators only, so it marks the start of known adversary presence, not necessarily of actual presence.
The adversary may have been present earlier than the detected indicators (the "unknown unknown" case).
Validation catches modifications the timeline doesn't surface — e.g. data corruption that wasn't logged because no API call was made, just an in-place file edit on a compromised host.

The validation pipeline is the second guard; the timeline is the first. Both are needed.

Recovery point objective implications¶

This algorithm has consequences for RPO design. Generic DR sizes RPO based on routine recovery cadence — how much data loss is acceptable when the most recent backup is restored. Cyber-resilience sizing has to consider detection latency:

"Backup retention should include recovery points that predate realistic detection windows in your organization. Detection timing varies widely by organization and by threat type, so this is a number to set based on your own investigation capabilities and to revisit as those mature."

If your detection latency is 30 days, your retention has to extend beyond 30 days plus a margin for the "unknown unknown" case where adversary presence predates detected indicators. This is why mature cyber-resilience designs often have retention windows significantly longer than routine operational recovery alone would require.
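The sizing rule is simple arithmetic, sketched below. The function names are hypothetical and the 15-day margin is an illustrative assumption, not a figure from the source. The source leaves the actual numbers to each organization's own investigation capabilities.

```python
from datetime import timedelta


def minimum_retention(detection_latency: timedelta, margin: timedelta) -> timedelta:
    """Retention must reach back past detection latency plus a safety margin."""
    if detection_latency < timedelta(0) or margin < timedelta(0):
        raise ValueError("durations must be non-negative")
    return detection_latency + margin


def retention_is_sufficient(
    retention: timedelta, detection_latency: timedelta, margin: timedelta
) -> bool:
    """True when the configured retention covers latency plus margin."""
    return retention >= minimum_retention(detection_latency, margin)


latency = timedelta(days=30)  # from the example in the text
margin = timedelta(days=15)   # assumed value for the "unknown unknown" case
print(minimum_retention(latency, margin).days)                     # 45
print(retention_is_sufficient(timedelta(days=35), latency, margin))  # False
```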

The data-loss tradeoff¶

The selection algorithm makes an explicit tradeoff: lose more recent data to gain confidence that the restored data is clean. The further back the selection has to step (because validation keeps failing), the more recent data is lost. This is acceptable in cyber-event recovery because "restoring untrusted data into a new environment defeats the purpose of the validation", and it is a further reason to size retention generously.
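To make the cost of each step back concrete, the following sketch computes the span of writes a restore would not contain. `data_loss_window` and `writes_stopped_at` are hypothetical names, and the dates are invented for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(recovery_point_at: datetime, writes_stopped_at: datetime) -> timedelta:
    """Span of writes that a restore from this recovery point will not contain."""
    if recovery_point_at > writes_stopped_at:
        raise ValueError("recovery point is later than the last write")
    return writes_stopped_at - recovery_point_at


writes_stopped_at = datetime(2026, 5, 14)  # assumed time the workload was halted
for taken_at in (datetime(2026, 5, 9), datetime(2026, 5, 5), datetime(2026, 5, 1)):
    lost = data_loss_window(taken_at, writes_stopped_at)
    print(taken_at.date(), lost.days, "days lost")
# 2026-05-09 5 days lost
# 2026-05-05 9 days lost
# 2026-05-01 13 days lost
```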

Documentation as part of the algorithm¶

Step 4 ("Approve the chosen recovery point with documentation of the approver and rationale") is structurally part of the algorithm, not a post-incident formality:

Approver identity — recorded for audit and accountability.
Rationale — why this candidate was chosen (which earlier candidates failed validation, which evidence pushed the event boundary).
Recorded automatically in CloudTrail when MPA is configured — "the approval action is automatically recorded as an AWS CloudTrail management event."
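One way to capture these fields is a small immutable record that refuses to exist without an approver and a rationale. This is a sketch only: `RecoveryPointApproval` and its fields are hypothetical and do not mirror the CloudTrail management event schema, and the identifiers in the example are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RecoveryPointApproval:
    """Hypothetical approval record; not the CloudTrail event schema."""
    recovery_point_id: str
    approver: str
    rationale: str
    event_boundary: str  # ISO 8601 timestamp
    rejected_candidates: tuple[str, ...] = ()
    approved_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Step 4 requires both fields, so reject blank values up front.
        if not self.approver.strip() or not self.rationale.strip():
            raise ValueError("approver and rationale are both required")


record = RecoveryPointApproval(
    recovery_point_id="rp-05",
    approver="incident-commander@example.com",
    rationale="rp-09 failed validation; rp-05 is the latest clean point before the boundary",
    event_boundary="2026-05-10T03:30:00+00:00",
    rejected_candidates=("rp-09",),
)
print(json.dumps(asdict(record), indent=2))
```

Serialising the record to JSON makes it easy to attach to an incident ticket alongside whatever audit event the approval system itself emits.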

Generalisation beyond AWS¶

The algorithm is substrate-agnostic — applicable to any backup system where:

Multiple time-ordered recovery points exist.
An investigation timeline can establish an event boundary from the earliest plausible indicator.
Each candidate can be validated independently.

GCP / Azure / on-prem cyber-recovery designs apply the same algorithm with their respective audit-log + backup primitives.

Seen in¶

sources/2026-05-20-aws-cyber-resilience-on-aws-a-reference-approach-for-recovery-from-ransomware-and-destructive-events — canonical wiki reference; explicit four-step algorithm; most-recent-working-copy framing; investigation-timeline source list; retention-window-sizes-with-detection-latency framing.

concepts/cyber-resilience — the parent posture.
concepts/multi-layer-restore-validation-pipeline — the per-candidate filter.
concepts/point-in-time-recovery — the underlying capability.
concepts/rpo-rto — the SLO framework that has to be re-sized for cyber events.
concepts/disaster-recovery-tiers — the general DR ladder this selection algorithm specialises.
patterns/event-boundary-driven-recovery-point-selection — the pattern this concept canonicalises.
systems/aws-cloudtrail, systems/amazon-vpc-flow-logs, systems/amazon-guardduty, systems/aws-security-hub — investigation-timeline substrates.
systems/aws-security-incident-response — coordinated triage support for timeline construction.