Skip to content

PATTERN Cited by 1 source

Checkpoint before risky step

Pattern

Before an AI agent executes any potentially destructive command in its sandbox, take a copy-on-write checkpoint of the sandbox state. If the command destroys files, removes tooling, or corrupts the environment, restore to the checkpoint instead of rebuilding from scratch.

Mechanism

  1. Agent identifies a command as potentially risky (file deletion, package removal, config mutation).
  2. Sandbox runtime takes a CoW checkpoint — fast because only page-table metadata is duplicated.
  3. Command executes in the sandbox.
  4. On failure or undesired outcome: restore checkpoint (~seconds), retry with a different approach.
  5. On success: optionally discard the checkpoint or advance to a new one.

Trade-offs

Pro Con
Worst case is "restore and retry" not "rebuild from backup" Requires a sandbox runtime that supports CoW snapshots
Cheap enough to be a reflex (per every risky step) Checkpoint restore time (~9s observed) adds latency on failure path
Enables unattended agent operation State outside the sandbox (network calls, DB writes) is not captured

Seen in

Last updated · 542 distilled / 1,571 read