CONCEPT Cited by 1 source
Automated backup validation
Definition
Automated backup validation is the property that every backup produced by a database platform is proven to be restorable as part of the backup creation pipeline — not merely written to durable storage and assumed correct. The strongest form proves restorability by actually restoring the backup to a dedicated node and snapshotting that node.
This closes one of the most common silent failure modes in disaster recovery: backups that exist, look fine, and turn out to be unrestorable when you need them (because of corruption, missing files, lost encryption keys, version incompatibility, and so on). "Schrödinger's backup", the idea that the state of any backup is unknown until you try to restore it, is folklore in the ops community precisely because this failure mode is so common.
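The gap between "written to durable storage" and "restorable" can be illustrated with a minimal Python sketch. It is not taken from the source: the archive layout and the `REQUIRED_FILES` names are hypothetical, and a real restore test would start a database on the restored files rather than just list them. The point is only that an integrity check on the stored bytes can pass while a restore-level check fails.

```python
import hashlib
import io
import tarfile

# Hypothetical set of files a restore needs; real backups have their own layout.
REQUIRED_FILES = {"manifest.json", "data.bin"}


def make_archive(files: dict) -> bytes:
    """Build an in-memory tar archive from {name: bytes} (stand-in for a backup)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def checksum_ok(blob: bytes, expected_sha256: str) -> bool:
    """Storage-level check: the bytes we stored are the bytes we read back."""
    return hashlib.sha256(blob).hexdigest() == expected_sha256


def restore_ok(blob: bytes) -> bool:
    """Restore-level check: the archive opens and contains everything needed."""
    try:
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
            names = set(tar.getnames())
    except tarfile.TarError:
        return False
    return REQUIRED_FILES <= names
```

An archive that was written without `manifest.json` still passes `checksum_ok` against its own digest, because the digest only describes what was written; only `restore_ok` notices the missing file.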
The PlanetScale instance: restore + replay
Brian Morrison II's canonical PlanetScale framing (2024-01-24):
"While both PlanetScale and Aurora support automated backups, we also validate the backups of our databases automatically every single time a new backup is created. This is only possible because we use the traditional approach for MySQL replication. Instead of creating a fresh snapshot of your database every time a backup is performed, we restore the most recent backup of your database to a special MySQL node in the cluster that's dedicated to this process. Once the backup is restored, we use the built-in MySQL replication to copy the latest changes into this node before creating a new backup. If a backup is unhealthy, this process will fail and a fresh backup will be triggered to take its place. By following this process, you can always be confident that backups on our platform are validated and healthy to restore from." (Source: sources/2026-04-21-planetscale-planetscale-vs-amazon-aurora-replication)
The pipeline is:
- Provision a dedicated backup-taking node in the cluster.
- Restore the most recent backup into it (this is the validation step — a corrupt backup fails here).
- Catch up the restored node via MySQL binlog replication to the current primary state.
- Snapshot the caught-up node as the new backup.
- Fail closed on validation error: if the pipeline fails at any step, the existing backup is flagged as unhealthy and a fresh backup is triggered to take its place.
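Under this design, each backup is restored when the next backup is taken, so every backup except the newest has been restored successfully at least once, and the newest was itself built from a predecessor that restored cleanly. No backup stays in a "we've never tried this" state for longer than one backup interval.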
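The steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not PlanetScale's or Vitess's implementation: `Backup`, `Primary`, `restore`, and `take_validated_backup` are hypothetical names, a Python list stands in for the binlog, a `corrupt` flag stands in for whatever makes a real restore fail, and the "fresh backup" path simply copies the primary's current state.

```python
from dataclasses import dataclass, field


class RestoreError(Exception):
    """Raised when a backup cannot be restored onto the backup node."""


@dataclass
class Backup:
    position: int           # number of binlog entries already contained in the backup
    rows: dict              # stand-in for the restored data directory
    corrupt: bool = False   # simulates silent corruption at rest
    healthy: bool = True    # set to False once a restore attempt fails


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


def restore(backup: Backup):
    """Restore a backup onto the dedicated node; this is the validation step."""
    if backup.corrupt:
        raise RestoreError("backup could not be restored")
    return dict(backup.rows), backup.position


def take_validated_backup(primary: Primary, previous: Backup) -> Backup:
    """Run one backup cycle: restore the previous backup, catch up, snapshot."""
    try:
        rows, position = restore(previous)
    except RestoreError:
        # Fail closed: flag the bad backup and take a fresh one from the primary.
        previous.healthy = False
        return Backup(position=len(primary.binlog), rows=dict(primary.data))
    # Catch up by replaying every change made since the previous backup.
    for key, value in primary.binlog[position:]:
        rows[key] = value
    # Snapshot the caught-up node as the new backup.
    return Backup(position=len(primary.binlog), rows=rows)
```

Calling `take_validated_backup` once per backup interval means each backup is exercised by the following cycle's restore; a backup whose `corrupt` flag is set is marked unhealthy and replaced rather than extended.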
See patterns/validated-backup-via-restore-replay for the full pattern page and patterns/dedicated-backup-instance-with-catchup-replication for the related architectural shape.
Why "only possible because we use traditional replication"
Morrison's framing is that the approach is only feasible on traditional MySQL binlog-replicated clusters, because it requires:
- A full-copy replica substrate to stand up a restoration target whose only job is backup-taking.
- Binlog replication to catch the restored node up to the current primary state.
- The backup format to be a byte-for-byte restorable snapshot of MySQL's data directory (not an incremental log of segment acks against a distributed storage appliance).
Aurora's substrate (redo-log forwarding to distributed storage segments, with read-only compute nodes that don't hold their own data copies) doesn't naturally admit this validation shape; Aurora's backups are storage-layer snapshots that the platform trusts without the restore-replay proof.
Related concepts
- concepts/point-in-time-recovery — the backup-driven recovery capability; integrity of PITR depends on backup validity.
- concepts/replica-creation-from-backup — the related pattern of starting a new replica from a recent backup; same operation mechanics, different purpose.
- concepts/primary-vs-replica-as-replication-source — the backup node can be fed by primary or replica.
- concepts/logical-vs-physical-backup — validation mechanics differ (logical backups can be dry-run-applied; physical need a full restore).
Seen in
- sources/2026-04-21-planetscale-planetscale-vs-amazon-aurora-replication — Brian Morrison II (PlanetScale, 2024-01-24). Canonical wiki disclosure of PlanetScale's restore-replay backup validation pipeline and the architectural claim that this approach is only feasible on traditional binlog-replicated clusters.
Related
- patterns/validated-backup-via-restore-replay — full pattern page.
- patterns/dedicated-backup-instance-with-catchup-replication — adjacent architectural pattern using a dedicated backup-taking node.
- systems/vtbackup — the Vitess tool that implements the backup-taking role.
- systems/planetscale — product canonicalising this pipeline.
- concepts/aurora-storage-quorum — the counter-example substrate where this validation approach doesn't naturally apply.