PATTERN Cited by 1 source

PR preview of CloudFormation ChangeSet

Problem

Pull requests that edit CloudFormation templates show a text diff of the template, but the operationally important question is what will actually change when this is applied. The text diff and the real change can diverge:

  • A small property edit can trigger resource replacement (delete + recreate), which is catastrophic for stateful resources.
  • The same template applied across many AWS accounts (an AWS Organization) can produce different per-account change sets, depending on each account's current state.
  • Parameter-only changes show no template diff but produce a real change.
  • Cross-stack references can cascade in non-obvious ways.

A reviewer looking at the template diff alone can't reliably answer "is this change safe?"
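The ChangeSet itself carries the information the text diff hides. A minimal sketch, assuming the entry shape returned by DescribeChangeSet (a `Changes` list whose items hold a `ResourceChange` with `Action` and `Replacement` fields); the function name `risky_changes` and the reason labels are illustrative, not from the source:

```python
def risky_changes(changes):
    """Return (logical_id, resource_type, reason) for destructive-looking entries.

    `changes` is assumed to be the "Changes" list from a DescribeChangeSet response.
    """
    flagged = []
    for change in changes:
        rc = change.get("ResourceChange")
        if not rc:
            continue
        action = rc.get("Action")
        replacement = rc.get("Replacement")
        if action == "Remove":
            reason = "delete"
        elif action == "Modify" and replacement == "True":
            reason = "replace"
        elif action == "Modify" and replacement == "Conditional":
            reason = "may replace"
        else:
            continue  # in-place modifications and additions are not flagged here
        flagged.append((rc.get("LogicalResourceId"), rc.get("ResourceType"), reason))
    return flagged
```

A one-line property edit on a database shows up here as `"replace"`, which is the signal a reviewer cannot get from the template diff.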

Solution

On every PR that touches a CloudFormation template, automatically:

  1. Call CreateChangeSet against every AWS account in the organisation (or the relevant subset).
  2. Wait for each ChangeSet to finish creating, then read back the per-account ChangeSet JSON (DescribeChangeSet).
  3. Merge into a human-readable summary.
  4. Post the summary as a PR comment.
  5. Drop the ChangeSet (DeleteChangeSet) so it doesn't accumulate.

On each subsequent push to the PR, re-run the preview. When the PR merges, the approved change is executed.
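The create, read, delete cycle for one stack in one account can be sketched as below. This is an illustration under assumptions, not Zalando's implementation: `cfn` is expected to behave like a boto3 CloudFormation client, the `pr-preview-` naming scheme, the `CAPABILITY_NAMED_IAM` capability and the polling limits are made up, and the match on the "no changes" status reason is a heuristic on the message text.

```python
import time
import uuid


def preview_stack(cfn, stack_name, template_body, poll_seconds=2.0, max_polls=60):
    """Create a ChangeSet, return its Changes list, and always delete it afterwards.

    `cfn` is any object exposing create_change_set / describe_change_set /
    delete_change_set with boto3's keyword arguments.
    """
    name = "pr-preview-" + uuid.uuid4().hex[:12]  # hypothetical naming scheme
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=name,
        TemplateBody=template_body,
        ChangeSetType="UPDATE",
        Capabilities=["CAPABILITY_NAMED_IAM"],  # assumption: templates may create IAM
    )
    try:
        # CreateChangeSet is asynchronous, so poll until a terminal status.
        for _ in range(max_polls):
            desc = cfn.describe_change_set(StackName=stack_name, ChangeSetName=name)
            if desc["Status"] in ("CREATE_COMPLETE", "FAILED"):
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"ChangeSet {name} did not finish in time")

        if desc["Status"] == "FAILED":
            reason = desc.get("StatusReason", "")
            # An empty diff is reported as FAILED; treat it as "no changes".
            if "didn't contain changes" in reason or "No updates" in reason:
                return []
            raise RuntimeError(f"ChangeSet failed: {reason}")

        changes = list(desc.get("Changes", []))
        token = desc.get("NextToken")
        while token:  # large ChangeSets are paginated
            page = cfn.describe_change_set(
                StackName=stack_name, ChangeSetName=name, NextToken=token
            )
            changes.extend(page.get("Changes", []))
            token = page.get("NextToken")
        return changes
    finally:
        # Step 5: drop the ChangeSet so previews do not accumulate on the stack.
        cfn.delete_change_set(StackName=stack_name, ChangeSetName=name)
```

With boto3 installed, `preview_stack(boto3.client("cloudformation"), "my-stack", template_text)` would run it against a real account; the `finally` clause deletes the ChangeSet even when reading it fails.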

Zalando's canonical instance

From the 2024-01 metadpata postmortem:

"We have implemented automated previews in the Pull Request comments. This feature leverages the AWS CloudFormation 'ChangeSet' feature. When an updated CF stack template is provided to the CloudFormation 'CreateChangeSet' endpoint, CloudFormation generates a json preview of the changes, which then can be executed or rejected. We read this ChangeSet from each account in our AWS Organization and merge them to create a human readable preview of changes in a PR comment. After the preview is created, the ChangeSet is dropped." (source: sources/2024-01-22-zalando-tale-of-metadpata-the-revenge-of-the-supertools)

The post also describes a layer of automated quality checks built on top of the preview:

"for creation/decommissioning of critical resources, we have introduced several automated quality checks which ensure that all the change corresponds to the user request and the pull request description. These checks also introduce additional approval from the respective account or cost center owners and validation from respective managers. The checks are implemented as a GitHub bot that comments on the Pull Request and blocks the merge until all the checks are validated." — same source

So two stacked layers:

  1. Change preview for every PR.
  2. Approval gate for critical creations/decommissions, pulling in account owners, cost-center owners, managers.
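The second layer can be sketched as a pure function over the per-account changes. The post does not say which resource types count as critical, how owners are looked up, or how approvals are recorded, so `CRITICAL_TYPES`, `owners_by_account` and `approved_by` below are hypothetical inputs chosen only for illustration:

```python
# Hypothetical list; the post does not name the critical resource types.
CRITICAL_TYPES = frozenset({"AWS::Route53::HostedZone", "AWS::RDS::DBInstance"})


def pending_approvals(changes_by_account, owners_by_account, approved_by,
                      critical_types=CRITICAL_TYPES):
    """Return the approvers still missing; an empty list means merge may proceed.

    changes_by_account: {account_id: Changes list from DescribeChangeSet}
    owners_by_account:  {account_id: iterable of approver handles} (assumed lookup)
    approved_by:        iterable of handles that have already approved the PR
    """
    needed = set()
    for account, changes in changes_by_account.items():
        for change in changes:
            rc = change.get("ResourceChange", {})
            creates_or_removes = rc.get("Action") in ("Add", "Remove")
            if creates_or_removes and rc.get("ResourceType") in critical_types:
                needed.update(owners_by_account.get(account, ()))
                break  # one critical change is enough to involve this account's owners
    return sorted(needed - set(approved_by))
```

A bot built this way would keep its status check red while the returned list is non-empty.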

Why this catches metadpata-class bugs

The metadpata incident would have produced a ChangeSet with Remove entries against Route 53 hosted zones in many accounts. The PR preview would have shown "will delete N hosted zones across accounts [A, B, C, …]" — very difficult to miss in review, even if the underlying YAML typo was subtle. The template diff alone does not make the deletion obvious; the ChangeSet preview does.
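A headline of that kind can be derived mechanically from the per-account ChangeSets. A small sketch, assuming each account's changes are available as the `Changes` list from DescribeChangeSet; the function name and the exact wording of the headline are illustrative:

```python
def deletion_headline(changes_by_account, resource_type="AWS::Route53::HostedZone"):
    """Summarise Remove entries for one resource type across accounts.

    Returns None when no account would delete a resource of that type.
    """
    hits = {}
    for account, changes in changes_by_account.items():
        count = 0
        for change in changes:
            rc = change.get("ResourceChange", {})
            if rc.get("Action") == "Remove" and rc.get("ResourceType") == resource_type:
                count += 1
        if count:
            hits[account] = count
    if not hits:
        return None
    total = sum(hits.values())
    accounts = ", ".join(sorted(hits))
    return f"will delete {total} {resource_type} resource(s) across accounts [{accounts}]"
```

Placing this line at the top of the PR comment keeps the deletion visible even when the full summary is long.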

Mechanism details (Zalando-disclosed)

  • Iterate every AWS account in the organisation.
  • Call CreateChangeSet per-account.
  • Merge JSON outputs into a human-readable form (format undisclosed — likely grouped by action type or by resource).
  • Post as PR comment.
  • Call DeleteChangeSet after posting.
  • Implemented as a GitHub bot that also blocks merge until quality checks pass for critical resources.
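Since the merge format is undisclosed, the following is only one plausible shape: identical changes are collapsed across accounts so that a fleet-wide edit reads as one line rather than hundreds. The grouping key and the output layout are assumptions.

```python
from collections import defaultdict


def merge_summary(changes_by_account):
    """Collapse identical resource changes across accounts into comment text.

    changes_by_account: {account_id: Changes list from DescribeChangeSet}
    """
    grouped = defaultdict(list)
    for account in sorted(changes_by_account):
        for change in changes_by_account[account]:
            rc = change.get("ResourceChange")
            if not rc:
                continue
            key = (
                rc.get("Action", "?"),
                rc.get("ResourceType", "?"),
                rc.get("LogicalResourceId", "?"),
                rc.get("Replacement", "N/A"),  # only meaningful for Modify entries
            )
            grouped[key].append(account)

    if not grouped:
        return "No changes in any account."

    lines = []
    for key in sorted(grouped):
        action, rtype, logical_id, replacement = key
        accounts = grouped[key]
        flag = " (REPLACEMENT)" if replacement == "True" else ""
        lines.append(
            f"- {action} {rtype} {logical_id}{flag}: "
            f"{len(accounts)} account(s) [{', '.join(accounts)}]"
        )
    return "\n".join(lines)
```

The returned string can be used directly as the body of the PR comment.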

Cost and latency

Not disclosed in the post. Inferred shape:

  • CreateChangeSet is asynchronous; wall-clock latency scales with per-account CloudFormation load. Running across hundreds of accounts in parallel is the likely approach.
  • ChangeSets are free to create and delete (no CloudFormation charge); the practical limit is API rate limiting on the CloudFormation calls, not spend.
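The parallel fan-out inferred above could look like the sketch below. It is not from the post: `preview_one` is a hypothetical callable that previews a single account (for instance by assuming a role and running the create, read, delete cycle), and the worker count is an arbitrary bound.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def preview_all(account_ids, preview_one, max_workers=16):
    """Run preview_one(account_id) for every account with bounded parallelism.

    Returns (results, errors): per-account changes, and per-account error text
    for accounts whose preview raised, so one failure does not hide the rest.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(preview_one, account): account for account in account_ids}
        for future in as_completed(futures):
            account = futures[future]
            try:
                results[account] = future.result()
            except Exception as exc:  # surface the failure in the PR comment instead
                errors[account] = str(exc)
    return results, errors
```

Reporting failed accounts separately matters for review: a preview that silently skips an account is as misleading as the bare template diff.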

Prerequisites

  • Cross-account permissions for the bot to call CreateChangeSet in every account.
  • Single CloudFormation entry point per stack per account — the bot has to know which stack name to preview in each account.
  • Parameters that are uniform across accounts, or a way to resolve the per-account values.
  • PR-comment latency tolerance — reviewers have to wait for the preview before approving.
  • A separate review path for custom resources and macros, which ChangeSets render opaquely.
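The first and third prerequisites can be sketched as two small helpers. Everything named here is an assumption: the role name `cfn-preview-bot`, the session name, the default region and the layout of the parameter overrides are placeholders; `sts` is expected to behave like a boto3 STS client.

```python
def client_kwargs_for_account(sts, account_id, role_name="cfn-preview-bot",
                              region="eu-central-1"):
    """Assume a role in the target account and return kwargs for boto3.client().

    role_name and region are placeholders, not values from the source.
    """
    response = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="pr-preview",
    )
    creds = response["Credentials"]
    return {
        "region_name": region,
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def resolve_parameters(defaults, overrides_by_account, account_id):
    """Merge shared defaults with per-account overrides into the Parameters list
    shape that CreateChangeSet accepts."""
    merged = {**defaults, **overrides_by_account.get(account_id, {})}
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(merged.items())
    ]
```

With boto3 available, `boto3.client("cloudformation", **client_kwargs_for_account(boto3.client("sts"), "123456789012"))` would yield a per-account client; the role in each account needs at least permission to create, describe and delete ChangeSets.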

When it doesn't help

  • Configuration that isn't in CloudFormation. Raw boto3 scripts, Terraform, non-CloudFormation IaC — need their own preview mechanism.
  • Drift between the stack and actual resources. A ChangeSet is computed against the state CloudFormation has recorded for the stack; if the real resources have drifted from it, the preview may understate the actual delta at execute time.
  • Non-resource changes. IAM policy evaluations that depend on runtime context (e.g., which principal is calling) don't show up as ChangeSet entries.
  • Secrets rotation in CloudFormation. Secret values that change on every apply (KMS-backed) show as a no-op in the preview.
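The drift caveat can be partly mitigated by running CloudFormation's drift detection next to the preview and warning in the PR comment when a stack has drifted. This is an addition for illustration, not something the post describes; `cfn` is assumed to behave like a boto3 CloudFormation client and the polling limits are arbitrary.

```python
import time


def stack_has_drifted(cfn, stack_name, poll_seconds=2.0, max_polls=60):
    """Run drift detection on one stack and report whether it is DRIFTED."""
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    # Drift detection is asynchronous, so poll until it leaves the in-progress state.
    for _ in range(max_polls):
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"drift detection for {stack_name} did not finish in time")

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))
    return status.get("StackDriftStatus") == "DRIFTED"
```

Drift detection only covers resource types and properties that support it, so a clean result narrows the gap rather than closing it.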
