PATTERN Cited by 1 source
Automate account lifecycle¶
Pattern¶
In an account-per-tenant platform, automate AWS account creation as a fully orchestrated workflow (typically Step Functions + AWS Organizations APIs + CloudFormation baseline stacks), but keep retirement as scripts that operators run manually on a regular cadence. The split is deliberate: orchestrate where the workflow has waits, retries, and observable state; leave simpler batch cleanup as scripts.
Canonical shape¶
"Account creation is a fully automated process using AWS Step Functions, but the retirement and closure of accounts are performed manually through regularly run scripts." (Source: sources/2026-02-25-aws-6000-accounts-three-people-one-platform)
Creation (Step Functions)¶
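- Create the account via the Organizations CreateAccount API (asynchronous: the call returns a request ID, so follow it with a wait state).
- Poll the creation status (DescribeCreateAccountStatus) until the state is SUCCEEDED; treat FAILED as terminal.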
- Move account into target OU (applies inherited SCPs).
- Assume cross-account role in the new account.
- Apply baseline CloudFormation stacks: IAM roles, VPC / networking, logging, tagging, monitoring subscription.
- Register the account as a target in StackSets for application code.
- Seed tenant-specific data / config.
- Notify platform tooling (catalog, billing, support).
Each step has natural waits and retries and is independently observable, which makes the workflow a good fit for Step Functions.
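The first two steps of the list above (create, then poll) might be sketched as an Amazon States Language definition, built here as a Python dict. This is a minimal sketch under assumptions, not the source's actual state machine: the state names, the 30-second wait, the retry settings, and the input fields `email` and `tenantName` are hypothetical, and it assumes Step Functions' AWS SDK service integrations for Organizations are used.

```python
import json

# Hypothetical sketch: create an account, then poll until creation finishes.
# State names, timings, and input field names are illustrative assumptions.
definition = {
    "StartAt": "CreateAccount",
    "States": {
        "CreateAccount": {
            "Type": "Task",
            # AWS SDK service integration for organizations:CreateAccount.
            "Resource": "arn:aws:states:::aws-sdk:organizations:createAccount",
            "Parameters": {
                "Email.$": "$.email",
                "AccountName.$": "$.tenantName",
            },
            "ResultPath": "$.create",
            "Next": "WaitForCreation",
        },
        # The wait state gives the asynchronous creation time to progress.
        "WaitForCreation": {"Type": "Wait", "Seconds": 30, "Next": "CheckStatus"},
        "CheckStatus": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:organizations:describeCreateAccountStatus",
            "Parameters": {
                "CreateAccountRequestId.$": "$.create.CreateAccountStatus.Id",
            },
            "ResultPath": "$.status",
            # In this sketch, retries sit on the read-only status check.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 10,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "IsCreated",
        },
        "IsCreated": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status.CreateAccountStatus.State",
                 "StringEquals": "SUCCEEDED", "Next": "AccountReady"},
                {"Variable": "$.status.CreateAccountStatus.State",
                 "StringEquals": "FAILED", "Next": "CreationFailed"},
            ],
            # Any other state (IN_PROGRESS) loops back to the wait.
            "Default": "WaitForCreation",
        },
        # Placeholder terminal state; a full workflow would continue from here.
        "AccountReady": {"Type": "Succeed"},
        "CreationFailed": {"Type": "Fail", "Error": "AccountCreationFailed"},
    },
}


def transition_targets(state):
    """Return every state name this state can transition to."""
    targets = [state[key] for key in ("Next", "Default") if key in state]
    targets += [choice["Next"] for choice in state.get("Choices", [])]
    return targets


def dangling_targets(defn):
    """Return referenced state names that are not defined (a cheap sanity check)."""
    states = defn["States"]
    names = [defn["StartAt"]]
    for state in states.values():
        names += transition_targets(state)
    return sorted({name for name in names if name not in states})


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

In a fuller version, `AccountReady` would be replaced by the remaining steps (OU move, baseline stacks, StackSets registration, notifications), each as its own observable state.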
Retirement (scripts)¶
- List accounts past retention / deactivation date.
- Back up tenant data (if required by policy).
- Close the account via Organizations.
- Clean up cross-account references (StackSet instance removal, IAM trust policies, Cost Explorer links).
- Run "regularly" — e.g. weekly or on-demand.
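A retirement script in this style could look roughly like the following. It is a sketch under assumptions rather than the source's script: the tenant record shape (`account_id`, `deactivated_on`) and the 60-day default are hypothetical, and `org` is assumed to be a boto3 Organizations client (or anything exposing the same `close_account(AccountId=...)` call). Backup and cross-account cleanup are omitted.

```python
from datetime import date, timedelta


def accounts_due_for_closure(tenants, today, retention_days=60):
    """Return account IDs whose tenants were deactivated at least retention_days ago."""
    cutoff = today - timedelta(days=retention_days)
    return [
        tenant["account_id"]
        for tenant in tenants
        if tenant.get("deactivated_on") is not None
        and tenant["deactivated_on"] <= cutoff
    ]


def retire(org, tenants, today, dry_run=True):
    """Close each due account and return (account_id, outcome) pairs for the audit log."""
    results = []
    for account_id in accounts_due_for_closure(tenants, today):
        if dry_run:
            # Default to reporting only, so an operator can review the batch first.
            results.append((account_id, "would-close"))
            continue
        try:
            org.close_account(AccountId=account_id)
            results.append((account_id, "closed"))
        except Exception as exc:
            # Record the failure and keep going; one bad account should not
            # abort the rest of the batch.
            results.append((account_id, f"failed: {exc}"))
    return results
```

Running with `dry_run=True` first and reading the returned list is one way to keep the manual step cheap while still leaving an audit trail.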
Why the asymmetry¶
The architecturally interesting signal is not that both are automated or both are manual, but that the team chose differently for each workflow based on fit:
- Creation happens frequently and on demand (a new tenant needs an account now), and each run takes many minutes, most of them spent waiting → Step Functions' asynchronous orchestration model shines.
- Retirement happens rarely and in batches, with nobody waiting on the result (e.g. closing tenants inactive for 60 days once a week) → the overhead of a state machine is not worth it; a Python script is easier to debug and audit.
"Some of the involved workflows lend themselves well to automation, whereas others can be implemented more effectively using traditional scripting and manual operations, as long as the overhead introduced is low enough." (Source: sources/2026-02-25-aws-6000-accounts-three-people-one-platform)
This is a reusable architectural heuristic: automate based on workflow fit (waits, retries, observability, frequency), not on a blanket "automate everything" dogma.
What this pattern doesn't cover¶
- Per-tenant data migration between accounts — not part of the lifecycle; needs its own orchestration.
- Customer-initiated offboarding SLAs — if a tenant requests account closure immediately, the "regularly run scripts" cadence is incompatible; productise offboarding on-demand.
- Account-creation rate limits from AWS Organizations — the pattern's step 1 is itself throttled by AWS; bulk onboarding hits this ceiling.
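For the rate-limit point above, one common mitigation (not described in the source) is to wrap the throttled call in exponential backoff with jitter. The sketch below is generic and hypothetical: `call_with_backoff` and the `is_throttle` predicate are illustrative names, and deciding which errors count as throttling is left to the caller (with boto3, one would typically inspect the error code on the raised `ClientError`).

```python
import random
import time


def call_with_backoff(fn, is_throttle, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Call fn(); on a throttling error, sleep with jittered exponential backoff and retry.

    Non-throttling errors, and the last throttling error once attempts are
    exhausted, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: a random fraction of the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            sleep(delay)
```

Backoff only smooths bursts; it does not raise the underlying ceiling, so large bulk onboardings still need to be planned against the account-creation limits.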
Seen in¶
- sources/2026-02-25-aws-6000-accounts-three-people-one-platform — the canonical ProGlove instance: Step Functions for creation, scripts for retirement.
Related¶
- systems/aws-step-functions, systems/aws-organizations.
- concepts/account-per-tenant-isolation — the architecture requiring this pattern.
- patterns/platform-engineering-investment — of which lifecycle automation is a concrete instance.
- patterns/fan-out-stackset-deployment — the per-account application-deployment companion.