PATTERN Cited by 1 source
Configuration-driven tenant onboarding¶
Pattern¶
Treat new-tenant onboarding as a configuration change, not an infrastructure-provisioning exercise. All infrastructure the new tenant depends on (VPC, subnets, load balancer, IAM roles, PrivateLink endpoints, downstream-service connections) is pre-wired at tier creation (see concepts/pre-integration-at-tier-creation). Onboarding reduces to: register a listener rule, create a target group, create a dedicated ECS cluster, deploy tenant configuration, validate.
Canonicalised on the wiki by the 2026-05-12 AWS Architecture Blog post (Source: sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services). Verbatim:
"Configuration-driven onboarding: New tenant onboarding became a configuration change rather than an infrastructure provisioning exercise, dramatically reducing time and manual effort."
Before and after (from canonical source)¶
| Onboarding phase | Before (account-per-tenant) | After (configuration-driven) |
|---|---|---|
| AWS account provisioning | ~2 weeks | N/A (shared account) |
| VPC + networking | ~3 weeks | N/A (inherited from infra group) |
| IAM role configuration | ~1 week | N/A (tier-level shared role) |
| Downstream integration | ~2 weeks | N/A (tier-level PrivateLink) |
| Product configuration + test | included | ~7 days |
| Total | ~52 days | ~7 days (−86%) |
The 80% engineering-effort reduction per onboarding is attributed to the removal of the first four phases, not to speedup of the fifth.
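The headline percentage follows directly from the two totals in the table. A minimal arithmetic check, using only the 52-day and 7-day figures quoted above (the helper name `reduction_pct` is illustrative):

```python
def reduction_pct(before_days: float, after_days: float) -> float:
    """Percentage reduction going from before_days to after_days."""
    return (before_days - after_days) / before_days * 100


# Totals taken from the before/after table.
onboarding_reduction = reduction_pct(52, 7)
print(f"{onboarding_reduction:.1f}%")  # 86.5%, quoted in the table as 86%
```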
The new-tenant checklist¶
After this pattern, onboarding a new tenant is:
1. Add ALB listener rule routing the tenant's path (or header) to a new target group. See patterns/alb-path-routing-per-tenant.
2. Create target group for the tenant's backend.
3. Create dedicated ECS cluster for the tenant. See patterns/dedicated-ecs-cluster-per-tenant.
4. Register ECS service in the target group.
5. Deploy tenant configuration: task-definition env vars with TENANT_ID, cache endpoint, resource sizing.
6. Validate: integration test against downstream services, smoke test the tenant path, confirm memory / latency baselines.
Steps 1–4 are ~3–5 AWS API calls. Step 5 is an ECS task deployment. Step 6 is the residual 7-day cost: "primarily testing and validation, because infrastructure is pre-provisioned."
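To make the "handful of API calls" claim concrete, the sketch below describes steps 1–4 as an ordered list of call descriptions without executing anything. The parameter names are intended to follow the boto3 `elbv2` and `ecs` clients; everything else is an assumption for illustration: the `/{tenant_id}/*` path convention, the `tg-` / `cluster-` / `svc-` naming, port 8080, the `/health` check path, the `ip` target type, and the container name `app`. The target group is listed before the rule because a forward action needs the target group's ARN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    service: str    # boto3 client name, e.g. "elbv2" or "ecs"
    operation: str  # client method to call
    params: dict    # keyword arguments for that method


def onboarding_plan(tenant_id, listener_arn, vpc_id, priority, task_definition,
                    port=8080):
    """Describe checklist steps 1-4 as ordered API calls. Nothing is executed."""
    tg_name = f"tg-{tenant_id}"
    cluster = f"cluster-{tenant_id}"
    # Placeholder: the real ARN is returned by create_target_group at run time.
    tg_arn = f"<arn of {tg_name}>"
    return [
        Step("elbv2", "create_target_group", {
            "Name": tg_name, "Protocol": "HTTP", "Port": port,
            "VpcId": vpc_id, "TargetType": "ip",
            "HealthCheckPath": "/health",  # assumed health-check path
        }),
        Step("elbv2", "create_rule", {
            "ListenerArn": listener_arn, "Priority": priority,
            "Conditions": [{"Field": "path-pattern",
                            "Values": [f"/{tenant_id}/*"]}],  # assumed convention
            "Actions": [{"Type": "forward", "TargetGroupArn": tg_arn}],
        }),
        Step("ecs", "create_cluster", {"clusterName": cluster}),
        Step("ecs", "create_service", {
            "cluster": cluster,
            "serviceName": f"svc-{tenant_id}",
            "taskDefinition": task_definition,
            "desiredCount": 1,
            "loadBalancers": [{"targetGroupArn": tg_arn,
                               "containerName": "app",  # assumed container name
                               "containerPort": port}],
        }),
    ]


plan = onboarding_plan("acme", "<listener-arn>", "<vpc-id>", 10, "acme-task:1")
for step in plan:
    print(step.service, step.operation)
```

Keeping the plan as data rather than direct calls makes it reviewable before anything touches the account; an executor could later resolve the ARN placeholder and dispatch each step to the matching client.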
Why this is hard without pre-integration¶
Without the concepts/pre-integration-at-tier-creation lever, every onboarding triggers network-engineering work: VPC peering, PrivateLink setup, IAM role creation, cross-account trust relationships. This work is slow because:
- Multiple teams are involved (network, security, downstream-service team, requesting team).
- Approvals are gated on security review.
- Integration testing requires the downstream-service owner's participation.
- Documentation and operational handoff add days.
None of this work is tenant-specific; it's infrastructure provisioning that could be done once. The pre-integration pattern does exactly that, unlocking configuration-driven onboarding.
Composition with adjacent patterns¶
- patterns/shared-privatelink-at-tier-level — shares downstream-service connectivity across tenants.
- patterns/dedicated-ecs-cluster-per-tenant — the compute backend spun up per tenant.
- patterns/alb-path-routing-per-tenant — the routing layer that gets a new rule per tenant.
- patterns/hybrid-multi-tenant-architecture — the enclosing architectural shape.
All four patterns together deliver the onboarding-speedup property; omitting any one of them re-introduces per-tenant infrastructure work.
Configuration artifacts¶
The tenant's onboarding configuration is small:
- ECS task definition JSON (env vars, image, resources)
- ALB listener rule (path / header pattern, priority)
- Target group (name, protocol, port, health-check path)
- ECS service definition (cluster, task count, autoscaling triggers)
- CloudWatch alarms (memory 70/85, latency 2× baseline, 5XX rate)
All expressible as CloudFormation / CDK / Terraform. No code changes to the application layer (the application reads TENANT_ID from env and behaves accordingly).
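As an illustration of how small the per-tenant artifact can be, the sketch below renders an ECS task definition from a tenant record. `TENANT_ID` comes from the text above; the `TenantConfig` type, the `CACHE_ENDPOINT` variable name, the `-task` family suffix, and the container name `app` are hypothetical choices made for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    image: str
    cache_endpoint: str
    cpu: int          # ECS CPU units (1024 = 1 vCPU)
    memory_mib: int   # hard memory limit in MiB


def task_definition(cfg: TenantConfig) -> dict:
    """Build a task-definition document in the ECS JSON shape."""
    return {
        "family": f"{cfg.tenant_id}-task",
        "containerDefinitions": [{
            "name": "app",  # assumed container name
            "image": cfg.image,
            "cpu": cfg.cpu,
            "memory": cfg.memory_mib,
            "environment": [
                {"name": "TENANT_ID", "value": cfg.tenant_id},
                # Hypothetical variable name for the cache endpoint.
                {"name": "CACHE_ENDPOINT", "value": cfg.cache_endpoint},
            ],
        }],
    }


cfg = TenantConfig("acme", "registry.example/app:1.0", "cache.internal:6379",
                   cpu=512, memory_mib=1024)
print(json.dumps(task_definition(cfg), indent=2))
```

Because the only tenant-specific inputs are the fields of the record, the same function serves every tenant in the tier, which is what keeps application code unchanged.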
When to use¶
- Multi-tenant SaaS with tens to thousands of tenants where per-tenant onboarding cost is a business constraint.
- Tenants with shared downstream-service topology — heterogeneous dependencies defeat the shared-endpoint property.
- Stable, well-understood tier profiles — the tier has to be designed for all likely tenants before any tenant is onboarded.
When not to use¶
- Customers require bespoke infrastructure: per-tenant downstream service integrations, per-tenant networking, per-tenant IAM policies. Each bespoke requirement re-introduces per-tenant provisioning.
- Tier definitions are unstable — if the tier itself changes often, the amortisation doesn't compound.
- Regulatory requirements force per-tenant boundaries — concepts/account-per-tenant-isolation is mandated.
- Very small numbers of tenants (<10) — account-per-tenant onboarding cost amortises acceptably.
Anti-patterns¶
- Calling onboarding "configuration-driven" while still running per-tenant provisioning scripts. The name is misleading if infrastructure is still being created per tenant.
- Tenant-specific tier customisation via feature flags. Feature flags inside the tenant's task are fine; feature-flag-driven infrastructure (per-tenant target group count, per-tenant subnet sets) re-couples onboarding to infrastructure.
- Per-tenant IAM roles disguised as configuration. Creating a new IAM role per tenant is still an IAM operation, not a configuration change.
- Shipping onboarding as a manual runbook rather than automation. Even the remaining 7 days is improvable; manual runbooks don't compound.
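One way to guard against the first three anti-patterns is a lint that rejects any onboarding template containing resource types outside the per-tenant artifact set. The sketch below assumes the template is a parsed CloudFormation document; the allow-list is an illustrative reading of the configuration artifacts on this page, not a list taken from the source, and `disallowed_resources` is a hypothetical helper.

```python
# Resource types a tenant-onboarding template may contain (assumed allow-list).
ALLOWED_TYPES = {
    "AWS::ElasticLoadBalancingV2::ListenerRule",
    "AWS::ElasticLoadBalancingV2::TargetGroup",
    "AWS::ECS::Cluster",
    "AWS::ECS::Service",
    "AWS::ECS::TaskDefinition",
    "AWS::ApplicationAutoScaling::ScalableTarget",
    "AWS::ApplicationAutoScaling::ScalingPolicy",
    "AWS::CloudWatch::Alarm",
}


def disallowed_resources(template: dict) -> list[str]:
    """Return 'LogicalId (Type)' for every resource outside the allow-list."""
    return sorted(
        f"{name} ({res.get('Type')})"
        for name, res in template.get("Resources", {}).items()
        if res.get("Type") not in ALLOWED_TYPES
    )


template = {
    "Resources": {
        "TenantRule": {"Type": "AWS::ElasticLoadBalancingV2::ListenerRule"},
        "TenantRole": {"Type": "AWS::IAM::Role"},  # per-tenant IAM: flagged
    }
}
print(disallowed_resources(template))  # ['TenantRole (AWS::IAM::Role)']
```

Run in CI against every onboarding change, a check like this turns "configuration-driven" from a label into an enforced property.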
Measured outcomes (AWS canonical)¶
From the post:
- Tenant onboarding time: 52 days → 7 days (−86%)
- Infrastructure setup steps per tenant: −80%
- Engineering effort per onboarding: −80%
- Feature release time: 2–3 days → 1 day
- Tenant capacity: up to 100 tenants per AWS account
The feature release time reduction is a secondary dividend: once onboarding is configuration-driven, product-configuration changes flow through the same pipeline, enabling 1-day releases.
Caveats¶
- The 7-day residual is "testing and validation." The post doesn't quantify how much is mandatory (customer testing, SLA compliance) vs improvable (automated smoke tests, pre-warmed caches).
- Not all onboarding cost is engineering cost. Commercial (contract, legal, pricing), security review, and customer-side integration effort aren't in the 52 / 7 day numbers but still bound customer time-to-value.
- Tier-creation cost is not amortised into the onboarding number. Tier creation itself can take weeks, but it happens once per tier, not per tenant.
- Configuration mistakes can cause outages. Examples include listener-rule priority collisions, target-group misconfiguration, and ECS task-definition typos. The blast radius depends on how much of the configuration pipeline is automated vs manual.
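Some of these mistakes can be caught before deployment. Rule priorities must be unique within an ALB listener, so a pre-deploy check over the tenant registry can flag collisions early. A minimal sketch, assuming the registry can be read as `(tenant_id, priority)` pairs (a hypothetical shape, as is the `priority_collisions` helper):

```python
from collections import defaultdict


def priority_collisions(rules):
    """Map each duplicated priority to the sorted tenants that claim it.

    rules: iterable of (tenant_id, priority) pairs for a single listener.
    """
    by_priority = defaultdict(list)
    for tenant_id, priority in rules:
        by_priority[priority].append(tenant_id)
    return {p: sorted(tenants)
            for p, tenants in by_priority.items() if len(tenants) > 1}


rules = [("acme", 10), ("globex", 20), ("initech", 10)]
print(priority_collisions(rules))  # {10: ['acme', 'initech']}
```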
Seen in¶
- sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services — canonical wiki anchor. AWS ad-serving platform's migration to configuration-driven tenant onboarding. 52d → 7d onboarding reduction explicitly attributed to pre-integration; "a configuration change rather than an infrastructure provisioning exercise" verbatim; 80% engineering-effort reduction disclosed.
Related¶
- concepts/pre-integration-at-tier-creation — the enabling lever
- concepts/tenant-onboarding-time — the metric this pattern optimises
- concepts/hybrid-multi-tenant-architecture — the enclosing shape
- patterns/hybrid-multi-tenant-architecture
- patterns/shared-privatelink-at-tier-level — the dependency-sharing mechanism
- patterns/alb-path-routing-per-tenant — the per-tenant routing rule
- patterns/dedicated-ecs-cluster-per-tenant — the per-tenant compute backend