
PATTERN Cited by 1 source

Configuration-driven tenant onboarding

Pattern

Treat new-tenant onboarding as a configuration change, not an infrastructure-provisioning exercise. All infrastructure the new tenant depends on (VPC, subnets, load balancer, IAM roles, PrivateLink endpoints, downstream-service connections) is pre-wired at tier creation (see concepts/pre-integration-at-tier-creation). Onboarding reduces to: register a listener rule, create a target group, create a dedicated ECS cluster, deploy tenant configuration, validate.

Canonicalised on the wiki by the 2026-05-12 AWS Architecture Blog post (Source: sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services). Verbatim:

"Configuration-driven onboarding: New tenant onboarding became a configuration change rather than an infrastructure provisioning exercise, dramatically reducing time and manual effort."

Before and after (from canonical source)

Onboarding phase             | Before (account-per-tenant) | After (configuration-driven)
-----------------------------|-----------------------------|--------------------------------
AWS account provisioning     | ~2 weeks                    | N/A (shared account)
VPC + networking             | ~3 weeks                    | N/A (inherited from infra group)
IAM role configuration       | ~1 week                     | N/A (tier-level shared role)
Downstream integration       | ~2 weeks                    | N/A (tier-level PrivateLink)
Product configuration + test | included                    | ~7 days
Total                        | ~52 days                    | ~7 days (−86%)

The 80% engineering-effort reduction per onboarding is attributed to the removal of the first four phases, not to speedup of the fifth.
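
The −86% figure follows from the two totals in the table. A minimal check of that arithmetic, using only the numbers quoted above (the helper name is illustrative):

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Totals quoted in the table above, in days.
reduction = percent_reduction(52, 7)
print(f"{reduction:.1f}%")  # 86.5%, which this page reports as 86%
```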

The new-tenant checklist

After this pattern, onboarding a new tenant is:

  1. Add ALB listener rule routing the tenant's path (or header) to a new target group. See patterns/alb-path-routing-per-tenant.
  2. Create target group for the tenant's backend.
  3. Create dedicated ECS cluster for the tenant. See patterns/dedicated-ecs-cluster-per-tenant.
  4. Register ECS service in the target group.
  5. Deploy tenant configuration — task-definition env vars with TENANT_ID, cache endpoint, resource sizing.
  6. Validate — integration test against downstream services, smoke test the tenant path, confirm memory / latency baselines.

Steps 1–4 are ~3–5 AWS API calls. Step 5 is an ECS task deployment. Step 6 is the residual 7-day cost: "primarily testing and validation, because infrastructure is pre-provisioned."
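
To make the size of steps 1–4 concrete, the sketch below builds the calls as plain data rather than sending them to AWS. The operation and parameter names follow the boto3 `elbv2` and `ecs` clients. The `Tenant` shape, the `tenant-<id>` naming scheme, the port, the health-check path, the task count, and the angle-bracket placeholders are assumptions for illustration, not details from the source post. The parameter sets are also not exhaustive (for example, a service using `awsvpc` networking also needs a network configuration).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    path_pattern: str      # e.g. "/acme/*"
    rule_priority: int     # must be unique on the shared listener
    port: int = 8080       # hypothetical container port


def onboarding_plan(t: Tenant) -> list[tuple[str, str, dict]]:
    """Return (service, operation, params) tuples; nothing is sent to AWS.

    Placeholders in angle brackets stand for values that come from the
    pre-wired tier (VPC, listener) or from an earlier call's response.
    """
    name = f"tenant-{t.tenant_id}"
    return [
        # Checklist step 2 is issued first because the rule needs its ARN.
        ("elbv2", "create_target_group", {
            "Name": name, "Protocol": "HTTP", "Port": t.port,
            "VpcId": "<tier-vpc-id>", "TargetType": "ip",
            "HealthCheckPath": "/health",
        }),
        ("elbv2", "create_rule", {
            "ListenerArn": "<tier-listener-arn>",
            "Priority": t.rule_priority,
            "Conditions": [{"Field": "path-pattern", "Values": [t.path_pattern]}],
            "Actions": [{"Type": "forward", "TargetGroupArn": "<target-group-arn>"}],
        }),
        ("ecs", "create_cluster", {"clusterName": name}),
        ("ecs", "create_service", {
            "cluster": name, "serviceName": name,
            "taskDefinition": "<task-definition-arn>", "desiredCount": 2,
            "loadBalancers": [{
                "targetGroupArn": "<target-group-arn>",
                "containerName": "app", "containerPort": t.port,
            }],
        }),
    ]


plan = onboarding_plan(Tenant("acme", "/acme/*", rule_priority=120))
for service, operation, _ in plan:
    print(f"{service}.{operation}")
```

Keeping the plan as data means it can be reviewed or diffed before anything is applied; the same information could equally live in a CloudFormation, CDK, or Terraform module.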

Why this is hard without pre-integration

Without the concepts/pre-integration-at-tier-creation lever, every onboarding triggers network-engineering work: VPC peering, PrivateLink setup, IAM role creation, cross-account trust relationships. This work is slow because:

  • Multiple teams are involved (network, security, downstream-service team, requesting team).
  • Approvals are gated on security review.
  • Integration testing requires the downstream-service owner's participation.
  • Documentation and operational handoff add days.

None of this work is tenant-specific; it's infrastructure provisioning that could be done once. The pre-integration pattern does exactly that, unlocking configuration-driven onboarding.

Composition with adjacent patterns

All four patterns together deliver the onboarding-speedup property; omitting any one of them re-introduces per-tenant infrastructure work.

Configuration artifacts

The tenant's onboarding configuration is small:

  • ECS task definition JSON (env vars, image, resources)
  • ALB listener rule (path / header pattern, priority)
  • Target group (name, protocol, port, health-check path)
  • ECS service definition (cluster, task count, autoscaling triggers)
  • CloudWatch alarms (memory 70/85, latency 2× baseline, 5XX rate)

All expressible as CloudFormation / CDK / Terraform. No code changes to the application layer (the application reads TENANT_ID from env and behaves accordingly).
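
A minimal sketch of the two sides of that contract: the tenant-specific part of an ECS container definition, and the application resolving its tenant from the environment. `TENANT_ID` comes from this page. The `CACHE_ENDPOINT` variable name, the sizing defaults, the container name, and the example hostnames are hypothetical.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    cache_endpoint: str
    cpu: int = 512             # hypothetical sizing, in ECS CPU units
    memory_mib: int = 1024     # hypothetical sizing


def container_definition(cfg: TenantConfig, image: str) -> dict:
    """Build the tenant-specific part of an ECS container definition."""
    return {
        "name": "app",
        "image": image,
        "cpu": cfg.cpu,
        "memory": cfg.memory_mib,
        "environment": [
            {"name": "TENANT_ID", "value": cfg.tenant_id},
            {"name": "CACHE_ENDPOINT", "value": cfg.cache_endpoint},
        ],
    }


def current_tenant(environ=os.environ) -> str:
    """Application side: resolve the tenant from the environment, failing fast."""
    tenant_id = environ.get("TENANT_ID", "")
    if not tenant_id:
        raise RuntimeError("TENANT_ID is not set")
    return tenant_id


cfg = TenantConfig("acme", "acme-cache.internal.example:6379")
print(json.dumps(container_definition(cfg, "registry.example/app:1.0"), indent=2))
```

Failing at startup when `TENANT_ID` is missing is a design choice of this sketch: a task that boots without a tenant identity is harder to diagnose than one that never becomes healthy.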

When to use

  • Multi-tenant SaaS with tens to thousands of tenants where per-tenant onboarding cost is a business constraint.
  • Tenants with shared downstream-service topology — heterogeneous dependencies defeat the shared-endpoint property.
  • Stable, well-understood tier profiles — the tier has to be designed for all likely tenants before any tenant is onboarded.

When not to use

  • Customers require bespoke infrastructure: per-tenant downstream-service integrations, per-tenant networking, per-tenant IAM policies. Each bespoke requirement re-introduces per-tenant provisioning.
  • Tier definitions are unstable — if the tier itself changes often, the amortisation doesn't compound.
  • Regulatory requirements force per-tenant boundaries, so concepts/account-per-tenant-isolation is mandated.
  • Very small numbers of tenants (<10) — account-per-tenant onboarding cost amortises acceptably.

Anti-patterns

  • Calling onboarding "configuration-driven" while still running per-tenant provisioning scripts. The name is misleading if infrastructure is still being created per tenant.
  • Tenant-specific tier customisation via feature flags. Feature flags inside the tenant's task are fine; feature-flag-driven infrastructure (per-tenant target group count, per-tenant subnet sets) re-couples onboarding to infrastructure.
  • Per-tenant IAM roles disguised as configuration. Creating a new IAM role per tenant is still an IAM operation, not a configuration change.
  • Shipping onboarding as a manual runbook rather than automation. Even the remaining 7 days is improvable; manual runbooks don't compound.
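
One way to keep the first and third anti-patterns from creeping in is to lint each tenant template for resource types that belong to the tier. The sketch below assumes tenant changes arrive as a CloudFormation-style template already parsed into a dict. The allow-list is an assumption derived from the configuration artifacts this page names, not a list from the source post.

```python
# Resource types a tenant template may contain under this pattern.
# The allow-list is an assumption derived from the artifacts this page names.
ALLOWED_TYPES = {
    "AWS::ElasticLoadBalancingV2::ListenerRule",
    "AWS::ElasticLoadBalancingV2::TargetGroup",
    "AWS::ECS::Cluster",
    "AWS::ECS::Service",
    "AWS::ECS::TaskDefinition",
    "AWS::ApplicationAutoScaling::ScalableTarget",
    "AWS::ApplicationAutoScaling::ScalingPolicy",
    "AWS::CloudWatch::Alarm",
}


def infrastructure_leaks(template: dict) -> list[str]:
    """Return 'LogicalId (Type)' for every resource outside the allow-list."""
    resources = template.get("Resources", {})
    return sorted(
        f"{logical_id} ({res.get('Type')})"
        for logical_id, res in resources.items()
        if res.get("Type") not in ALLOWED_TYPES
    )


template = {
    "Resources": {
        "TenantRule": {"Type": "AWS::ElasticLoadBalancingV2::ListenerRule"},
        "TenantCluster": {"Type": "AWS::ECS::Cluster"},
        "TenantRole": {"Type": "AWS::IAM::Role"},  # the anti-pattern
    }
}
print(infrastructure_leaks(template))  # ['TenantRole (AWS::IAM::Role)']
```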

Measured outcomes (AWS canonical)

From the post:

  • Tenant onboarding time: 52 days → 7 days (−86%)
  • Infrastructure setup steps per tenant: −80%
  • Engineering effort per onboarding: −80%
  • Feature release time: 2–3 days → 1 day
  • Tenant capacity: up to 100 tenants per AWS account

The feature release time reduction is a secondary dividend: once onboarding is configuration-driven, product-configuration changes flow through the same pipeline, enabling 1-day releases.

Caveats

  • The 7-day residual is "testing and validation." The post doesn't quantify how much is mandatory (customer testing, SLA compliance) vs improvable (automated smoke tests, pre-warmed caches).
  • Not all onboarding cost is engineering cost. Commercial (contract, legal, pricing), security review, and customer-side integration effort aren't in the 52 / 7 day numbers but still bound customer time-to-value.
  • Tier-creation cost is not amortised into the onboarding number. Tier creation itself can take weeks, but it happens once per tier, not per tenant.
  • Configuration mistakes can cause outages. Examples include listener-rule priority collisions, target-group misconfiguration, and ECS task-definition typos. The blast radius depends on how much of the configuration pipeline is automated vs manual.
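
The listener-rule case in the last caveat is cheap to catch before deployment. A minimal sketch of a pre-deploy check, assuming each rule is described by a dict with `tenant`, `priority`, and `path` keys (a hypothetical shape for this example):

```python
from collections import defaultdict


def rule_conflicts(rules: list[dict]) -> list[str]:
    """Report priority collisions and duplicate path patterns.

    Each rule is a dict with 'tenant', 'priority', and 'path' keys
    (a hypothetical shape for this sketch).
    """
    problems = []
    by_priority = defaultdict(list)
    by_path = defaultdict(list)
    for rule in rules:
        # ALB rule priorities are integers from 1 to 50000.
        if not 1 <= rule["priority"] <= 50000:
            problems.append(f"{rule['tenant']}: priority {rule['priority']} out of range")
        by_priority[rule["priority"]].append(rule["tenant"])
        by_path[rule["path"]].append(rule["tenant"])
    for priority, tenants in sorted(by_priority.items()):
        if len(tenants) > 1:
            problems.append(f"priority {priority} used by {', '.join(tenants)}")
    for path, tenants in sorted(by_path.items()):
        if len(tenants) > 1:
            problems.append(f"path {path} used by {', '.join(tenants)}")
    return problems


existing = [
    {"tenant": "acme", "priority": 100, "path": "/acme/*"},
    {"tenant": "globex", "priority": 110, "path": "/globex/*"},
]
proposed = {"tenant": "initech", "priority": 110, "path": "/initech/*"}
print(rule_conflicts(existing + [proposed]))
# ['priority 110 used by globex, initech']
```

Running a check like this in the onboarding pipeline turns a potential routing outage into a failed review step.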

Seen in
