
Deloitte optimizes EKS environment provisioning and achieves 89% faster testing environments using Amazon EKS and vCluster

Summary

AWS Architecture Blog customer case study (2026-04-27) in which Deloitte — a global professional-services organisation — describes how they eliminated a 30–45-minute-per-cluster provisioning bottleneck for QA testing environments by replacing one-dedicated-Amazon-EKS-cluster-per-testing-need with one shared EKS host cluster running EKS Auto Mode, partitioned into 50+ lightweight vCluster virtual Kubernetes clusters. Each virtual cluster acts like an independent Kubernetes environment (its own API server, control plane, DNS) but reuses the host cluster's compute, networking, storage-controller, load-balancer-controller, and monitoring-agent infrastructure. Environment provisioning dropped from 45 min to <5 min (89% reduction), the QA team reclaimed ~500 engineer-hours per year, and resource consolidation saved >50 vCPU + >200 GB RAM at peak, with up to 70% additional savings from EC2 Spot. A single internet-facing Application Load Balancer with ACM-terminated HTTPS fronts the whole fleet, routing path-based rules (/<app-name>) to applications running in distinct virtual clusters — collapsing what used to be N ALBs + N Route 53 records + N ingress controllers + N monitoring stacks into one shared stack. Post is a reference-architecture-style customer case study with a full hands-on walkthrough (Helm chart for vcluster-pro, IngressClassParams, per-vcluster YAML for host-to-vcluster sync); substance is in the multi-tenancy topology + the self-service handoff + the quantified platform-team-bottleneck removal, rather than in operational runtime incidents.

Key takeaways

  1. Dedicated EKS clusters per testing need are the anti-pattern. Deloitte's before-state: "Deloitte provisioned dedicated Amazon EKS clusters on AWS for each ephemeral testing need. This approach could take up to 45 minutes per cluster." The 45 min isn't just the EKS control plane — it's the EKS cluster + ALB + Route 53 records + ingress controllers + DNS setup + monitoring agents, all of which have to be stood up fresh per environment. Each dedicated cluster then carries its own control-plane cost, its own set of controllers, and its own IAM + RBAC configuration. (Source: this post)

  2. vCluster is a virtual Kubernetes cluster running inside another Kubernetes cluster. Deloitte's architecture: one EKS host cluster (the "physical" cluster, with real EC2 nodes and the AWS EKS control plane), and multiple vCluster virtual clusters each composed of a dedicated K8s API server, controller-manager, and data-store running as pods inside the host cluster. Users of a virtual cluster see a full independent Kubernetes environment (kubectl get nodes shows virtual nodes, not host nodes); the virtual cluster handles its own namespaces, RBAC, CRDs. But the virtual cluster's workload pods actually run on the host cluster's real nodes — syncing up through the vCluster syncer. This is the core of the virtual Kubernetes cluster primitive: control-plane-per-tenant, shared data plane. (Source: this post)

  3. Essential platform services deploy once on the host, shared across all virtual clusters. Deloitte explicitly calls this out: "Essential platform services such as Kubernetes controllers and monitoring agents are deployed once on the host cluster and shared across all virtual clusters. This approach reduces resource duplication and streamlines management." Concretely: the ALB controller (via the AWS Load Balancer Controller on the host), the storage controller (managing EBS and EFS volumes), and the monitoring agents all run once on the host; each virtual cluster's ingress objects get synced up to the host where the ALB controller materialises them as ALB rules. (Source: this post)

  4. Environment provisioning time dropped from 45 min to <5 min — an 89% reduction. "Environment provisioning time dropped from 45 minutes to under 5 minutes, representing an 89% reduction that translates to immediate productivity gains." This is the load-bearing quantified outcome of the pattern — not an incremental optimisation but a step-function change. The provisioning work that used to require ticket-to-platform-team → cluster-provision → controllers-install → ingress-setup → DNS-register → validate now collapses to "create a vcluster resource in the host cluster and wait for it to become Ready". See patterns/vcluster-fast-test-environment-provisioning. (Source: this post)
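A minimal sketch of what that single provisioning step can look like using the open-source vcluster Helm chart (the post drives the same step through the vCluster Platform UI; the release name qa-env-1 and the exact values shown here are illustrative, not taken from the post):

```yaml
# values.yaml — hypothetical minimal per-environment config; the sync
# settings Deloitte actually uses appear later in the walkthrough.
sync:
  toHost:
    ingresses:
      enabled: true
# Provisioning then collapses to one command against the host cluster, e.g.:
#   helm upgrade --install qa-env-1 vcluster/vcluster \
#     --namespace qa-env-1 --create-namespace -f values.yaml
# followed by waiting for the vcluster control-plane pod to become Ready.
```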

  5. ~500 engineer-hours reclaimed annually + resource consolidation saves >50 vCPU and >200 GB RAM at peak. "The QA team has reclaimed around 500 hours annually, shifting focus from repetitive setup tasks to higher-value testing work. […] Deloitte saves over 50 vCPUs and more than 200 GB of memory at peak usage." The 500 h/yr is the people-side benefit of self-service eliminating the platform-team bottleneck; the compute savings are the direct consequence of removing duplicated controllers, ingress, and monitoring stacks that would otherwise run once per dedicated cluster. (Source: this post)

  6. 50+ virtual clusters on a single shared EKS host cluster. "Deloitte now runs more than 50 virtual clusters efficiently on a single shared Amazon EKS host cluster." Combined with EKS Auto Mode's dynamic autoscaling and EC2 Spot Instances, this is the concrete density number: one-physical-cluster-per-fifty-logical-environments at peak, with the host cluster scaling up its real node capacity on demand. "Cost optimization improved further, with up to 70% savings by running workloads on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, with Amazon EKS Auto Mode providing efficient, automated autoscaling and provisioning." (Source: this post)

  7. Single ALB with path-based routing serves applications across all virtual clusters. "The architecture was further streamlined by implementing a single load balancer capable of serving traffic to applications across multiple virtual clusters." Each application running in a distinct virtual cluster annotates its Ingress with alb.ingress.kubernetes.io/group.order; the AWS Load Balancer Controller aggregates these into a single ALB listener with per-path rules (/app1 → app1's service in vcluster-1, /app2 → app2's service in vcluster-2). The vcluster sync: toHost: ingresses: enabled: true config is what lets virtual-cluster Ingress objects reach the host-cluster ALB controller. See patterns/shared-alb-path-based-multi-cluster-routing. (Source: this post)

  8. Tooling consolidation: from 10+ tools per dedicated cluster to one shared stack. "Instead of managing more than ten separate tool deployments (reverse proxy, monitoring agents, controllers, etc.), teams now rely on a single shared stack that's easier to maintain and operate." The operational surface reduction — 10× fewer ingress controllers, 10× fewer monitoring agents, 10× fewer ALB controllers, 10× fewer RBAC stacks to configure per environment — is the hidden second-order benefit of the consolidated topology; provisioning speed is the first-order win, but maintenance simplification is what makes the 50+ vcluster density sustainable. (Source: this post)

  9. Self-service: QA teams provision their own virtual clusters in under 5 min without platform-team involvement. "Teams can now provision their own testing environments in under 5 minutes without platform team involvement, compared to submitting requests and waiting 30-45 minutes previously." This is the self-service-infrastructure property applied at the test-environment altitude — QA engineers exercise the platform's vcluster-creation contract on demand, eliminating the platform-team queue as a serialisation point. The vCluster platform UI is what mediates the self-service interaction ("select and connect to the virtual cluster" → download kubeconfig → run kubectl apply). (Source: this post)

Architecture diagram walkthrough

From the post's Figure 1:

  1. Users reach applications via HTTPS over the public internet.
  2. HTTPS terminates at the Application Load Balancer, using an AWS Certificate Manager (ACM) certificate for TLS.
  3. The ALB routes requests to the appropriate application via path-based rules (/app1, /app2, …).
  4. Each application runs in its own virtual cluster with dedicated Amazon EBS storage for persistent state.

The host cluster sits beneath all of this: real EC2 nodes (provisioned by EKS Auto Mode + EC2 Spot), the AWS Load Balancer Controller, the EBS CSI driver, the vcluster control-plane pods for each of the 50+ virtual clusters, and the workload pods that those virtual clusters schedule onto the host's real nodes via the vCluster syncer.

Operational numbers

  • 45 min → <5 min: environment provisioning time (89% reduction).
  • ~500 hours / year: QA engineer time reclaimed from setup tasks.
  • >50 vCPU and >200 GB RAM saved at peak: from resource consolidation across shared controllers + ingress + monitoring.
  • Up to 70% savings: from EC2 Spot Instances via EKS Auto Mode.
  • 50+ virtual clusters: running on one shared EKS host cluster.
  • >10 separate tool deployments → one shared stack: per-environment tooling reduction.
  • vcluster-platform version: 4.0.1 (Helm chart vcluster/vcluster-platform).
  • Service IPv4 range: 10.96.0.0/12 (required vcluster service CIDR).

Walkthrough key config fragments

The post includes a hands-on deployment walkthrough that canonicalises several load-bearing config patterns:

IngressClassParams for ALB grouping — the group.name: vcluster config is what lets multiple vcluster ingresses share the same ALB:

apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  name: alb
spec:
  scheme: internet-facing
  group:
    name: vcluster
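The IngressClassParams object is referenced from a standard IngressClass, which is what application Ingresses select via ingressClassName. A sketch of that binding, assuming EKS Auto Mode's built-in controller name (this companion object is implied by the walkthrough but not reproduced in these notes):

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  # EKS Auto Mode's built-in load balancing capability
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: alb   # points at the IngressClassParams above
```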

Per-vcluster sync config — controls which host-cluster resources are mirrored into the virtual cluster and which vcluster resources sync up to the host. The host-to-vcluster direction exposes the shared ingressClasses and storageClasses so that the virtual cluster's users see them as if they were native; the vcluster-to-host direction sends ingresses up so the host's ALB controller can materialise them:

sync:
  fromHost:
    ingressClasses:
      enabled: true
    storageClasses:
      enabled: true
  toHost:
    ingresses:
      enabled: true
controlPlane:
  coredns:
    enabled: true
    embedded: true

Application ingress with shared ALB — each app in each vcluster declares alb.ingress.kubernetes.io/load-balancer-name: vcluster-alb (the same name across all vclusters) + group.order to control rule precedence. The AWS LB Controller on the host cluster aggregates all matching Ingresses into the one vcluster-alb ALB.
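A sketch of what one such application Ingress plausibly looks like inside a vcluster — the app1 name, path, port, and group.order value are illustrative; the two annotations and the alb ingress class are the load-bearing parts named in the post:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app1
  annotations:
    # Same load-balancer name in every vcluster → one shared ALB
    alb.ingress.kubernetes.io/load-balancer-name: vcluster-alb
    # Lower order = higher rule precedence on the shared listener
    alb.ingress.kubernetes.io/group.order: "10"
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /app1
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80
```

With sync.toHost.ingresses enabled, this object is synced from the virtual cluster to the host, where the ALB controller merges it into the vcluster-alb listener alongside every sibling vcluster's rules.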

Caveats

  • Customer-case-study framing. This is a reference-architecture post by AWS featuring Deloitte; substance is in the topology + quantified reductions, not in operational incidents, tail-latency numbers, or partial-failure retrospectives. No p99 / RPS / error-rate figures are published.
  • vCluster platform is a commercial product (Loft Labs). The post uses vcluster/vcluster-platform — the paid, enterprise edition of vCluster — which adds the management UI, SSO, and multi-tenancy features beyond what the open-source vcluster CLI offers. Open-source vcluster is itself a sufficient primitive for the core virtual-Kubernetes-cluster mechanics; the platform edition adds the self-service UI Deloitte uses to empower QA teams. The post discloses a "13-day vCluster platform trial" but does not discuss pricing.
  • QA / pre-production context. All 50+ virtual clusters are for QA testing. The post does not claim this topology is appropriate for production workloads — vcluster's shared host-kernel isolation is weaker than per-cluster isolation (a malicious or buggy workload in one vcluster can exhaust host-cluster resources affecting sibling vclusters). Production-grade tenant isolation typically still requires dedicated clusters or namespace-scoped ResourceQuotas + NetworkPolicies + PodSecurityAdmission, none of which the post discusses.
  • Single-ALB SPoF. The single-ALB architecture is deliberate (cost + ops simplification) but means one ALB outage affects all 50+ virtual clusters' applications. The post does not discuss multi-ALB failover or per-environment isolation at the network layer.
  • No disaster-recovery framing. The post does not address what happens if the host cluster fails — all 50+ virtual clusters go with it. For pre-prod QA this is acceptable; for production it would not be.
  • Weekly AMI cadence inheritance. Because the host is EKS Auto Mode, it inherits Auto Mode's weekly Bottlerocket-replacement cadence. The post does not discuss how node churn on the host affects the 50+ vcluster control planes or their workloads; presumably PDBs on the vcluster control-plane pods mitigate, but this is not spelled out.
