PATTERN Cited by 1 source
vCluster fast test-environment provisioning
Pattern
Expose vCluster virtual-cluster creation as a self-service operation to QA / test engineers so they can provision isolated Kubernetes environments on demand, in under 5 minutes, without filing a ticket to the platform team. The substrate is the shared-host-cluster-with-virtual-clusters topology; this pattern is the operational layer on top that delivers the self-service property.
Problem it solves
Testing environments have a paradoxical requirement profile:
- Must be isolated — two QA engineers testing conflicting feature branches cannot share an environment.
- Must be ephemeral — created for a specific test scenario, destroyed when done. Long-lived test environments drift and accumulate state.
- Must be fast to create — testing throughput drops if engineers wait minutes-to-hours for an environment.
- Must not require platform-team involvement per creation — otherwise the platform team becomes the bottleneck.
- Must be cheap — 50+ concurrent testing environments is a lot of dedicated infrastructure.
Dedicated-cluster-per-environment fails at fast and cheap. Shared-cluster-with-namespace-isolation fails at isolated (no CRD / RBAC separation, shared API server).
Structure
- Platform team provisions and operates the shared host cluster once. They install and maintain the shared controllers (ALB, CSI, monitoring), the vcluster-platform self-service portal, policy schema (limits, naming, tags), and the observability / cost-attribution stack.
- QA engineers log into the vcluster-platform UI (or CLI) with their org credentials.
- They select "Deploy with vCluster platform (default)" and specify the per-vcluster YAML config (sync rules, CoreDNS, affinity). Deloitte's canonical config synchronises host IngressClasses and StorageClasses up into the vcluster, and vcluster Ingresses down to the host.
- The platform creates a new vcluster: its API server, controller manager (CM), and CoreDNS pods are scheduled on the host cluster, and the virtual cluster becomes Ready in <5 min.
- The QA engineer downloads the kubeconfig from the UI and runs kubectl apply for their test workloads. Applications become reachable via the shared ALB's path-based routing (see patterns/shared-alb-path-based-multi-cluster-routing).
- When the test is done, the vcluster is deleted: its API-server pods and dedicated host namespace are cleaned up.
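The sync rules in the config step can be sketched as a small Python structure. This is a minimal sketch, not Deloitte's actual file: the key names (`sync.fromHost`, `sync.toHost`, `ingressClasses`, `storageClasses`, `ingresses`) follow the vcluster.yaml layout as commonly documented and should be checked against the vCluster version in use. The CoreDNS and affinity settings mentioned above are omitted because the source does not give them, and `build_vcluster_config` is a hypothetical helper name.

```python
import json


def build_vcluster_config() -> dict:
    """Return a per-vcluster config mirroring the sync rules described above.

    Assumption: key names follow the vcluster.yaml layout; verify them
    against the vCluster release you run before using this for real.
    """
    return {
        "sync": {
            # Host -> vcluster: expose the shared classes inside the virtual cluster.
            "fromHost": {
                "ingressClasses": {"enabled": True},
                "storageClasses": {"enabled": True},
            },
            # vcluster -> host: push Ingress objects down so the shared
            # ingress controller on the host can route to them.
            "toHost": {
                "ingresses": {"enabled": True},
            },
        },
    }


if __name__ == "__main__":
    # YAML parsers generally accept JSON, so this output can be pasted
    # where a YAML config is expected.
    print(json.dumps(build_vcluster_config(), indent=2))
```

Generating the config from code rather than hand-editing YAML is one way to keep every vcluster on the same sync contract; a checked-in template would serve the same purpose.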
Canonical outcomes (Deloitte)
- Environment provisioning time: 30–45 min → <5 min (89% reduction).
- Engineer-hours reclaimed: ~500 / year for the QA team.
- Concurrent environments: 50+ vclusters running on one host.
- Tool-deployment reduction: 10+ per-environment tool deployments collapsed into one shared stack.
- Cost savings: >50 vCPU + >200 GB RAM saved at peak from non-duplicated controllers; up to 70% additional savings from EC2 Spot via EKS Auto Mode.
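The headline figures can be sanity-checked with simple arithmetic. The source does not say how the 89% or the ~500 hours were derived, so the sketch below is only a consistency check: it assumes the reduction is measured from the 45-minute upper bound to the 5-minute ceiling, and that the full waiting time counts as lost engineer time. Both function names are hypothetical.

```python
def reduction_pct(before_min: float, after_min: float) -> float:
    """Percentage reduction in provisioning time."""
    return (before_min - after_min) / before_min * 100


def provisionings_per_year(hours_reclaimed: float, before_min: float, after_min: float) -> float:
    """How many provisionings per year would account for the reclaimed hours,
    assuming each one saves (before_min - after_min) minutes of engineer time."""
    saved_min = before_min - after_min
    return hours_reclaimed * 60 / saved_min


if __name__ == "__main__":
    # Upper and lower bounds of the quoted 30-45 minute baseline.
    print(round(reduction_pct(45, 5), 1))
    print(round(reduction_pct(30, 5), 1))
    # Implied yearly provisioning volume under the stated assumptions.
    print(provisionings_per_year(500, 45, 5))
    print(provisionings_per_year(500, 30, 5))
```

Under these assumptions the 89% figure matches the 45-minute baseline (the 30-minute baseline gives roughly 83%), and ~500 hours per year corresponds to somewhere between 750 and 1,200 environment creations per year.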
Key implementation details
- Sub-5-minute SLO for vcluster Ready is only achievable because vcluster creation is pod-scheduling on an existing cluster, not new EC2 provisioning. Making this reliable requires the host cluster to have headroom (real node capacity available at vcluster-creation time); EKS Auto Mode + EC2 Spot is Deloitte's solution.
- Policy in schema, not in humans — the vcluster creation API must encode the platform team's policy (resource limits, naming conventions, allowed sync config, tagging) so that the platform team exits the critical path. If every vcluster creation still needs platform-team review, the bottleneck returns.
- Per-vcluster kubeconfig via OIDC — vcluster-platform uses the user's org identity to scope the downloaded kubeconfig, eliminating shared-credential management.
- Explicit cleanup — long-lived vclusters accumulate cost. Automated TTL (delete vclusters after N days of inactivity) or branch-linked lifecycle (delete vcluster when PR is merged) is recommended; the post doesn't specify Deloitte's approach.
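The "policy in schema, not in humans" point can be illustrated with a small validator that runs before any vcluster is created. This is a sketch under stated assumptions: the `Policy` fields, the `qa-` naming rule, the limits, and the tag names are all hypothetical placeholders, not values from the source, and vcluster-platform has its own mechanisms for enforcing such rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # All values below are illustrative assumptions, not Deloitte's policy.
    name_pattern: str = r"^qa-[a-z0-9-]{3,30}$"
    max_cpu: float = 4.0
    max_memory_gib: float = 16.0
    required_tags: frozenset = frozenset({"owner", "cost-center"})
    allowed_sync: frozenset = frozenset({"ingresses", "ingressClasses", "storageClasses"})


@dataclass
class VClusterRequest:
    name: str
    cpu: float
    memory_gib: float
    tags: dict
    sync: set


def validate(req: VClusterRequest, policy: Policy = Policy()) -> list[str]:
    """Return a list of policy violations; an empty list means the request
    can proceed without a human review."""
    errors = []
    if not re.fullmatch(policy.name_pattern, req.name):
        errors.append(f"name {req.name!r} does not match {policy.name_pattern}")
    if req.cpu > policy.max_cpu:
        errors.append(f"cpu {req.cpu} exceeds limit {policy.max_cpu}")
    if req.memory_gib > policy.max_memory_gib:
        errors.append(f"memory {req.memory_gib} GiB exceeds limit {policy.max_memory_gib}")
    missing = policy.required_tags - set(req.tags)
    if missing:
        errors.append(f"missing tags: {sorted(missing)}")
    disallowed = set(req.sync) - policy.allowed_sync
    if disallowed:
        errors.append(f"sync resources not allowed: {sorted(disallowed)}")
    return errors
```

A request that returns no violations goes straight through; anything else is rejected with a message the QA engineer can act on, which keeps the platform team off the critical path.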
Trade-offs
Wins:
- Eliminates the ticket-to-platform-team serialisation point for test-environment creation.
- Frees QA engineers from setup tasks ("shifting focus from repetitive setup tasks to higher-value testing work").
- Platform team shifts from per-environment ticket processing to operating the shared substrate + evolving the self-service contract.
Costs / risks:
- Self-service portal cost — vcluster-platform is commercial; open-source vcluster (CLI) is free but lacks the UI.
- Host-cluster headroom: the platform team must keep enough spare capacity on the host cluster to absorb new vcluster creations; this may mean paying for idle capacity during low-usage windows. Spot Instances + Auto Mode mitigate this cost but don't eliminate it.
- Policy-schema-expressiveness trap — if the platform team tries to express too many constraints in schema (arbitrarily complex cost allocation, cross-resource dependencies), the schema becomes a second programming language that product teams can't reason about. Simpler policies → faster self-service adoption.
- Cleanup discipline — without explicit TTL, long-lived vclusters accumulate and defeat the cost savings. The pattern assumes disciplined lifecycle.
- Still QA / pre-prod only — shared-kernel isolation remains the ceiling on what workloads should use this pattern.
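The cleanup-discipline risk can be reduced with an automated selection rule combining the two lifecycles mentioned earlier (inactivity TTL and branch-linked deletion). The source does not describe Deloitte's approach, so this is a hypothetical sketch: `VClusterRecord`, the 7-day default, and the idea of an inventory with a last-activity timestamp are all assumptions, and how that inventory is collected is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VClusterRecord:
    # Hypothetical inventory entry; the fields are assumptions for illustration.
    name: str
    last_activity: datetime
    pr_merged: bool = False


def select_for_deletion(
    records: list[VClusterRecord],
    now: datetime,
    ttl: timedelta = timedelta(days=7),
) -> list[str]:
    """Return names of vclusters that are idle past the TTL or whose PR is merged."""
    return [
        r.name
        for r in records
        if r.pr_merged or now - r.last_activity > ttl
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        VClusterRecord("qa-stale", now - timedelta(days=10)),
        VClusterRecord("qa-active", now - timedelta(days=1)),
        VClusterRecord("qa-merged", now - timedelta(days=1), pr_merged=True),
    ]
    print(select_for_deletion(inventory, now))
```

Keeping the selection logic separate from the actual delete call makes it easy to run in a dry-run mode first and review what would be removed.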
Seen in
- sources/2026-04-27-aws-deloitte-optimizes-eks-environment-provisioning-with-vcluster: canonical case. Deloitte's QA team uses the vcluster/vcluster-platform v4.0.1 Helm-installed portal to self-provision 50+ virtual clusters on one shared EKS host cluster. Sub-5-minute time-to-Ready; QA engineers no longer file tickets to the platform team. Verbatim: "Teams can now provision their own testing environments in under 5 minutes without platform team involvement, compared to submitting requests and waiting 30-45 minutes previously." This is the canonical data point for the ~500 h / year engineer-time reclamation attributable to the self-service property.
Related
- systems/vcluster — the substrate
- systems/aws-eks — the host-cluster choice
- concepts/virtual-kubernetes-cluster — the primitive
- concepts/platform-team-bottleneck — the org problem
- concepts/self-service-infrastructure — the property delivered
- patterns/shared-host-cluster-with-virtual-clusters — the required topology
- patterns/per-pr-ephemeral-environment — sibling pattern at namespace altitude (lighter weight, less isolation)