Shared host cluster with virtual clusters

Pattern

Run one physical Kubernetes cluster (e.g. EKS with Auto Mode as the node provisioner) as the host cluster, and partition it into many lightweight virtual Kubernetes clusters — each with its own API server, RBAC, namespaces, DNS — running as pods inside the host. Essential platform services (ingress controller, CSI drivers, monitoring agents) deploy once on the host and are shared across all virtual clusters via the vCluster syncer.

Problem it solves

Provisioning a dedicated Kubernetes cluster per testing need (per team, per feature branch, per QA scenario) is expensive in three dimensions:

  • Time — control-plane provisioning + ingress + DNS + monitoring setup is 30–45 min for a fresh EKS cluster (Deloitte).
  • Resources — every dedicated cluster runs its own copy of every controller, ingress, monitoring agent, and operator. At 50+ environments that's >50× duplication of infrastructure that does not vary per tenant.
  • Human time — each dedicated cluster is a ticket to the platform team, creating the platform-team bottleneck.

A single shared cluster with no tenant partitioning is the other extreme — fast to provision namespaces, but no API-surface isolation, and noisy-neighbour + CRD-collision risk.

The shared-host-cluster-with-virtual-clusters pattern sits between these extremes: one expensive physical cluster + many cheap virtual clusters, each virtual cluster retaining API-surface isolation (namespaces, RBAC, DNS, CRDs) without duplicating compute-plane infrastructure.

Structure

┌──────────────────────────────────────────────────────────┐
│             Host EKS cluster (EKS Auto Mode)              │
│  ┌───────────────────────────────────────────────────┐   │
│  │ Shared platform services                          │   │
│  │  - AWS Load Balancer Controller (one)             │   │
│  │  - Storage controllers (EBS CSI / EFS CSI)        │   │
│  │  - Monitoring agents (one)                        │   │
│  │  - vcluster-platform controller                   │   │
│  └───────────────────────────────────────────────────┘   │
│                                                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  vcluster 1  │  │  vcluster 2  │  │  vcluster N  │   │
│  │  (API server │  │  (API server │  │  (API server │   │
│  │   + CM +     │  │   + CM +     │  │   + CM +     │   │
│  │   CoreDNS)   │  │   CoreDNS)   │  │   CoreDNS)   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│                                                           │
│         Workload pods run on host nodes via syncer        │
└──────────────────────────────────────────────────────────┘

Traffic flows: external requests terminate on the shared ALB on the host; the host-side AWS Load Balancer Controller routes them according to Ingress rules synced down from each vcluster; workload pods are scheduled onto host nodes by the syncer, while each vcluster's API and DNS traffic stays inside its own control-plane pods.

How to implement (Deloitte-style)

  1. Provision the host cluster. An EKS cluster with Auto Mode enabled and the service IPv4 range set to 10.96.0.0/12, vcluster's default service CIDR (see the eksctl sketch after this list).
  2. Install shared controllers on the host. AWS Load Balancer Controller + EBS / EFS CSI drivers + monitoring agents. These run once and are the load-bearing "deploy once, share everywhere" substrate.
  3. Install vcluster-platform. Helm chart vcluster/vcluster-platform (v4.0.1 in Deloitte's post). This is the self-service portal that QA engineers use — deployed behind an ALB with ACM-issued TLS.
  4. Configure the IngressClassParams + IngressClass + StorageClass for ALB + EBS on the host. Using group.name: vcluster ensures multiple vcluster ingresses can share a single ALB (see the manifest sketch after this list).
  5. For each tenant / environment, create a vcluster via the platform UI or CLI. Each gets its own API server + controller manager + embedded CoreDNS running as host-cluster pods. Per-vcluster sync config exposes host IngressClasses / StorageClasses up (sync.fromHost) and mirrors vcluster Ingresses down to the host (sync.toHost); see the vcluster.yaml sketch after this list.
  6. QA teams deploy their applications into their own vcluster via the kubeconfig downloaded from the platform UI. Their Ingress objects get materialised into ALB rules by the shared host-side AWS LB Controller (see the example tenant Ingress after this list).
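
For step 1, a minimal sketch of the host-cluster definition, assuming eksctl is the provisioning tool; the cluster name and region are placeholders, and the autoModeConfig / serviceIPv4CIDR fields follow eksctl's ClusterConfig schema:

  # eksctl ClusterConfig sketch: EKS Auto Mode host cluster with vcluster's
  # default service CIDR. Name and region are illustrative.
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: vcluster-host             # placeholder
    region: eu-west-1               # placeholder
  kubernetesNetworkConfig:
    serviceIPv4CIDR: 10.96.0.0/12   # vcluster's default service CIDR
  autoModeConfig:
    enabled: true                   # EKS Auto Mode provisions and manages nodes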
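
For step 4, the host-side ALB and EBS wiring might look like the following sketch; the StorageClass name and parameters are illustrative, and the provisioner assumes the standard EBS CSI driver installed in step 2:

  # All Ingresses using this class join ALB group "vcluster", so the shared
  # AWS Load Balancer Controller provisions a single ALB for every vcluster.
  apiVersion: elbv2.k8s.aws/v1beta1
  kind: IngressClassParams
  metadata:
    name: vcluster-alb
  spec:
    scheme: internet-facing
    group:
      name: vcluster
  ---
  apiVersion: networking.k8s.io/v1
  kind: IngressClass
  metadata:
    name: alb
  spec:
    controller: ingress.k8s.aws/alb
    parameters:
      apiGroup: elbv2.k8s.aws
      kind: IngressClassParams
      name: vcluster-alb
  ---
  # StorageClass served by the shared EBS CSI driver; parameters are illustrative.
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: gp3
  provisioner: ebs.csi.aws.com
  parameters:
    type: gp3
  volumeBindingMode: WaitForFirstConsumer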
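
For step 5, a minimal per-vcluster sync configuration, assuming the vcluster.yaml schema of vcluster v0.20+ (key names can differ on older versions):

  # vcluster.yaml sketch: expose host classes upward, mirror Ingresses downward.
  sync:
    fromHost:
      ingressClasses:
        enabled: true     # host IngressClass "alb" becomes visible inside the vcluster
      storageClasses:
        enabled: true     # host StorageClasses (e.g. gp3) become visible inside the vcluster
    toHost:
      ingresses:
        enabled: true     # vcluster Ingresses are copied to the host namespace, where the
                          # shared AWS Load Balancer Controller turns them into ALB rules

A tenant environment is then created from this file, for example with vcluster create qa-env-1 -f vcluster.yaml or the equivalent action in the platform UI.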
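
And for step 6, the kind of Ingress a QA team would deploy inside its own vcluster; hostname, service name, and port are illustrative:

  # Deployed inside the vcluster; the syncer copies it to the host, where the
  # shared controller attaches it to the common ALB.
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: demo-app
    annotations:
      alb.ingress.kubernetes.io/target-type: ip   # route straight to pod IPs
  spec:
    ingressClassName: alb
    rules:
      - host: qa-env-1.example.com                # illustrative hostname
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: demo-app
                  port:
                    number: 80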

Trade-offs

Wins:

  • Environment provisioning drops from tens of minutes to a few minutes (Deloitte: 45 min → <5 min, an 89% reduction).
  • Large compute + memory savings from non-duplicated controllers (Deloitte: >50 vCPU + >200 GB RAM saved at peak across the fleet).
  • Self-service becomes tractable — the platform team can expose a policy-gated vcluster-creation API and get out of the per-environment critical path.
  • Tooling simplification — one ALB, one set of controllers, one monitoring stack to operate, not N of each.

Costs / risks:

  • Shared kernel across all virtual clusters on the same host node — weaker isolation than dedicated clusters. Appropriate for QA / pre-prod; inappropriate for hard-multi-tenant production.
  • Host-cluster SPoF — host failure takes down all virtual clusters simultaneously.
  • Density ceiling — at some scale the host runs out of node capacity / etcd throughput. Deloitte's 50+ vclusters on one host is a useful reference upper bound for QA-style use.
  • Noisy-neighbour risk — without explicit ResourceQuota + LimitRange on the host (scoped to each vcluster's host namespace), one vcluster's workloads can starve siblings (see the quota sketch after this list).
  • Network-policy homogeneity — all virtual clusters inherit the host's CNI and network-policy stack; you can't express "vcluster-A uses Calico, vcluster-B uses Cilium".
  • vCluster Platform is commercial — Deloitte uses the paid vcluster/vcluster-platform Helm chart. Open-source vcluster (the CLI) is sufficient for the core primitive but lacks the self-service UI the case study highlights.
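
A sketch of the guard-rails mentioned under the noisy-neighbour point: a ResourceQuota and LimitRange applied on the host, in the namespace that backs one vcluster (the namespace name and the numbers are illustrative):

  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: qa-env-1-quota
    namespace: vcluster-qa-env-1   # host namespace backing this vcluster (naming is illustrative)
  spec:
    hard:
      requests.cpu: "8"
      requests.memory: 16Gi
      limits.cpu: "16"
      limits.memory: 32Gi
      pods: "100"
  ---
  apiVersion: v1
  kind: LimitRange
  metadata:
    name: qa-env-1-defaults
    namespace: vcluster-qa-env-1
  spec:
    limits:
      - type: Container
        default:                   # applied when a container sets no limits
          cpu: 500m
          memory: 512Mi
        defaultRequest:            # applied when a container sets no requests
          cpu: 100m
          memory: 128Mi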

When to use

  • QA / pre-production testing with 10+ concurrent environments and a platform-team bottleneck today.
  • Per-feature-branch ephemeral clusters where each branch wants its own K8s surface but the cost of a real cluster is prohibitive.
  • Multi-tenant CI/CD runners where jobs need semantic cluster isolation without the provisioning cost.

When not to use

  • Hard-multi-tenant production with mutually-distrusting tenants — use dedicated clusters or microVM pod sandboxing.
  • Compliance-mandated per-tenant data isolation with kernel-level requirements (PCI / HIPAA in some interpretations).
  • When individual virtual clusters need different CNI / network-policy stacks — vcluster inherits the host's.

Seen in

  • The Deloitte engineering write-up on running vCluster on EKS Auto Mode for shared QA environments (cited inline as "Deloitte").