PATTERN Cited by 1 source
EKS add-on as lifecycle packaging¶
Problem¶
An AWS-native Kubernetes operator (or dependency controller) ships initially as a Helm chart. Customer installation requires:
- Creating several IAM roles with specific trust relationships (execution role, controller role, IRSA roles for sub-components).
- Creating supporting AWS resources (S3 buckets for TLS certs, VPC endpoints, KMS keys).
- Installing a bundle of dependency charts (cert-manager, CSI drivers, metrics-server, autoscalers) — each with its own version cadence, chart values, IRSA bindings.
- Managing Helm-release lifecycle: helm upgrade cadence, break-glass rollbacks, drift between declared and actual state.
- Tracking a compatibility matrix across {EKS version × operator version × dependency-chart versions}.
The result: "a maze of Helm charts, IAM role configurations, dependency management, and manual upgrades — often taking hours before a single model can serve predictions" (sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod).
The customer-operated surface is mostly undifferentiated heavy lifting — none of the Helm/IAM/dependency scaffolding is where the customer's value-add lives.
Pattern shape¶
Re-package the Helm chart + its dependency bundle + its IAM prerequisites as a native EKS add-on.
An EKS add-on is a first-class EKS API primitive (aws eks
create-addon, aws eks update-addon, aws eks delete-addon)
where AWS owns:
- Version compatibility: the add-on carries a declared {EKS version × add-on version} compatibility envelope; the EKS API refuses incompatible combinations.
- One-click upgrades with rollback-on-failure semantics: run update-addon against a new version; EKS handles the reconciliation and reverts on failure.
- Dependency add-ons: the add-on manifest declares its dependencies as other add-ons (cert-manager-as-add-on, CSI-driver-as-add-on) that EKS installs / upgrades together.
- IAM scaffolding: the add-on install creates named IAM roles with scoped trust / permissions via the configuration-values blob (executionRoleArn, IRSA role ARNs for sub-components).
- Config blob: a JSON configuration-values schema for customer-specific knobs (S3 bucket names, cluster ARN, existing IAM role ARNs to reuse).
The customer-facing surface collapses from "install Helm chart
+ 4 dependency charts + create 4 IAM roles + create S3 bucket +
configure VPC endpoints" to a single aws eks create-addon
command.
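The compatibility envelope and the dependency declarations can be pictured with a small model. The following is a minimal sketch, not how EKS implements either check: the `ENVELOPE` and `DEPENDS_ON` tables, the version strings, and the add-on names are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical envelope: add-on version -> EKS versions it declares support for.
# These values are illustrative, not AWS's published matrix.
ENVELOPE = {
    "v1.0.0-eksbuild.1": {"1.30", "1.31"},
    "v1.1.0-eksbuild.1": {"1.31", "1.32"},
}

# Hypothetical dependency declarations: add-on -> add-ons it needs first.
DEPENDS_ON = {
    "inference-operator": {"cert-manager", "s3-csi-driver", "metrics-server"},
    "cert-manager": set(),
    "s3-csi-driver": set(),
    "metrics-server": set(),
}


def check_compatible(addon_version: str, eks_version: str) -> None:
    """Raise ValueError for a combination outside the declared envelope."""
    supported = ENVELOPE.get(addon_version)
    if supported is None:
        raise ValueError(f"unknown add-on version {addon_version!r}")
    if eks_version not in supported:
        raise ValueError(
            f"{addon_version} does not support EKS {eks_version}; "
            f"supported: {sorted(supported)}"
        )


def install_order(root: str) -> list[str]:
    """Return the add-ons `root` needs, dependencies before dependents."""
    needed: dict[str, set[str]] = {}
    stack = [root]
    while stack:  # collect the transitive dependency closure of `root`
        name = stack.pop()
        if name not in needed:
            needed[name] = DEPENDS_ON[name]
            stack.extend(DEPENDS_ON[name])
    # TopologicalSorter treats the mapped values as predecessors,
    # so static_order() yields dependencies first.
    return list(TopologicalSorter(needed).static_order())
```

Under these assumptions, `check_compatible("v1.0.0-eksbuild.1", "1.32")` raises, and `install_order("inference-operator")` lists the three dependencies before the operator itself. The point of the pattern is that the vendor maintains both tables, so the customer never has to.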
Canonical realisation: SageMaker HyperPod Inference Operator¶
The operator's 2026-04-06 repackaging is the canonical wiki instance. Before:
- Helm chart HyperPodHelmChart/charts/inference-operator
- Helm sub-charts for cert-manager, S3 Mountpoint CSI driver, FSx CSI driver, metrics-server
- Customer-created IAM roles (four): Execution / JumpStart Gated Model / ALB Controller / KEDA Operator
- Customer-created S3 bucket for TLS certificates
- Customer-created VPC endpoints for S3 access
- Customer-managed upgrade cadence
After:
aws eks create-addon \
--cluster-name my-hyperpod-cluster \
--addon-name amazon-sagemaker-hyperpod-inference \
--addon-version v1.0.0-eksbuild.1 \
--configuration-values '{
"executionRoleArn": "...",
"tlsCertificateS3Bucket": "...",
"hyperpodClusterArn": "...",
"alb": { "serviceAccount": {"create": true, "roleArn": "..."}},
"keda": { "auth": { "aws": { "irsa": { "roleArn": "..."}}}}
}' \
--region us-west-2
Alternatively, the SageMaker console's Quick Install takes zero customer parameters: AWS creates the IAM roles, S3 bucket, VPC endpoints, and dependency add-ons with optimised defaults, in one click.
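When the install is driven from automation rather than typed by hand, the configuration-values blob is easier to keep correct if it is built from a data structure and serialised once. The sketch below assembles the same command as an argv list without executing it. The key names mirror the CLI example above; the role ARN, bucket name, and cluster-ARN placeholder are made-up values, and `create_addon_argv` is a hypothetical helper, not part of any AWS SDK.

```python
import json
import shlex


def create_addon_argv(cluster: str, addon: str, version: str,
                      config: dict, region: str) -> list[str]:
    """Assemble the argv for `aws eks create-addon` without running it."""
    return [
        "aws", "eks", "create-addon",
        "--cluster-name", cluster,
        "--addon-name", addon,
        "--addon-version", version,
        # The CLI takes the configuration values as one JSON string.
        "--configuration-values", json.dumps(config, separators=(",", ":")),
        "--region", region,
    ]


# Placeholder values only; substitute the real ARNs and bucket name.
config = {
    "executionRoleArn": "arn:aws:iam::111122223333:role/ExampleExecutionRole",
    "tlsCertificateS3Bucket": "example-tls-bucket",
    "hyperpodClusterArn": "<hyperpod-cluster-arn>",
    "alb": {"serviceAccount": {"create": True, "roleArn": "<alb-role-arn>"}},
    "keda": {"auth": {"aws": {"irsa": {"roleArn": "<keda-role-arn>"}}}},
}

argv = create_addon_argv(
    cluster="my-hyperpod-cluster",
    addon="amazon-sagemaker-hyperpod-inference",
    version="v1.0.0-eksbuild.1",
    config=config,
    region="us-west-2",
)

# shlex.join quotes the JSON so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would execute it; keeping construction separate from execution makes the command easy to review or log first.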
The migration-script sub-pattern¶
For workloads already running on the Helm chart, shipping the
add-on isn't enough — an automated migration is required.
SageMaker HyperPod's migration script
(helm_to_addon.sh)
is the canonical shape:
- Auto-discover: read the existing Helm deployment's configuration (IAM roles in use, S3 buckets, dependency charts, release values) and derive the new add-on's configuration-values blob.
- Validate prerequisites: confirm dependency CRDs / RBAC exist; create whatever the Helm release used that the add-on also needs.
- Tag migrated resources: CreatedBy: HyperPodInference is applied to ALBs, ACM certs, and S3 objects so both paths can coexist during cutover and so cleanup is identifiable.
- Scale down Helm-managed deployments (ALB, KEDA, operator) before installing the add-on to avoid two controllers reconciling the same CRDs.
- Install the add-on with the OVERWRITE flag so it owns the CRDs / namespace resources previously owned by Helm.
- Clean up old Helm resources.
- Migrate dependency add-ons (CSI drivers, cert-manager, metrics-server) from Helm-installed to add-on-installed; the --skip-dependencies-migration flag lets organisations that already operate these as their own add-ons opt out.
- Preserve rollback: store a backup under /tmp/hyperpod-migration-backup-<timestamp>/ so a failure mid-migration can revert to the Helm state.
The combination of --auto-approve (non-interactive path for
automation) + stepwise-interactive path + rollback backups gives
operators the three migration postures: run it in prod now,
walk through it on the pre-prod cluster, and undo it if the
add-on misbehaves.
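The control flow behind those three postures can be sketched in a few lines. This is an illustration of the general shape only, not the contents of helm_to_addon.sh: the real script restores from its backup directory, whereas this sketch assumes each step carries its own hypothetical undo callable, and `run_migration`, `Step`, and `MigrationAborted` are names invented here.

```python
from typing import Callable

# A step is (name, apply, undo). The undo callables are an assumption of
# this sketch; they stand in for restoring from a backup.
Step = tuple[str, Callable[[], None], Callable[[], None]]


class MigrationAborted(Exception):
    """Raised when the operator declines a step in interactive mode."""


def run_migration(
    steps: list[Step],
    auto_approve: bool = False,
    confirm: Callable[[str], bool] = lambda name: True,
) -> list[str]:
    """Run steps in order; on any failure, undo completed steps in reverse."""
    done: list[tuple[str, Callable[[], None]]] = []
    try:
        for name, apply, undo in steps:
            # Interactive posture: ask before each step unless auto-approved.
            if not auto_approve and not confirm(name):
                raise MigrationAborted(name)
            apply()
            done.append((name, undo))
    except Exception:
        # Rollback posture: unwind in reverse so the Helm state is restored.
        for _, undo in reversed(done):
            undo()
        raise
    return [name for name, _ in done]
```

With `auto_approve=True` the function never prompts, which corresponds to the non-interactive path. Passing a `confirm` callback that reads from the terminal gives the stepwise walkthrough, and any exception (including a declined step) triggers the unwind.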
Why the pattern works¶
- Fewer knobs → fewer foot-guns. The Helm surface had ~N independent knobs (chart values × dependency-chart versions × IAM-role policy JSON × S3 ACL × VPC endpoint config); the add-on surface has one JSON config blob with vendor-validated shapes.
- Vendor owns the compatibility matrix. The {EKS version × operator version × dependency-add-on versions} matrix is a combinatorial explosion customers can't keep in their head; making it the vendor's problem is a direct application of concepts/managed-data-plane-style responsibility-boundary shift.
- Upgrade path is one API call. update-addon with a new version is safer than helm upgrade because EKS validates the jump is on the supported matrix and rolls back on failure.
- Dependency add-ons compose uniformly. When cert-manager ships as an add-on, every operator that depends on it stops having to bundle its own version; the EKS cluster has one cert-manager, not N.
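To make the combinatorial-explosion point concrete, the size of the matrix can be counted directly. The version lists below are invented for illustration (they are not AWS data); only the arithmetic matters.

```python
from itertools import product

# Illustrative version lists; the labels and counts are assumptions.
eks_versions = ["eks-a", "eks-b", "eks-c"]
operator_versions = ["op-1", "op-2"]
dependency_versions = {
    "cert-manager": ["cm-1", "cm-2", "cm-3"],
    "s3-csi-driver": ["s3-1", "s3-2"],
    "fsx-csi-driver": ["fsx-1", "fsx-2"],
    "metrics-server": ["ms-1", "ms-2"],
}


def helm_matrix_size() -> int:
    """Combinations a customer could end up running with independent charts."""
    axes = [eks_versions, operator_versions, *dependency_versions.values()]
    return sum(1 for _ in product(*axes))


def addon_matrix_size() -> int:
    """Combinations the customer chooses between once the vendor pins dependencies."""
    return len(eks_versions) * len(operator_versions)


print(helm_matrix_size(), addon_matrix_size())
```

With these made-up lists the Helm path has 3 × 2 × 3 × 2 × 2 × 2 = 144 combinations, while the add-on path exposes 3 × 2 = 6 to the customer. The remaining combinations do not disappear; they become the vendor's testing problem.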
Costs / when the pattern is wrong¶
- Customisation surface narrows. Helm's templating lets customers inject arbitrary annotations, labels, init-containers, volume mounts; the add-on's configuration-values schema is vendor-defined. If the customer's deployment requires non-standard tweaks, the add-on path is too rigid; fall back to Helm.
- Vendor-controlled upgrade cadence. Once the add-on is the only shipping channel, the customer is on the vendor's upgrade clock. For some workloads (pinned compliance versions, custom forks), this is disqualifying.
- Multi-cloud / portability. Helm charts are cloud-agnostic; EKS add-ons only work on EKS. Organisations that deploy to EKS + GKE + AKS can't share an add-on path.
- Less ecosystem leverage. The Helm chart ecosystem has well-understood tooling (helmfile, ArgoCD Helm renderer, CI/CD integrations); add-on equivalents are EKS-specific and less mature.
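The first cost above, the narrowed customisation surface, follows directly from schema validation: anything the vendor did not anticipate is rejected rather than templated in. A minimal sketch of that behaviour follows. The `ALLOWED` table is hypothetical, built from the key names in the create-addon example rather than from the add-on's actual schema, and real validation would be done by EKS against the published schema, not by hand-written code like this.

```python
# Hypothetical vendor-defined schema: allowed top-level keys and their types.
ALLOWED = {
    "executionRoleArn": str,
    "tlsCertificateS3Bucket": str,
    "hyperpodClusterArn": str,
    "alb": dict,
    "keda": dict,
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config fits."""
    problems = []
    for key, value in config.items():
        expected = ALLOWED.get(key)
        if expected is None:
            # No escape hatch: keys outside the schema are simply refused.
            problems.append(f"unknown key: {key}")
        elif not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems
```

A Helm user could add something like a `podAnnotations` value and have the chart template render it; here `validate_config({"podAnnotations": {"team": "ml"}})` returns `["unknown key: podAnnotations"]`, which is the rigidity the bullet describes.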
Adjacent tradition¶
- OS-level package managers (apt, yum) for system add-ons vs. ./configure && make install: the same packaging-ownership shift at the Linux-distro layer.
- Managed-RDS extensions: Postgres extensions installed via CREATE EXTENSION (managed) vs. custom .so files on a self-hosted server (customer-managed).
- concepts/managed-data-plane at the service-mesh layer: AWS-managed Envoy (Service Connect) vs. customer-managed Envoy (App Mesh). Same trade-off: lose configurability, gain lifecycle management.
- EKS Auto Mode at the node / data-plane layer. EKS add-on packaging is the operator-layer instance of the same shared-responsibility shift Auto Mode made at the node layer.
Seen in¶
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod
— canonical instance: SageMaker HyperPod Inference Operator
Helm → EKS add-on migration, with explicit dependency-add-on
bundling (cert-manager, S3 CSI, FSx CSI, metrics-server),
four-role IAM scaffolding, and the
helm_to_addon.sh migration-script pattern.
Related¶
- systems/aws-eks: the platform primitive.
- systems/helm: the packaging primitive being migrated from.
- systems/sagemaker-hyperpod-inference-operator: canonical consumer.
- concepts/managed-data-plane: the broader shared-responsibility shift this pattern instantiates.
- concepts/shared-responsibility-model: where the add-on boundary lives on the AWS/customer axis.