SageMaker HyperPod Inference Operator¶
SageMaker HyperPod Inference Operator is the Kubernetes
controller that manages the deployment + lifecycle of inference
models on an Amazon SageMaker
HyperPod cluster with EKS orchestration. It
reconciles two first-class CRDs — InferenceEndpointConfig
(bring-your-own model from S3) and JumpStartModel (managed
catalog) — into pods + services + load balancers + autoscalers on
the HyperPod cluster.
CRDs¶
- `JumpStartModel` — minimal shape (`modelId` + `sageMakerEndpoint.name` + `server.instanceType`). Managed-catalog path; the operator pulls weights, configures the server, and owns the lifecycle. Gated models require the JumpStart Gated Model IAM role.
- `InferenceEndpointConfig` — BYO-model shape with `modelName` + `replicas` + `instanceTypes: [...]` (prioritised list — see concepts/instance-type-fallback) + explicit `nodeAffinity` + explicit `worker.resources` (`cpu`, `memory`, `nvidia.com/gpu`). Fully expressive Kubernetes-native shape; used when the managed catalog's opinions don't fit (AZ-pinning, Spot-exclusion, custom node labels).
The two-CRD split is a managed-vs-
customer-owned data-plane boundary at the model-artifact layer —
JumpStartModel = managed, InferenceEndpointConfig = customer-
controlled with platform fallbacks available.
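As a sketch, the two shapes might look like the following manifests. Field names come from the announcement; the `apiVersion` group/version, metadata names, the catalog `modelId`, and the exact nesting are assumptions, not confirmed API.

```yaml
# Hypothetical manifests: field names from the launch post; the
# apiVersion group/version and exact nesting are assumptions.
apiVersion: inference.sagemaker.aws.amazon.com/v1   # assumed group/version
kind: JumpStartModel
metadata:
  name: catalog-model                                # hypothetical name
spec:
  model:
    modelId: example-jumpstart-model-id              # hypothetical catalog id
  sageMakerEndpoint:
    name: catalog-model-endpoint
  server:
    instanceType: ml.g5.8xlarge
---
apiVersion: inference.sagemaker.aws.amazon.com/v1   # assumed group/version
kind: InferenceEndpointConfig
metadata:
  name: byo-model                                    # hypothetical name
spec:
  modelName: byo-model
  replicas: 2
  instanceTypes:            # prioritised fallback list (first = preferred)
    - ml.p4d.24xlarge
    - ml.g5.24xlarge
  nodeAffinity: {}          # explicit, customer-owned scheduling constraints
  worker:
    resources:
      limits:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: "1"
```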
Packaging transition (2026)¶
Previously shipped as a Helm chart with substantial customer setup
burden: manual IAM-role creation, S3 bucket for TLS certs, VPC
endpoints, dependency-add-on install (cert-manager, S3 Mountpoint
CSI, FSx CSI, metrics-server), dependency-version management, and
Helm-release lifecycle. The 2026-04-06 announcement repackages it as
a native EKS add-on (addon-name
amazon-sagemaker-hyperpod-inference), with AWS owning:
- IAM scaffolding (4 named roles: Execution Role, JumpStart Gated Model Role, ALB Controller Role, KEDA Operator Role).
- S3 bucket for TLS certificates.
- VPC endpoints for secure S3 access.
- Dependency add-on provisioning, version-bumping, rollback.
- Compatibility / version matrix against the EKS cluster version.
See patterns/eks-add-on-as-lifecycle-packaging for the general
shape; this is the canonical wiki instance. The official migration
script helm_to_addon.sh
auto-discovers Helm config, scales down Helm deployments, installs
the add-on with OVERWRITE, tags migrated AWS resources with
CreatedBy: HyperPodInference, and preserves backups at
/tmp/hyperpod-migration-backup-<timestamp>/ for rollback.
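A sketch of the migration flow as the announcement describes it; the invocation shape is an assumption (no flags are documented here), and the numbered steps are comments restating the post:

```shell
# Sketch only: invocation shape assumed, no documented flags reproduced here.
./helm_to_addon.sh
# Per the announcement, the script:
#  1. auto-discovers the existing Helm release configuration
#  2. scales down the Helm-managed deployments
#  3. installs the amazon-sagemaker-hyperpod-inference add-on with OVERWRITE
#  4. tags migrated AWS resources with CreatedBy: HyperPodInference
#  5. preserves backups at /tmp/hyperpod-migration-backup-<timestamp>/
#     for rollback
```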
Platform features¶
- Multi-instance-type
deployment via Kubernetes node-affinity rules with descending
weights — structural answer to GPU-capacity scarcity. The
canonical example from the launch post:
  `instanceTypes: ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]` compiles to a required-set restriction + preferred-weighted ordering; the scheduler silently falls back to the next-priority type when the preferred one is unavailable.
- Managed tiered KV cache — optional at install, with intelligent memory allocation per instance type. AWS-claimed up to 40% inference-latency reduction for long-context workloads (no methodology published). KV-cache lifecycle is now a platform concern, not a model-server-library concern.
- Intelligent routing — three strategies (prefix-aware / KV-aware / round-robin) picked at install time to maximise cross-request cache reuse. Specialisation of concepts/workload-aware-routing for LLM inference.
- KEDA-via-IRSA autoscaling bundled as a default dependency with its own IAM role.
- Observability integration with TTFT / latency / GPU-utilisation dashboards.
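The instance-type fallback above can be sketched as follows: a hypothetical helper (not the operator's actual code) that turns the prioritised `instanceTypes` list into the required In-set plus descending-weight preferred terms that Kubernetes node affinity expresses.

```python
def compile_instance_type_affinity(instance_types):
    """Sketch: compile a prioritised instanceTypes list into a Kubernetes
    nodeAffinity block. Hypothetical helper, not the operator's real code.
    The required term restricts scheduling to the allowed set; the preferred
    terms rank types with descending weights so the scheduler falls back to
    the next-priority type when the preferred one has no capacity."""
    label = "node.kubernetes.io/instance-type"
    return {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
                "matchExpressions": [
                    {"key": label, "operator": "In", "values": instance_types}
                ]
            }]
        },
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                # Descending weights; Kubernetes allows 1-100, so this
                # simple scheme assumes a short list.
                "weight": 100 - 10 * i,
                "preference": {
                    "matchExpressions": [
                        {"key": label, "operator": "In", "values": [t]}
                    ]
                },
            }
            for i, t in enumerate(instance_types)
        ],
    }

affinity = compile_instance_type_affinity(
    ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"])
```

The required term is what makes the fallback a *restriction* (never schedule outside the list), while the preferred weights are what make it an *ordering*.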
Three install paths¶
- SageMaker console (recommended) — Quick Install (all defaults) or Custom Install (specify existing IAM roles / buckets / add-ons).
- EKS CLI — `aws eks create-addon` with a JSON `configuration-values` blob carrying `executionRoleArn` / `tlsCertificateS3Bucket` / `hyperpodClusterArn` / `alb.serviceAccount` / `keda.auth.aws.irsa`. Prerequisites (IAM roles, S3 bucket, VPC endpoints, dependency add-ons) must exist beforehand.
- Terraform — the `aws-samples/awsome-distributed-training` module with `create_hyperpod_inference_operator_module = true`. Also installs the training-operator, task-governance, and observability add-ons alongside.
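A hedged sketch of the CLI path. The top-level keys are the ones named above; all ARNs, names, and the nesting below each key are placeholders and assumptions:

```shell
# Sketch only: placeholder account IDs/ARNs/names; the nesting under
# alb / keda beyond the named keys is an assumption.
aws eks create-addon \
  --cluster-name my-eks-cluster \
  --addon-name amazon-sagemaker-hyperpod-inference \
  --configuration-values '{
    "executionRoleArn": "arn:aws:iam::111122223333:role/HyperPodInferenceExecutionRole",
    "tlsCertificateS3Bucket": "my-tls-cert-bucket",
    "hyperpodClusterArn": "arn:aws:sagemaker:us-west-2:111122223333:cluster/example",
    "alb": {"serviceAccount": "aws-load-balancer-controller"},
    "keda": {"auth": {"aws": {"irsa": "arn:aws:iam::111122223333:role/KedaOperatorRole"}}}
  }'
```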
Seen in¶
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — the EKS-add-on launch announcement; sole source at time of writing. Contributes the operator's CRD shape + packaging transition + multi-instance-type node-affinity fallback + KV-cache-as-platform-feature + prefix-aware-routing story.
Related¶
- systems/aws-sagemaker-hyperpod — parent cluster substrate.
- systems/aws-eks — the Kubernetes control plane.
- systems/helm — the packaging primitive migrated away from.
- systems/keda — bundled autoscaler dependency.
- concepts/instance-type-fallback — the structural primitive.
- concepts/kv-cache — the memory-hierarchy primitive under the managed cache.
- concepts/prefix-aware-routing — the LLM-inference-request routing primitive.
- patterns/eks-add-on-as-lifecycle-packaging — the packaging shift this operator canonicalises.