SageMaker HyperPod Inference Operator¶
SageMaker HyperPod Inference Operator is the Kubernetes
controller that manages the deployment + lifecycle of inference
models on an Amazon SageMaker
HyperPod cluster with EKS orchestration. It
reconciles two first-class CRDs — InferenceEndpointConfig
(bring-your-own model from S3) and JumpStartModel (managed
catalog) — into pods + services + load balancers + autoscalers on
the HyperPod cluster.
CRDs¶
- `JumpStartModel` — minimal shape (`modelId` + `sageMakerEndpoint.name` + `server.instanceType`). Managed-catalog path; the operator pulls weights, configures the server, and owns the lifecycle. Gated models require the JumpStart Gated Model IAM role.
- `InferenceEndpointConfig` — BYO-model shape with `modelName` + `replicas` + `instanceTypes: [...]` (prioritised list — see concepts/instance-type-fallback) + explicit `nodeAffinity` + explicit `worker.resources` (`cpu`, `memory`, `nvidia.com/gpu`). Fully expressive Kubernetes-native shape; used when the managed catalog's opinions don't fit (AZ-pinning, Spot-exclusion, custom node labels).
The two-CRD split is a managed-vs-
customer-owned data-plane boundary at the model-artifact layer —
JumpStartModel = managed, InferenceEndpointConfig = customer-
controlled with platform fallbacks available.
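As a sketch, the two shapes might look like the following manifests. Field names come from the announcement; the `apiVersion` group/version, metadata names, the catalog `modelId`, and the exact nesting are assumptions, not confirmed API.

```yaml
# Hypothetical manifests: field names from the launch post; the
# apiVersion group/version and exact nesting are assumptions.
apiVersion: inference.sagemaker.aws.amazon.com/v1   # assumed group/version
kind: JumpStartModel
metadata:
  name: catalog-model                                # hypothetical name
spec:
  model:
    modelId: example-jumpstart-model-id              # hypothetical catalog id
  sageMakerEndpoint:
    name: catalog-model-endpoint
  server:
    instanceType: ml.g5.8xlarge
---
apiVersion: inference.sagemaker.aws.amazon.com/v1   # assumed group/version
kind: InferenceEndpointConfig
metadata:
  name: byo-model                                    # hypothetical name
spec:
  modelName: byo-model
  replicas: 2
  instanceTypes:            # prioritised fallback list (first = preferred)
    - ml.p4d.24xlarge
    - ml.g5.24xlarge
  nodeAffinity: {}          # explicit, customer-owned scheduling constraints
  worker:
    resources:
      limits:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: "1"
```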
Packaging transition (2026)¶
Previously shipped as a Helm chart with substantial customer setup
burden: manual IAM-role creation, S3 bucket for TLS certs, VPC
endpoints, dependency-add-on install (cert-manager, S3 Mountpoint
CSI, FSx CSI, metrics-server), dependency-version management, and
Helm-release lifecycle. The 2026-04-06 announcement repackages it as
a native EKS add-on (addon-name
amazon-sagemaker-hyperpod-inference), with AWS owning:
- IAM scaffolding (4 named roles: Execution Role, JumpStart Gated Model Role, ALB Controller Role, KEDA Operator Role).
- S3 bucket for TLS certificates.
- VPC endpoints for secure S3 access.
- Dependency add-on provisioning, version-bumping, rollback.
- Compatibility / version matrix against the EKS cluster version.
See patterns/eks-add-on-as-lifecycle-packaging for the general
shape; this is the canonical wiki instance. The official migration
script helm_to_addon.sh
auto-discovers Helm config, scales down Helm deployments, installs
the add-on with OVERWRITE, tags migrated AWS resources with
CreatedBy: HyperPodInference, and preserves backups at
/tmp/hyperpod-migration-backup-<timestamp>/ for rollback.
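A sketch of the migration flow as the announcement describes it; the invocation shape is an assumption (no flags are documented here), and the numbered steps are comments restating the post:

```shell
# Sketch only: invocation shape assumed, no documented flags reproduced here.
./helm_to_addon.sh
# Per the announcement, the script:
#  1. auto-discovers the existing Helm release configuration
#  2. scales down the Helm-managed deployments
#  3. installs the amazon-sagemaker-hyperpod-inference add-on with OVERWRITE
#  4. tags migrated AWS resources with CreatedBy: HyperPodInference
#  5. preserves backups at /tmp/hyperpod-migration-backup-<timestamp>/
#     for rollback
```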
Platform features¶
- Multi-instance-type
deployment via Kubernetes node-affinity rules with descending
weights — structural answer to GPU-capacity scarcity. The
canonical example from the launch post:
  `instanceTypes: ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]` compiles to a required-set restriction + preferred-weighted ordering; the scheduler silently falls back to the next-priority type when the preferred one is unavailable.
- Managed tiered KV cache — optional at install, with intelligent memory allocation per instance type. AWS-claimed up to 40% inference-latency reduction for long-context workloads (no methodology published). KV-cache lifecycle is now a platform concern, not a model-server-library concern.
- Intelligent routing — three strategies (prefix-aware / KV-aware / round-robin) picked at install time to maximise cross-request cache reuse. Specialisation of concepts/workload-aware-routing for LLM inference.
- KEDA-via-IRSA autoscaling bundled as a default dependency with its own IAM role.
- Observability integration with TTFT / latency / GPU-utilisation dashboards.
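The instance-type fallback above can be sketched as follows: a hypothetical helper (not the operator's actual code) that turns the prioritised `instanceTypes` list into the required In-set plus descending-weight preferred terms that Kubernetes node affinity expresses.

```python
def compile_instance_type_affinity(instance_types):
    """Sketch: compile a prioritised instanceTypes list into a Kubernetes
    nodeAffinity block. Hypothetical helper, not the operator's real code.
    The required term restricts scheduling to the allowed set; the preferred
    terms rank types with descending weights so the scheduler falls back to
    the next-priority type when the preferred one has no capacity."""
    label = "node.kubernetes.io/instance-type"
    return {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
                "matchExpressions": [
                    {"key": label, "operator": "In", "values": instance_types}
                ]
            }]
        },
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                # Descending weights; Kubernetes allows 1-100, so this
                # simple scheme assumes a short list.
                "weight": 100 - 10 * i,
                "preference": {
                    "matchExpressions": [
                        {"key": label, "operator": "In", "values": [t]}
                    ]
                },
            }
            for i, t in enumerate(instance_types)
        ],
    }

affinity = compile_instance_type_affinity(
    ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"])
```

The required term is what makes the fallback a *restriction* (never schedule outside the list), while the preferred weights are what make it an *ordering*.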
Three install paths¶
- SageMaker console (recommended) — Quick Install (all defaults) or Custom Install (specify existing IAM roles / buckets / add-ons).
- EKS CLI — `aws eks create-addon` with a JSON `configuration-values` blob carrying `executionRoleArn` / `tlsCertificateS3Bucket` / `hyperpodClusterArn` / `alb.serviceAccount` / `keda.auth.aws.irsa`. Prerequisites (IAM roles, S3 bucket, VPC endpoints, dependency add-ons) must exist beforehand.
- Terraform — the `aws-samples/awsome-distributed-training` module with `create_hyperpod_inference_operator_module = true`. Also installs the training-operator, task-governance, and observability add-ons alongside.
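A hedged sketch of the CLI path. The top-level keys are the ones named above; all ARNs, names, and the nesting below each key are placeholders and assumptions:

```shell
# Sketch only: placeholder account IDs/ARNs/names; the nesting under
# alb / keda beyond the named keys is an assumption.
aws eks create-addon \
  --cluster-name my-eks-cluster \
  --addon-name amazon-sagemaker-hyperpod-inference \
  --configuration-values '{
    "executionRoleArn": "arn:aws:iam::111122223333:role/HyperPodInferenceExecutionRole",
    "tlsCertificateS3Bucket": "my-tls-cert-bucket",
    "hyperpodClusterArn": "arn:aws:sagemaker:us-west-2:111122223333:cluster/example",
    "alb": {"serviceAccount": "aws-load-balancer-controller"},
    "keda": {"auth": {"aws": {"irsa": "arn:aws:iam::111122223333:role/KedaOperatorRole"}}}
  }'
```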
Seen in¶
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — the EKS-add-on launch announcement; sole source at time of writing. Contributes the operator's CRD shape + packaging transition + multi-instance-type node-affinity fallback + KV-cache-as-platform-feature + prefix-aware-routing story.
Related¶
- systems/aws-sagemaker-hyperpod — parent cluster substrate.
- systems/aws-eks — the Kubernetes control plane.
- systems/helm — the packaging primitive migrated away from.
- systems/keda — bundled autoscaler dependency.
- concepts/instance-type-fallback — the structural primitive.
- concepts/kv-cache — the memory-hierarchy primitive under the managed cache.
- concepts/prefix-aware-routing — the LLM-inference-request routing primitive.
- patterns/eks-add-on-as-lifecycle-packaging — the packaging shift this operator canonicalises.