
SageMaker HyperPod Inference Operator

SageMaker HyperPod Inference Operator is the Kubernetes controller that manages the deployment + lifecycle of inference models on an Amazon SageMaker HyperPod cluster with EKS orchestration. It reconciles two first-class CRDs — InferenceEndpointConfig (bring-your-own model from S3) and JumpStartModel (managed catalog) — into pods + services + load balancers + autoscalers on the HyperPod cluster.

CRDs

apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: JumpStartModel              # or InferenceEndpointConfig
  • JumpStartModel — minimal shape (modelId + sageMakerEndpoint.name + server.instanceType). Managed-catalog path; the operator pulls weights, configures the server, owns lifecycle. Gated models require the JumpStart Gated Model IAM role.
  • InferenceEndpointConfig — BYO-model shape with modelName + replicas + instanceTypes: [...] (prioritised list — see concepts/instance-type-fallback) + explicit nodeAffinity + explicit worker.resources (cpu, memory, nvidia.com/gpu). Fully expressive Kubernetes-native shape; used when the managed catalog's opinions don't fit (AZ-pinning, Spot-exclusion, custom node labels).
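A fuller InferenceEndpointConfig manifest, sketched from the fields named above, might look like the following. This is a hedged illustration, not the authoritative schema: metadata, the resource quantities, and any field not mentioned in the text are assumptions.

```yaml
# Sketch of a BYO-model InferenceEndpointConfig. Only apiVersion, kind,
# modelName, replicas, instanceTypes, and worker.resources come from the
# text above; everything else is an illustrative assumption.
apiVersion: inference.sagemaker.aws.amazon.com/v1
kind: InferenceEndpointConfig
metadata:
  name: my-byo-model            # hypothetical name
spec:
  modelName: my-byo-model
  replicas: 2
  instanceTypes:                # prioritised fallback list, highest first
    - ml.p4d.24xlarge
    - ml.g5.24xlarge
  worker:
    resources:                  # explicit per-worker resource shape
      cpu: "8"
      memory: 32Gi
      nvidia.com/gpu: "1"
```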

The two-CRD split is a managed-vs-customer-owned data-plane boundary at the model-artifact layer — JumpStartModel = managed, InferenceEndpointConfig = customer-controlled with platform fallbacks available.

Packaging transition (2026)

Previously shipped as a Helm chart with substantial customer setup burden: manual IAM-role creation, S3 bucket for TLS certs, VPC endpoints, dependency-add-on install (cert-manager, S3 Mountpoint CSI, FSx CSI, metrics-server), dependency-version management, and Helm-release lifecycle. The 2026-04-06 announcement repackages it as a native EKS add-on (addon-name amazon-sagemaker-hyperpod-inference), with AWS owning:

  • IAM scaffolding (4 named roles: Execution Role, JumpStart Gated Model Role, ALB Controller Role, KEDA Operator Role).
  • S3 bucket for TLS certificates.
  • VPC endpoints for secure S3 access.
  • Dependency add-on provisioning, version-bumping, rollback.
  • Compatibility / version matrix against the EKS cluster version.

See patterns/eks-add-on-as-lifecycle-packaging for the general shape; this is the canonical wiki instance. The official migration script helm_to_addon.sh auto-discovers Helm config, scales down Helm deployments, installs the add-on with OVERWRITE, tags migrated AWS resources with CreatedBy: HyperPodInference, and preserves backups at /tmp/hyperpod-migration-backup-<timestamp>/ for rollback.
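The migration flow can be sketched in shell as below. This is NOT the official helm_to_addon.sh — the release name, namespace, and cluster name are illustrative assumptions, and the cluster-touching commands are left commented so the sketch is safe to run dry.

```shell
# Hedged sketch of the Helm-to-add-on migration flow; release/namespace
# names are assumptions, not the script's actual discovery logic.
BACKUP_DIR="/tmp/hyperpod-migration-backup-$(date +%Y%m%d%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "backing up to $BACKUP_DIR"

# 1. Capture current Helm values so a rollback is possible.
# helm get values hyperpod-inference-operator -n kube-system > "$BACKUP_DIR/helm-values.yaml"

# 2. Scale the Helm-managed operator to zero so the add-on can take over.
# kubectl scale deployment hyperpod-inference-operator -n kube-system --replicas=0

# 3. Install the EKS add-on, overwriting conflicting fields.
# aws eks create-addon --cluster-name "$CLUSTER_NAME" \
#   --addon-name amazon-sagemaker-hyperpod-inference \
#   --resolve-conflicts OVERWRITE
```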

Platform features

  • Multi-instance-type deployment via Kubernetes node-affinity rules with descending weights — structural answer to GPU-capacity scarcity. The canonical example from the launch post: instanceTypes: ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"] compiles to a required-set restriction + preferred-weighted ordering; the scheduler silently falls back to the next-priority type when the preferred is unavailable.
  • Managed tiered KV cache — optional at install, with intelligent memory allocation per instance type. AWS-claimed up to 40% inference-latency reduction for long-context workloads (no methodology published). KV cache lifecycle is now a platform concern, not a model-server-library concern.
  • Intelligent routing — three strategies (prefix-aware / KV-aware / round-robin) picked at install time to maximise cross-request cache reuse. Specialisation of concepts/workload-aware-routing for LLM inference.
  • KEDA-via-IRSA autoscaling bundled as a default dependency with its own IAM role.
  • Observability integration with TTFT / latency / GPU-utilisation dashboards.
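The multi-instance-type compilation in the first bullet can be sketched as the node-affinity stanza below. This is a hedged reconstruction: the label key (the standard node.kubernetes.io/instance-type) and the specific weight values are assumptions, not the operator's verified output.

```yaml
# Illustrative compilation of instanceTypes: ["ml.p4d.24xlarge",
# "ml.g5.24xlarge", "ml.g5.8xlarge"]; label key and weights are assumed.
affinity:
  nodeAffinity:
    # Required set: pods may only land on the listed instance types.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]
    # Preferred ordering: descending weights encode the priority list.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - { key: node.kubernetes.io/instance-type, operator: In, values: ["ml.p4d.24xlarge"] }
      - weight: 50
        preference:
          matchExpressions:
            - { key: node.kubernetes.io/instance-type, operator: In, values: ["ml.g5.24xlarge"] }
      - weight: 10
        preference:
          matchExpressions:
            - { key: node.kubernetes.io/instance-type, operator: In, values: ["ml.g5.8xlarge"] }
```

The required term restricts scheduling to the allowed set; the weighted preferences make the scheduler try earlier entries first and fall back silently when the preferred capacity is unavailable.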

Three install paths

  1. SageMaker console (recommended) — Quick Install (all defaults) or Custom Install (specify existing IAM roles / buckets / add-ons).
  2. EKS CLI — aws eks create-addon with a JSON configuration-values blob carrying executionRoleArn / tlsCertificateS3Bucket / hyperpodClusterArn / alb.serviceAccount / keda.auth.aws.irsa. Prerequisites (IAM roles, S3 bucket, VPC endpoints, dependency add-ons) must exist beforehand.
  3. Terraform — the aws-samples/awsome-distributed-training module with create_hyperpod_inference_operator_module = true. Also installs the training-operator, task-governance, and observability add-ons alongside.
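Install path 2 can be sketched as below. All ARNs, bucket and cluster names are placeholders, and the exact nesting of the configuration-values schema is an assumption built from the key names in the text; the create-addon call itself is commented out since it requires live prerequisites.

```shell
# Hedged sketch of the EKS CLI install path. The JSON structure is assumed
# from the key names above, not a verified schema; all identifiers are fake.
CONFIG_VALUES=$(cat <<'EOF'
{
  "executionRoleArn": "arn:aws:iam::111122223333:role/HyperPodInferenceExecutionRole",
  "tlsCertificateS3Bucket": "example-hyperpod-tls-bucket",
  "hyperpodClusterArn": "arn:aws:sagemaker:us-west-2:111122223333:cluster/example",
  "alb": { "serviceAccount": { "roleArn": "arn:aws:iam::111122223333:role/ALBControllerRole" } },
  "keda": { "auth": { "aws": { "irsa": { "roleArn": "arn:aws:iam::111122223333:role/KedaOperatorRole" } } } }
}
EOF
)

# Sanity-check the blob is valid JSON before handing it to the CLI.
echo "$CONFIG_VALUES" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "config ok"

# Prerequisites (IAM roles, S3 bucket, VPC endpoints, dependency add-ons)
# must already exist; then:
# aws eks create-addon \
#   --cluster-name my-eks-cluster \
#   --addon-name amazon-sagemaker-hyperpod-inference \
#   --configuration-values "$CONFIG_VALUES"
```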
