SYSTEM Cited by 3 sources
Amazon SageMaker AI¶
Amazon SageMaker AI is AWS's unified managed ML platform, launched in 2017 with the stated mission of "put[ting] machine learning in the hands of any developer, irrespective of their skill level." It is the product-line umbrella for the Studio IDE, spaces (managed dev environments), notebooks, managed training, model hosting, and — since 2024 — the systems/aws-sagemaker-hyperpod large-scale distributed-training / inference substrate.
Stub page: expand as sources cite specific subsystems.
Notable capabilities (from ingested sources)¶
- StartSession API (2025) — creates an SSH-over-SSM tunnel into a SageMaker Studio space, letting local IDEs (VS Code via the AWS Toolkit plug-in) attach to SageMaker AI compute without hand-rolled SSH workarounds. Authentication context carries over from Studio for one-click access; local IDE sessions authenticate with credentials via the AWS Toolkit. Auto-reconnects across network interruptions. Builds on the earlier SageMaker SSH Helper work. See patterns/secure-tunnel-to-managed-compute. (Source: sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development)
- HyperPod observability, model deployment, and training operator — see systems/aws-sagemaker-hyperpod.
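The "hand-rolled SSH workarounds" that StartSession replaces typically routed SSH through an SSM session via a `ProxyCommand`. A minimal sketch of that pre-StartSession pattern (the target ID, host alias, and user are hypothetical; SageMaker SSH Helper adds its own wrapping on top of this):

```
# ~/.ssh/config — tunnel SSH through an SSM session instead of an open port 22
Host sagemaker-space
    HostName mi-0123456789abcdef0    # hypothetical SSM-managed target ID
    User sagemaker-user
    ProxyCommand aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p
```

With the StartSession API, the AWS Toolkit establishes an equivalent tunnel automatically and re-establishes it after network interruptions.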
Seen in¶
- sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development — Werner Vogels' four-capability survey: StartSession API, HyperPod observability, HyperPod model deployment, HyperPod training operator.
- sources/2026-04-01-aws-automate-safety-monitoring-with-computer-vision-and-generative-ai — canonical wiki reference for the full ML-lifecycle stack on SageMaker: Ground Truth for labelling, AI Pipelines for training workflows, Endpoints (Serverless → Serverful ml.g6 pivot at scale) for inference, and Batch Transform for GLIGEN synthetic-data generation. The trained YOLOv8 model (PyTorch 2.1 + cosine LR + AdamW) reaches 99.5% mAP@50 for PPE detection and 94.3% mAP@50 for Housekeeping without any manually annotated real images.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — launches the HyperPod Inference Operator as a native EKS add-on (replacing the prior Helm install path — see patterns/eks-add-on-as-lifecycle-packaging) with three new platform features: multi-instance-type GPU fallback via prioritised node affinity, managed tiered KV cache, and prefix-aware / KV-aware / round-robin inference routing. See systems/sagemaker-hyperpod-inference-operator.
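The YOLOv8 training recipe named above can be sketched as a hyperparameter dict. A hedged sketch assuming the Ultralytics YOLOv8 API: only the optimizer and cosine-LR choices come from the source; the dataset name, epoch count, and image size are illustrative assumptions.

```python
# Sketch of the source's recipe: YOLOv8 fine-tuned with AdamW + cosine LR
# schedule on synthetic (GLIGEN-generated) images. Values marked "assumed"
# are hypothetical, not from the source.
train_args = dict(
    data="ppe.yaml",      # hypothetical dataset config for the PPE classes
    optimizer="AdamW",    # per the source
    cos_lr=True,          # cosine learning-rate schedule, per the source
    epochs=100,           # assumed
    imgsz=640,            # assumed (YOLOv8 default input size)
)

# The actual run would execute inside a SageMaker training job, e.g.:
# from ultralytics import YOLO
# YOLO("yolov8m.pt").train(**train_args)
```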
Related¶
- systems/aws-sagemaker-hyperpod — the compute substrate
- systems/aws-systems-manager — SSM Session Manager underlies the StartSession tunnel
- systems/kubernetes — HyperPod training operator runs on K8s