SYSTEM Cited by 6 sources
Amazon SageMaker AI
Amazon SageMaker AI is AWS's unified managed ML platform, launched in 2017 with the stated mission of "put[ting] machine learning in the hands of any developer, irrespective of their skill level." It is the product-line umbrella for the Studio IDE, spaces (managed dev environments), notebooks, managed training, model hosting, and (since 2024) systems/aws-sagemaker-hyperpod, the substrate for large-scale distributed training and inference.
Stub page: expand as sources cite specific subsystems.
Notable capabilities (from ingested sources)
- StartSession API (2025) — creates an SSH-over-SSM tunnel into a SageMaker Studio space, letting local IDEs (VS Code via the AWS Toolkit plug-in) attach to SageMaker AI compute without hand-rolled SSH workarounds. Sessions launched from Studio inherit its authentication context for one-click access, while sessions started from the local IDE use credentials supplied through the AWS Toolkit. The tunnel reconnects automatically after network interruptions. Builds on the earlier SageMaker SSH Helper work. See patterns/secure-tunnel-to-managed-compute. (Source: sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development)
- HyperPod observability, model deployment, and training operator — see systems/aws-sagemaker-hyperpod.
Seen in
- sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development — Werner Vogels' four-capability survey: StartSession API, HyperPod observability, HyperPod model deployment, HyperPod training operator.
- sources/2026-04-01-aws-automate-safety-monitoring-with-computer-vision-and-generative-ai — canonical wiki reference for the full ML-lifecycle stack on SageMaker: Ground Truth for labelling, AI Pipelines for training workflows, Endpoints (Serverless → Serverful ml.g6 pivot at scale) for inference, Batch Transform for GLIGEN synthetic-data generation. Trained YOLOv8 (PyTorch 2.1 + cosine LR + AdamW) reaches 99.5% mAP@50 for PPE detection + 94.3% mAP@50 for Housekeeping without any manually-annotated real images.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — launches the HyperPod Inference Operator as a native EKS add-on (replacing the prior Helm install path — see patterns/eks-add-on-as-lifecycle-packaging) with three new platform features: multi-instance-type GPU fallback via prioritised node affinity, managed tiered KV cache, and prefix-aware / KV-aware / round-robin inference routing. See systems/sagemaker-hyperpod-inference-operator.
- — Zalando Payments 2020–2021 migration off Scala + Spark onto a managed-service ML pipeline orchestrated via systems/zflow (Zalando ML Platform's internal Python wrapper over Step Functions + Lambda + SageMaker + Databricks). Uses SageMaker training jobs, batch-transform jobs, and inference endpoints backed by a SageMaker Inference Pipeline Model (scikit-learn preprocessing container + XGBoost / PyTorch / TF main-model container). First wiki instance of a European Tier-2 retrospective on the managed-services-over-custom-ML-platform migration pattern, with concrete load-test numbers (200–1000 RPS on m5 family) and an explicitly named serving-cost increase of up to 200%, accepted as migration tax.
- sources/2022-04-18-zalando-zalandos-machine-learning-platform — platform-altitude disclosure. Names SageMaker's role in Zalando's ML Platform stack as the substrate for training jobs, batch-transform jobs, and real-time endpoints invoked as steps in systems/zflow-authored pipelines. Architectural decision rationale disclosed verbatim: "Step Functions is a platform for building and executing workflows consisting of multiple steps that may call various other services, such as AWS Lambda, S3 and Amazon SageMaker." SageMaker is named as one of the canonical step-target services underneath zflow-generated Step Functions state machines across hundreds of pipelines org-wide. This is the 2022-era platform-overview sibling of the 2021-02-15 specific-workload retrospective.
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — third named Zalando workload on SageMaker (after axis 10 + axis 11): the ZEOS inventory-optimisation system. Exercises four SageMaker primitives in one story: Processing Jobs (feature transformation + post-processing drift monitoring), Training Jobs (train-and-infer in a single job for LightGBM-based forecasting), Batch Transform (daily cross-merchant optimisation over the full article catalog), and Feature Store in both online and offline modes (the first wiki disclosure of the dual-mode feature-store pattern — 10–20 ms online, S3-backed append offline).
- sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud — SageMaker as Phase 1 substrate for enterprise LLM serving with escrow VPC. In early 2023, Slack hosted Anthropic models on SageMaker inside a "sophisticated escrow virtual private cloud (VPC) strategy to establish a strict zero-knowledge environment: our data remained private to Slack, and the provider's proprietary model weights remained inaccessible to us." Multi-region deployment with cross-region IAM, balanced routing across model endpoints, proactive capacity planning, and auto-scaling logic. By early 2024, GPU scarcity (A100 / emerging H100) was mitigated via On-Demand Capacity Reservations + cron-based scaling. The phase exposed three structural taxes: scaling latency (initialisation prevented instantaneous scaling), hardware scarcity (enterprise GPUs "often unavailable"), over-provisioning (idle resources to meet peak SLAs). These taxes drove Slack's Phase 2 migration to fully managed Amazon Bedrock in mid-2024 — primarily because Bedrock had become AWS's primary launchpad for new Anthropic models, with model iterations debuting on Bedrock "weeks or months before SageMaker availability" (canonical model feature lag disclosure). Slack's verbatim Phase 1 takeaway: "To scale, we needed automated capacity, not manual coordination."
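
As a rough illustration of the "multi-instance-type GPU fallback via prioritised node affinity" named in the HyperPod Inference Operator entry above, the sketch below builds a standard Kubernetes `nodeAffinity` stanza in which earlier instance types receive higher preference weights. The helper name `prioritised_node_affinity`, the weight scheme, and the example instance types are assumptions made for illustration; the source does not show the manifest the operator actually generates.

```python
from typing import Any

# Well-known Kubernetes node label that carries the node's instance type.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def prioritised_node_affinity(instance_types: list[str]) -> dict[str, Any]:
    """Build a nodeAffinity stanza that prefers earlier instance types.

    Hypothetical helper: not the operator's real output format.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if len(instance_types) > 100:
        # Kubernetes only accepts preference weights in the range 1-100.
        raise ValueError("at most 100 instance types are supported")

    # One weighted preference per instance type; a higher weight is preferred.
    preferred = [
        {
            "weight": 100 - rank,
            "preference": {
                "matchExpressions": [
                    {"key": INSTANCE_TYPE_LABEL, "operator": "In", "values": [itype]}
                ]
            },
        }
        for rank, itype in enumerate(instance_types)
    ]

    # Hard constraint: never schedule onto a type outside the fallback list.
    required = {
        "nodeSelectorTerms": [
            {
                "matchExpressions": [
                    {
                        "key": INSTANCE_TYPE_LABEL,
                        "operator": "In",
                        "values": list(instance_types),
                    }
                ]
            }
        ]
    }

    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": required,
            "preferredDuringSchedulingIgnoredDuringExecution": preferred,
        }
    }


# Example values only: prefer the larger GPU type, fall back to the smaller one.
affinity = prioritised_node_affinity(["ml.p5.48xlarge", "ml.g6.12xlarge"])
```

Preferred affinity is a scoring hint, not a strict ordering: the scheduler combines it with its other scoring plugins, so a lower-priority instance type can still be chosen when the preferred one has no free capacity, which is the fallback behaviour the entry describes.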
Related
- systems/aws-sagemaker-hyperpod — the compute substrate
- systems/aws-systems-manager — SSM Session Manager underlies the StartSession tunnel
- systems/kubernetes — HyperPod training operator runs on K8s