SageMaker Training Job
SageMaker Training Job is a managed compute primitive inside SageMaker AI that runs a containerised ML training script on AWS-managed instances. Input data is read from S3 (downloaded, streamed, or mounted, depending on the input mode), the container writes a model artifact that SageMaker uploads back to S3, and the instances are torn down when the job ends. It is canonically used for training, but it can also perform inference within the same job when the model is lightweight enough that hosting a separate inference endpoint isn't worth the complexity; see patterns/single-sagemaker-training-job-train-and-infer.
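The S3-in, S3-out lifecycle described above shows up directly in the request that launches a job. As a minimal sketch, the function below builds a dictionary in the shape accepted by boto3's `create_training_job`. The job name, image URI, role ARN, bucket paths, instance type, volume size, and timeout are all hypothetical placeholders, and nothing here calls AWS.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge"):
    """Build a request dict in the shape of SageMaker's CreateTrainingJob API.

    Every value is caller-supplied or an illustrative default (assumption).
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,    # container holding the training script
            "TrainingInputMode": "File",   # copy S3 data onto the instance first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",        # channel name chosen for this sketch
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # The model artifact and any extra output files are uploaded here.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        # Upper bound on runtime; the instances are released once the job ends.
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Hypothetical values, for illustration only.
request = build_training_job_request(
    job_name="example-train-and-infer",
    image_uri="123456789012.dkr.ecr.eu-central-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
```

With AWS credentials configured, a dictionary like this could be passed as `boto3.client("sagemaker").create_training_job(**request)`.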
Stub page — expand as deeper internals are ingested.
Seen in
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive: canonical first wiki disclosure of the "train + infer in one job" collapse for LightGBM forecasting. Zalando's ZEOS forecaster uses a single SageMaker Training Job both to train the LightGBM model and to run inference on 5M SKUs × 12-week horizon, because the model is lightweight enough to make checkpointing and separate inference infrastructure unnecessary.
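The "train + infer in one job" collapse can be sketched as a single entry-point script that fits a model and then writes forecasts before the container exits. This is an illustration under stated assumptions, not Zalando's implementation: a per-SKU mean stands in for LightGBM so the sketch needs only the standard library, and the file names and the `sku` / `demand` columns are hypothetical. The `SM_*` environment variables are the ones set by the SageMaker training toolkit; the fallbacks are the conventional `/opt/ml` container paths.

```python
import csv
import json
import os
from collections import defaultdict


def sagemaker_paths():
    """Resolve input, model, and output directories inside the container."""
    return (
        os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
        os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
        os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data"),
    )


def train(rows):
    """Placeholder model (assumption): mean historical demand per SKU."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["sku"]] += float(row["demand"])
        counts[row["sku"]] += 1
    return {sku: totals[sku] / counts[sku] for sku in totals}


def infer(model, horizon_weeks=12):
    """Emit one forecast row per SKU per week of the horizon."""
    return [
        {"sku": sku, "week": week, "forecast": mean}
        for sku, mean in sorted(model.items())
        for week in range(1, horizon_weeks + 1)
    ]


def run_job(train_dir, model_dir, output_dir, horizon_weeks=12):
    """Train, persist the model, then run inference in the same process."""
    with open(os.path.join(train_dir, "train.csv"), newline="") as f:
        rows = list(csv.DictReader(f))
    model = train(rows)

    # Files in the model directory become the job's model artifact.
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.json"), "w") as f:
        json.dump(model, f)

    # Inference reuses the in-memory model; no endpoint or second job.
    forecasts = infer(model, horizon_weeks)
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "forecasts.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sku", "week", "forecast"])
        writer.writeheader()
        writer.writerows(forecasts)
    return len(forecasts)
```

Inside the container the script would end with `run_job(*sagemaker_paths())`. Because the forecasts are written to the output data directory, SageMaker uploads them to S3 alongside the model artifact when the job finishes.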
Related
- systems/aws-sagemaker-ai — parent product.
- systems/sagemaker-processing-job · systems/sagemaker-batch-transform-job — sibling primitives.
- systems/lightgbm — model trained on this compute tier in Zalando's forecaster.
- patterns/single-sagemaker-training-job-train-and-infer
- companies/zalando