SageMaker Processing Job¶
SageMaker Processing Job is a managed compute primitive inside SageMaker AI for running containerised data-processing workloads (feature engineering, data transformation, post-processing, model evaluation) on AWS-managed instances without standing up dedicated infrastructure. A processing job pulls a Docker image, mounts input datasets from S3 into the container's filesystem, runs the container, writes results back to S3, then tears the instances down.
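The container side of that lifecycle can be sketched as a minimal entrypoint script. A sketch only: the `/opt/ml/processing/...` paths follow SageMaker's usual local-path conventions (the exact mount points are whatever the job's input/output config specifies), and the `features.csv` filename, `demand` column, and z-score transform are illustrative assumptions, not anything disclosed by the source.

```python
"""Minimal sketch of a SageMaker Processing container entrypoint.

SageMaker downloads each configured input from S3 to a local path
inside the container before the entrypoint runs, and uploads whatever
the script writes under the configured output path back to S3 when the
job ends. Paths, filenames, and the transform itself are assumptions.
"""
import os

import pandas as pd

# Conventional local mount points; set per-job in the processing config.
INPUT_DIR = "/opt/ml/processing/input"
OUTPUT_DIR = "/opt/ml/processing/output"


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Single-instance Pandas transform -- the kind of non-distributable
    step the vertical tier exists for (hypothetical example)."""
    df = df.copy()
    # Population z-score of a hypothetical demand column.
    df["demand_zscore"] = (df["demand"] - df["demand"].mean()) / df["demand"].std(ddof=0)
    return df


def main() -> None:
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    df = pd.read_csv(os.path.join(INPUT_DIR, "features.csv"))
    transform(df).to_csv(
        os.path.join(OUTPUT_DIR, "features_transformed.csv"), index=False
    )


# Inside the container, the job's configured entrypoint would call main().
```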
Stub page — expand as deeper internals are ingested.
Role on the wiki¶
Named in Zalando ZEOS's inventory-optimisation system as the vertical-scaling tier of a two-tier feature-engineering pipeline:
- Horizontal tier — PySpark on Databricks for SQL-expressible joins / filters / aggregations.
- Vertical tier — SageMaker Processing Job for transformations requiring Pandas / scikit-learn / NumPy / Numba / SciPy — libraries that lack native distribution support and benefit from running on a single large instance.
This split is a canonical architectural vocabulary entry — concepts/data-preprocessing-vs-data-transformation-split.
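The launch side of the vertical tier can be sketched as the request payload for the boto3 `sagemaker` client's `create_processing_job` call. All ARNs, URIs, and names below are placeholders; `InstanceCount=1` on a large instance type reflects the single-big-box role described above, not a confirmed detail of Zalando's configuration.

```python
"""Sketch: assembling a CreateProcessingJob request for boto3's
'sagemaker' client. Built as a plain dict so the anatomy of a job
(image, role, inputs, outputs, resources) is visible; values are
placeholders, not real infrastructure."""


def build_processing_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    input_s3: str,
    output_s3: str,
    instance_type: str = "ml.m5.4xlarge",
) -> dict:
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        # Which container image to pull and run.
        "AppSpecification": {"ImageUri": image_uri},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,  # one large box: the vertical tier
                "InstanceType": instance_type,
                "VolumeSizeInGB": 100,
            }
        },
        # Datasets SageMaker downloads into the container before it runs.
        "ProcessingInputs": [
            {
                "InputName": "features",
                "S3Input": {
                    "S3Uri": input_s3,
                    "LocalPath": "/opt/ml/processing/input",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
        # Local paths SageMaker uploads back to S3 when the job ends.
        "ProcessingOutputConfig": {
            "Outputs": [
                {
                    "OutputName": "transformed",
                    "S3Output": {
                        "S3Uri": output_s3,
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
    }


# The job would then be launched with:
#   boto3.client("sagemaker").create_processing_job(**request)
```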
Seen in¶
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — canonical first disclosure as the data transformation tier of Zalando's inventory-optimisation feature-engineering pipeline, and as the post-processing tier of both the demand forecaster and the inventory optimiser (statistical model-performance analysis + business-metric computation feeding CloudWatch-alarm drift monitoring).
Related¶
- systems/aws-sagemaker-ai — parent SageMaker product.
- systems/sagemaker-training-job · systems/sagemaker-batch-transform-job — sibling SageMaker compute primitives.
- concepts/data-preprocessing-vs-data-transformation-split
- companies/zalando