
SageMaker Processing Job

SageMaker Processing Job is a managed compute primitive inside SageMaker AI for running containerised data-processing workloads (feature engineering, data transformation, post-processing, model evaluation) on AWS-managed instances without standing up dedicated infrastructure. A processing job pulls a Docker image, mounts input datasets from S3, runs the container, and writes results back to S3 — then tears down.
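That lifecycle (image → S3 inputs → container run → S3 outputs) maps onto the SageMaker Python SDK's `ScriptProcessor`. A minimal sketch — every ARN, image URI, S3 path, and instance type below is a placeholder, not a value from this wiki:

```python
# Sketch of launching a processing job via the SageMaker Python SDK.
# All account IDs, roles, buckets, and paths are placeholders.
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
    command=["python3"],
    role="arn:aws:iam::<account>:role/<processing-role>",
    instance_count=1,
    instance_type="ml.m5.4xlarge",  # one large instance (vertical scaling)
)

processor.run(
    code="transform.py",  # script executed inside the container
    inputs=[ProcessingInput(
        source="s3://<bucket>/raw/",               # pulled from S3...
        destination="/opt/ml/processing/input")],  # ...onto the container FS
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",        # written by transform.py
        destination="s3://<bucket>/features/")],   # uploaded back to S3
)
# When run() returns, the instance is torn down automatically.
```

Because this configures and invokes AWS-managed infrastructure, it is a configuration sketch rather than a locally runnable example.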

Stub page — expand as deeper internals are ingested.

Role on the wiki

Named in Zalando ZEOS's inventory-optimisation system as the vertical-scaling tier of a two-tier feature-engineering pipeline:

  • Horizontal tier — PySpark on Databricks for SQL-expressible joins / filters / aggregations.
  • Vertical tier — SageMaker Processing Job for transformations requiring Pandas / scikit-learn / NumPy / Numba / SciPy — libraries that lack native distribution support and benefit from running on a single large instance.

This split is a canonical architectural-vocabulary entry — see concepts/data-preprocessing-vs-data-transformation-split.
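For illustration, a vertical-tier transform of the kind described above — stateful, order-dependent Pandas logic that is awkward to express as SQL but fits comfortably on one large instance. Column names and the feature itself are hypothetical, not taken from ZEOS:

```python
# Hypothetical vertical-tier feature: per-SKU expanding demand statistics,
# computed with Pandas on a single instance. Column names are illustrative.
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add an expanding mean of past demand per SKU (leakage-safe)."""
    df = df.sort_values(["sku", "date"]).copy()
    # shift(1) ensures each row only sees strictly earlier observations
    df["demand_mean_to_date"] = df.groupby("sku")["demand"].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    return df

frame = pd.DataFrame({
    "sku": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03",
         "2024-01-01", "2024-01-02"]),
    "demand": [10.0, 20.0, 30.0, 5.0, 7.0],
})
out = add_demand_features(frame)
print(out[["sku", "date", "demand_mean_to_date"]])
```

The `groupby(...).transform(...)` pattern with a Python lambda is exactly the kind of per-group, row-order-sensitive computation that lacks native distribution support and motivates the single-large-instance tier.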

Seen in
