
SageMaker Processing Job

SageMaker Processing Job is a managed compute primitive inside SageMaker AI for running containerised data-processing workloads (feature engineering, data transformation, post-processing, model evaluation) on AWS-managed instances without standing up dedicated infrastructure. A processing job pulls a Docker image, mounts input datasets from S3, runs the container, and writes results back to S3 — then tears down.
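That lifecycle (image → S3 inputs → container run → S3 outputs) maps onto the SageMaker Python SDK's `ScriptProcessor`. A minimal sketch — every ARN, image URI, S3 path, and instance type below is a placeholder, not a value from this wiki:

```python
# Sketch of launching a processing job via the SageMaker Python SDK.
# All account IDs, roles, buckets, and paths are placeholders.
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
    command=["python3"],
    role="arn:aws:iam::<account>:role/<processing-role>",
    instance_count=1,
    instance_type="ml.m5.4xlarge",  # one large instance (vertical scaling)
)

processor.run(
    code="transform.py",  # script executed inside the container
    inputs=[ProcessingInput(
        source="s3://<bucket>/raw/",               # pulled from S3...
        destination="/opt/ml/processing/input")],  # ...onto the container FS
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",        # written by transform.py
        destination="s3://<bucket>/features/")],   # uploaded back to S3
)
# When run() returns, the instance is torn down automatically.
```

Because this configures and invokes AWS-managed infrastructure, it is a configuration sketch rather than a locally runnable example.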

Stub page — expand as deeper internals are ingested.

Role on the wiki

Named in Zalando ZEOS's inventory-optimisation system as the vertical-scaling tier of a two-tier feature-engineering pipeline:

  • Horizontal tier — PySpark on Databricks for SQL-expressible joins / filters / aggregations.
  • Vertical tier — SageMaker Processing Job for transformations requiring Pandas / scikit-learn / NumPy / Numba / SciPy — libraries that lack native distribution support and benefit from running on a single large instance.

This split is a canonical architectural-vocabulary entry — see concepts/data-preprocessing-vs-data-transformation-split.
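For illustration, a vertical-tier transform of the kind described above — stateful, order-dependent Pandas logic that is awkward to express as SQL but fits comfortably on one large instance. Column names and the feature itself are hypothetical, not taken from ZEOS:

```python
# Hypothetical vertical-tier feature: per-SKU expanding demand statistics,
# computed with Pandas on a single instance. Column names are illustrative.
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add an expanding mean of past demand per SKU (leakage-safe)."""
    df = df.sort_values(["sku", "date"]).copy()
    # shift(1) ensures each row only sees strictly earlier observations
    df["demand_mean_to_date"] = df.groupby("sku")["demand"].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    return df

frame = pd.DataFrame({
    "sku": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03",
         "2024-01-01", "2024-01-02"]),
    "demand": [10.0, 20.0, 30.0, 5.0, 7.0],
})
out = add_demand_features(frame)
print(out[["sku", "date", "demand_mean_to_date"]])
```

The `groupby(...).transform(...)` pattern with a Python lambda is exactly the kind of per-group, row-order-sensitive computation that lacks native distribution support and motivates the single-large-instance tier.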

Seen in
