zflow¶
zflow is Zalando's internal Python workflow library for machine-learning pipelines. Built by the Zalando Machine Learning Platform team, it is a thin orchestration layer on top of systems/aws-step-functions, systems/aws-lambda, Amazon SageMaker, and systems/databricks Spark. Data scientists and engineers declare ML workflows in Python and zflow translates them into Step Functions state machines that invoke SageMaker training jobs, SageMaker batch transforms, SageMaker endpoints, Databricks jobs, and Lambdas as workflow steps.
Its only publicly documented role is as the authoring substrate for the 2020–2021 migration of Zalando Payments' risk-scoring pipeline away from a legacy Scala + Spark monolith.
Role in the ML platform¶
- Workflow authoring — Python-native DSL; declares steps and dependencies; compiles down to a Step Functions state machine.
- Step heterogeneity — one zflow workflow can mix SageMaker training jobs, SageMaker batch-transform jobs, SageMaker endpoint deployments, and Databricks Spark jobs in a single orchestration.
- Scheduling — users "easily orchestrate and schedule ML workflows" per the Zalando post.
- Abstraction goal — "we steer away from implementing the whole system from scratch"; consumers use zflow instead of wiring Step Functions, SageMaker SDK, and Databricks APIs themselves.
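zflow's internal API is undisclosed (see "Wiki positioning" below), but the authoring-and-compilation pattern the bullets describe can be sketched. Everything in the sketch except the target format is an assumption: the `Step` class, `then`, and `compile_to_asl` are illustrative stand-ins, not zflow's real API. The compile target is real — Step Functions state machines are defined in Amazon States Language (ASL), and the `arn:aws:states:::sagemaker:...` resource ARNs are Step Functions' documented SageMaker service integrations.

```python
import json


class Step:
    """One workflow step (hypothetical; zflow's real step type is not public).

    `resource` names the Step Functions service integration that backs the
    step, e.g. a SageMaker training job or a Lambda invocation.
    """

    def __init__(self, name: str, resource: str):
        self.name = name
        self.resource = resource
        self.next = None

    def then(self, step: "Step") -> "Step":
        """Declare a dependency: `step` runs after this one."""
        self.next = step
        return step


def compile_to_asl(first_step: Step) -> dict:
    """Compile a linear chain of steps into an ASL state machine definition."""
    states, step = {}, first_step
    while step is not None:
        state = {"Type": "Task", "Resource": step.resource}
        if step.next is not None:
            state["Next"] = step.next.name
        else:
            state["End"] = True
        states[step.name] = state
        step = step.next
    return {"StartAt": first_step.name, "States": states}


# Hypothetical usage: train, then batch-transform, both on SageMaker.
train = Step("Train", "arn:aws:states:::sagemaker:createTrainingJob.sync")
transform = Step("BatchTransform", "arn:aws:states:::sagemaker:createTransformJob.sync")
train.then(transform)

definition = compile_to_asl(train)
print(json.dumps(definition, indent=2))
```

The `.sync` suffix on the integration ARNs is the documented Step Functions idiom for waiting on job completion rather than returning immediately — the kind of glue detail the library hides from workflow authors.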
Canonical disclosure¶
From Zalando Payments' 2021 retrospective (sources/2021-02-15-zalando-a-machine-learning-pipeline-with-real-time-inference):
"At Zalando, we use a tool provided by Zalando's ML Platform team called zflow. It is essentially a Python library built on top of AWS Step Functions, AWS Lambdas, Amazon SageMaker, and Databricks Spark, that allows users to easily orchestrate and schedule ML workflows."
The concrete zflow-orchestrated workflow disclosed in that post:
- Training data preprocessing — Databricks cluster + scikit-learn batch-transform job on SageMaker.
- Training — SageMaker training job.
- Batch predictions — SageMaker batch-transform job.
- Performance report — Databricks job producing a PDF.
- Endpoint deployment — SageMaker real-time endpoint backed by an inference pipeline model (scikit-learn preprocessing container + main-model container).
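The endpoint-deployment step above uses a documented SageMaker feature: an inference pipeline model chains multiple containers behind one real-time endpoint, with each container's output feeding the next. A minimal sketch of the request shape — all names, image URIs, S3 paths, and the role ARN are placeholders, not values from the post:

```python
# Sketch (not zflow's code): the CreateModel request for a two-container
# SageMaker inference pipeline, matching the post's description of a
# scikit-learn preprocessing container followed by the main-model container.
create_model_request = {
    "ModelName": "risk-scoring-pipeline-model",  # illustrative name
    "Containers": [
        {   # first container: scikit-learn preprocessing
            "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sklearn-preprocessor:latest",
            "ModelDataUrl": "s3://<bucket>/preprocessor/model.tar.gz",
        },
        {   # second container: the main model; receives the preprocessor's output
            "Image": "<account>.dkr.ecr.<region>.amazonaws.com/main-model:latest",
            "ModelDataUrl": "s3://<bucket>/main-model/model.tar.gz",
        },
    ],
    "ExecutionRoleArn": "arn:aws:iam::<account>:role/SageMakerExecutionRole",
}

# With boto3 this dict would be submitted as:
#   boto3.client("sagemaker").create_model(**create_model_request)
# followed by create_endpoint_config / create_endpoint for the live endpoint.
```

Container order in `Containers` is the invocation order at inference time, which is why the preprocessing container is listed first.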
Wiki positioning¶
- Open stub because the post discloses that zflow exists and what it wraps, but not its internals (IR, caching semantics, retry policies, model registry).
- zflow is the Zalando-specific instance of the broader pattern patterns/managed-services-over-custom-ml-platform — instead of each team hand-wiring Step Functions + SageMaker, the ML Platform team productises the glue as a Python library.
- Positions Zalando's ML Platform as an internal consulting organisation (parallel to many large-enterprise patterns); collaborations run via Statements of Work (see the Payments team's 9-month engagement).
Seen in¶
- sources/2021-02-15-zalando-a-machine-learning-pipeline-with-real-time-inference — canonical public disclosure. Authored by Zalando Payments + ML Platform teams. zflow orchestrates the five-stage workflow replacing the legacy Scala + Spark fraud-detection monolith. External reference: an ML Platform team member's LinkedIn post "Building ML workflows at Zalando: zflow" is cited by the engineering post but is not a Zalando-blog artefact.