Zalando ML Portal (Backstage)¶
Definition¶
The Zalando ML Portal is the observability surface for ML pipelines authored with systems/zflow. It is a section of Zalando's internal developer portal, which runs on top of systems/backstage. It overlays pipeline execution state, per-run metrics, and model cards on the underlying Step Functions state machines that zflow compiles to.
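To make the overlay idea concrete, here is a minimal sketch of how per-step state could be derived from a Step Functions execution history. The portal's actual mechanism is not disclosed, so this is an illustration under assumptions: `step_states` and the status labels are hypothetical names. The event dictionaries follow the shape returned by the Step Functions `GetExecutionHistory` API (`type`, `stateEnteredEventDetails`, `stateExitedEventDetails`), and the example makes no AWS call.

```python
from typing import Dict, Iterable, Mapping

# Execution-level event types that end a run unsuccessfully.
TERMINAL_FAILURES = {"ExecutionFailed", "ExecutionAborted", "ExecutionTimedOut"}


def step_states(events: Iterable[Mapping]) -> Dict[str, str]:
    """Fold history events (oldest first) into a per-step status map.

    Hypothetical helper: the status labels are illustrative, not the portal's.
    Simplification: a step that exits via a Catch is still marked SUCCEEDED.
    """
    states: Dict[str, str] = {}
    for event in events:
        kind = event["type"]
        if kind.endswith("StateEntered"):
            states[event["stateEnteredEventDetails"]["name"]] = "RUNNING"
        elif kind.endswith("StateExited"):
            states[event["stateExitedEventDetails"]["name"]] = "SUCCEEDED"
        elif kind in TERMINAL_FAILURES:
            # Any step still running when the execution ended did not finish.
            for name in list(states):
                if states[name] == "RUNNING":
                    states[name] = "FAILED"
    return states


# Hand-written events in the GetExecutionHistory shape (step names are made up).
history = [
    {"type": "ExecutionStarted"},
    {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "preprocess"}},
    {"type": "TaskStateExited", "stateExitedEventDetails": {"name": "preprocess"}},
    {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "train"}},
]

live_view = step_states(history)
failed_view = step_states(history + [{"type": "ExecutionFailed"}])
```

Here `live_view` reports `preprocess` as finished and `train` as still running, which is the kind of per-step state a live execution view needs. Appending an `ExecutionFailed` event flips `train` to failed.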
Canonical disclosure¶
From the 2022-04-18 ML Platform overview (sources/2022-04-18-zalando-zalandos-machine-learning-platform):
"Pipeline tracking is a part of the internal Zalando developer portal running on top of Backstage, an open-source platform for building such portals."
"This ML web interface provides a detailed, real-time view of pipeline execution. Pipeline authors can monitor how metrics evolve across multiple runs of training pipelines and can view these changes on a graph. They can also view model cards for models created by the pipelines. These are just a few features of the ML portal, and the tool is actively developed to improve the process of experimenting with notebooks and deploying the pipelines in production."
Named capabilities¶
- Real-time pipeline execution view — see running pipelines and their per-step state live.
- Per-run metric evolution graphs — how metrics (PR-AUC, ROC, custom business metrics, etc.) change across successive training runs of the same pipeline. The graph is the canonical cross-run-diff surface.
- Model cards — per-model card documenting the model's training run, evaluation metrics, and the pipeline that created it. (Specific schema not disclosed.)
- "Actively developed to improve the process of experimenting with notebooks and deploying the pipelines in production" — named as a living product, not a frozen artefact.
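The cross-run metric view can be pictured as a small data transformation. The sketch below is an illustration only: the portal's metric storage and schema are not disclosed, and `RunMetrics`, `metric_series`, `run_over_run_deltas`, the run IDs, and the metric values are all hypothetical. It turns a list of training runs into one series per metric, which is the data a cross-run graph would plot, and computes the change between consecutive runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RunMetrics:
    """Metrics recorded by one training run (hypothetical shape)."""
    run_id: str
    metrics: Dict[str, float]


def metric_series(runs: List[RunMetrics], name: str) -> List[Tuple[str, float]]:
    """Return (run_id, value) points for one metric, in run order.

    Runs that did not record the metric are skipped.
    """
    return [(r.run_id, r.metrics[name]) for r in runs if name in r.metrics]


def run_over_run_deltas(series: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (run_id, change versus the previous point) for each later run."""
    return [
        (cur_id, cur - prev)
        for (_, prev), (cur_id, cur) in zip(series, series[1:])
    ]


# Made-up runs of the same pipeline, oldest first.
runs = [
    RunMetrics("run-1", {"pr_auc": 0.5}),
    RunMetrics("run-2", {"pr_auc": 0.75, "roc_auc": 0.875}),
    RunMetrics("run-3", {"pr_auc": 0.625}),
]

pr_auc_points = metric_series(runs, "pr_auc")
pr_auc_deltas = run_over_run_deltas(pr_auc_points)
```

Keeping the series as plain `(run_id, value)` pairs makes it easy to hand to any plotting layer. The deltas show at a glance which run regressed a metric.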
Why Backstage (and not the AWS Console)¶
The AWS Console for Step Functions and SageMaker shows state-machine executions and SageMaker training-job metrics, but it lacks both cross-run metric plotting and ML-domain primitives such as model cards. Backstage lets Zalando add those specialised views as plugins while keeping them in the same portal as every other engineering tool.
Wiki positioning¶
- Canonical instance of the developer-portal-as-ML-control-plane pattern on the wiki.
- The ML Portal is the second internal-tool surface that the Zalando ML Platform operates (the first being systems/zflow itself). The post says "two teams actively develop zflow and monitoring tools for pipelines"; the "monitoring tools" roughly correspond to the ML Portal.
- Internals (Backstage plugin structure, metric storage, model-card schema) are not disclosed. Stub page — expand when Zalando publishes more.
Seen in¶
- sources/2022-04-18-zalando-zalandos-machine-learning-platform — canonical disclosure; named capabilities and explicit Backstage-substrate call-out.
Related¶
- systems/backstage — the substrate.
- systems/zflow — the pipeline authoring tool whose outputs this portal observes.
- systems/aws-step-functions · systems/aws-sagemaker-ai — the managed services underneath whose executions the portal overlays.
- companies/zalando
- patterns/web-portal-for-ml-pipeline-observability